[Bro-Dev] Bloom filter merging

Mon Jun 17 22:54:30 PDT 2013

> seed_i = h(  ((name.length() == 0) ? initial_seed : name) || i)

This is how I currently implement it. However, I do not think it will
make a difference, because CompHash is also seeded by initial_seed [1],
and the Bloom filter implementation uses CompHash to hash Val
instances. To generate k different hash values, I just feed the
CompHash output into k H3 instances that are seeded according to the
formula above. Let x be an instance of a Val. Then the i'th hash value
is:

    h_i(x) = H3_i(CompHash(x))
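
As a rough illustration of the construction above, here is a minimal
Python sketch. It is not Bro's code: derive_seed, comp_hash, the H3
class, and bloom_hashes are hypothetical names, SHA-256 stands in for
the unspecified h, HMAC-MD5 keyed with initial_seed stands in for
CompHash, and the H3 table layout is an assumption.

```python
import hashlib
import hmac
import random


def derive_seed(name: str, initial_seed: bytes, i: int) -> int:
    """seed_i = h((name empty ? initial_seed : name) || i); SHA-256 is a stand-in for h."""
    base = name.encode() if name else initial_seed
    digest = hashlib.sha256(base + i.to_bytes(4, "big")).digest()
    return int.from_bytes(digest[:8], "big")


class H3:
    """Toy H3-style hash: XOR of seeded random table entries, one per (position, byte)."""

    def __init__(self, seed: int, max_len: int = 16):
        rng = random.Random(seed)
        self.table = [[rng.getrandbits(32) for _ in range(256)]
                      for _ in range(max_len)]

    def __call__(self, data: bytes) -> int:
        if len(data) > len(self.table):
            raise ValueError("input longer than the H3 table supports")
        h = 0
        for pos, byte in enumerate(data):
            h ^= self.table[pos][byte]
        return h


def comp_hash(value: bytes, initial_seed: bytes) -> bytes:
    """Stand-in for CompHash: a 16-byte digest that depends on initial_seed."""
    return hmac.new(initial_seed, value, hashlib.md5).digest()


def bloom_hashes(value: bytes, name: str, initial_seed: bytes, k: int) -> list:
    """h_i(x) = H3_i(CompHash(x)) for i in 0..k-1."""
    inner = comp_hash(value, initial_seed)
    return [H3(derive_seed(name, initial_seed, i))(inner) for i in range(k)]
```

In this sketch, two filters with the same name get identical H3 seeds,
but the k hash values still differ across deployments as soon as
initial_seed differs, because the inner comp_hash output changes. That
is the dependency described in the reply.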

> In my mind, it'd be cool if identical names would produce identical results (in this case) without the need to set an environment variable.  Maybe there's a reason that's a bad idea, though?

Soumya pointed out the problem: there's a trade-off between making
Bloom filters shareable on the one hand, and robust against attackers
on the other hand. If we want to distribute Bloom filters at some
point, we'll face the problem of running into different seeds in
different deployments. Conceptually, we have two tuning knobs: the
Bloom filter name, which is unique to the Bloom filter, and the
environment variable BRO_SEED_FILE that determines initial_seed.
Ideally our users will not have to mess with their seed to use a Bloom
filter someone wants to share. At this point, however, it is
impossible to get rid of the initial_seed dependency, because CompHash
itself is seeded by initial_seed, as noted above. If there's a
straightforward way to hash an arbitrary Val in a different way,
please let me know.

    Matthias

[1] According to the documentation in Hash.cc, hashing a Val uses
either H3 for short data or HMAC MD5 for arbitrarily long data. Both
the seed of the H3 instance and the HMAC key are a function of
initial_seed.