[Bro-Dev] [JIRA] (BIT-1039) Merge request for Bloom filters

Robin Sommer (JIRA) jira at bro-tracker.atlassian.net
Thu Jul 25 12:44:04 PDT 2013


    [ https://bro-tracker.atlassian.net/browse/BIT-1039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13316#comment-13316 ] 

Robin Sommer commented on BIT-1039:
-----------------------------------

Merged, thanks.

However, we still need to address the hash problem to support merging across Bro instances, so I'm leaving the ticket open for that. Here's a proposal for what to do (after talking to Bernhard):

1. Change {{CompositeHash}} to optionally use a custom {{H3}} instance.

2. Extend the {{Hasher}} to take both a name and an optional seed value (probably another string). Internally, it combines the two into the seed for the {{Hasher}}'s internal {{H3}}, i.e., same name+seed means same hash functions. If the optional seed value is not given, take it from a global script-level variable {{GLOBAL_INSTALLATION_SEED}} (or so :).

3. Change {{BloomFilterVal}} to pass a custom {{H3}} to {{CompositeHash}}. I believe this could be the same instance that {{Hasher}} is using internally, so that we get the same consistency guarantees. Indeed, hashing of {{Val}} should then probably move into {{Hasher}}.

4. Along the same lines as (2), extend the Bloom filter interface to take both a name and the optional seed. Same name+seed means filters can be merged.

5. Change BroControl to redefine {{GLOBAL_INSTALLATION_SEED}} to a non-predictable value that will remain consistent across {{install}} runs.
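
To make the name+seed idea from (2) a bit more concrete, here's a rough sketch in Python. This is not Bro code: SHA-256 stands in for {{H3}}, the {{derive_seed}} helper and the constructor signature are made up for illustration, and the value of {{GLOBAL_INSTALLATION_SEED}} is just a placeholder for whatever BroControl would set.

```python
import hashlib
from typing import List, Optional

# Placeholder for the proposed script-level variable (assumed value).
GLOBAL_INSTALLATION_SEED = "example-installation-seed"


def derive_seed(name: str, seed: Optional[str] = None) -> bytes:
    """Combine a name and an optional seed into one seed digest.

    Hypothetical helper: falls back to GLOBAL_INSTALLATION_SEED when no
    explicit seed is given.
    """
    if seed is None:
        seed = GLOBAL_INSTALLATION_SEED
    h = hashlib.sha256()
    # Length-prefix each part so ("ab", "c") and ("a", "bc") do not collide.
    for part in (name, seed):
        data = part.encode("utf-8")
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return h.digest()


class Hasher:
    """Hashes a value k times; SHA-256 is a stand-in for H3 here."""

    def __init__(self, k: int, name: str, seed: Optional[str] = None):
        self.k = k
        self.seed = derive_seed(name, seed)

    def hash(self, value: bytes) -> List[int]:
        # One 64-bit hash per index i, all derived from the same seed.
        out = []
        for i in range(self.k):
            d = hashlib.sha256(self.seed + i.to_bytes(4, "big") + value)
            out.append(int.from_bytes(d.digest()[:8], "big"))
        return out
```

With this, two {{Hasher}} objects built from the same name (and the same or no seed) produce identical hashes for a given value, regardless of which process created them, while changing either the name or the seed gives a different family of hash functions.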

I believe that with this we can support two use cases: (1) in a cluster, all Bloom filters created with the same name but without any further seed value will be compatible (because they'll use {{GLOBAL_INSTALLATION_SEED}}); and (2) externally provided Bloom filters can specify their own seed so that any Bro installation can pull them in.
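
The two use cases could look roughly like the following Python sketch. Again, this is only an illustration under assumptions: the {{BloomFilter}} class, its {{merge}} and {{compatible}} methods, and the filter names are hypothetical, SHA-256 replaces {{H3}}, and a Python integer serves as the bit vector.

```python
import hashlib
from typing import Iterator, Optional

# Placeholder for the proposed script-level variable (assumed value).
GLOBAL_INSTALLATION_SEED = "example-installation-seed"


class BloomFilter:
    """Basic Bloom filter whose hash functions depend only on name+seed."""

    def __init__(self, cells: int, k: int, name: str,
                 seed: Optional[str] = None):
        self.cells = cells
        self.k = k
        self.name = name
        # No explicit seed: fall back to the installation-wide one.
        self.seed = GLOBAL_INSTALLATION_SEED if seed is None else seed
        self.bits = 0  # Python int used as a bit vector

    def _positions(self, value: bytes) -> Iterator[int]:
        # Length-prefix the name so different name/seed splits differ.
        key = f"{len(self.name)}:{self.name}|{self.seed}".encode("utf-8")
        for i in range(self.k):
            d = hashlib.sha256(key + i.to_bytes(4, "big") + value).digest()
            yield int.from_bytes(d[:8], "big") % self.cells

    def add(self, value: bytes) -> None:
        for p in self._positions(value):
            self.bits |= 1 << p

    def lookup(self, value: bytes) -> bool:
        return all((self.bits >> p) & 1 for p in self._positions(value))

    def compatible(self, other: "BloomFilter") -> bool:
        # Same size, same k, same name+seed: same hash functions.
        return ((self.cells, self.k, self.name, self.seed) ==
                (other.cells, other.k, other.name, other.seed))

    def merge(self, other: "BloomFilter") -> "BloomFilter":
        if not self.compatible(other):
            raise ValueError("filters use different hash functions")
        merged = BloomFilter(self.cells, self.k, self.name, self.seed)
        merged.bits = self.bits | other.bits  # union of both sets
        return merged


# Use case (1): two cluster nodes, same name, no explicit seed.
worker1 = BloomFilter(1024, 3, "seen_hosts")
worker2 = BloomFilter(1024, 3, "seen_hosts")
worker1.add(b"10.0.0.1")
worker2.add(b"10.0.0.2")
combined = worker1.merge(worker2)

# Use case (2): an externally provided filter carries its own seed; a
# local filter created with the same name+seed can merge with it.
external = BloomFilter(1024, 3, "shared_feed", "feed-seed")
local = BloomFilter(1024, 3, "shared_feed", "feed-seed")
external.add(b"198.51.100.7")
imported = local.merge(external)
```

Merging is a bitwise OR here, which is only meaningful when both filters map values to the same positions. That is why the sketch refuses to merge filters whose name or seed differ.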

Does this make sense?
                
> Merge request for Bloom filters
> -------------------------------
>
>                 Key: BIT-1039
>                 URL: https://bro-tracker.atlassian.net/browse/BIT-1039
>             Project: Bro Issue Tracker
>          Issue Type: New Feature
>          Components: Bro
>            Reporter: Matthias Vallentin
>            Priority: Medium
>             Fix For: 2.2
>
>
> The Bloom filter implementation in `topic/matthias/bloom-filter` is ready to merge into master. Have a look at the very end of `bro.bif` for the script-land interface.
> Internally, we have a new `BloomFilterVal`, which is serializable and mergeable and thus ready for cluster use. This `Val` contains a polymorphic Bloom filter instance, which hides the concrete Bloom filter type (currently only basic and counting). Moreover, this branch introduces the notion of ''hashers'', which are parameterizable (i.e., seedable) structures for hashing values ''k'' times. I recall that Bernhard has been waiting for this feature. See `Hasher.h` for the documented interface.
> In the future, we need to rethink how to construct hash functions which only depend on a seed given at script land. This will be important when sharing Bloom filters across organizational boundaries. At this point, the implementation relies on `CompHash` (at least for composite values, such as records) which itself depends on the initial Bro seed generated at startup time or when the user specifies the environment variable `$BRO_SEED`.
