[Bro-Dev] [desired broker api as oppose to whats in known-hosts.bro]

Azoff, Justin S jazoff at illinois.edu
Fri Mar 3 16:21:16 PST 2017


> On Mar 3, 2017, at 4:36 PM, Aashish Sharma <asharma at lbl.gov> wrote:
> 
> SO I came across a sample of Broker-API usage:

Yeah... there are a lot of things wrong with how that is being done.  There are a few things going on here.

One is that &synchronized no longer functions.  I think we should bring this back; it may not be in the form of &synchronized, but there should at least be some way to create a simple data structure that is automatically kept in sync between nodes.

The other is that the API that known hosts is currently using is too high level:

Broker::exists(Cluster::cluster_store, Broker::data("known_hosts"))
Broker::lookup(Cluster::cluster_store, Broker::data("known_hosts"))
Broker::set_contains(res2$result, Broker::data(host))
Broker::add_to_set(Cluster::cluster_store, Broker::data("known_hosts"), Broker::data(host));

which in English is:

1. see if the known_hosts table exists (why would it not exist?)
2. transfer the entire known_hosts table over from the data node
3. see if it contains host
4. add host if not present

And (probably due to an oversight) it does this twice, resulting in the known_hosts table being transferred twice.

This would work a lot better if it kept a persistent copy of the known_hosts set between calls and only updated it from the data node if the host wasn't found.  The only downside there is that the entire table is still being copied between nodes instead of just the updates.


To accomplish what known hosts really needs, which is just "Have I seen this host before?", we could just do something like:

local added = Broker::add_to_set(Cluster::cluster_store, Broker::data("known_hosts"), Broker::data(host));
if(added) {
    # host did not previously exist in the set
}


The only problem in this case is that there is no local cache to prevent the same host from being checked multiple times.  That would require a local copy of the set or, like you said, a bloom filter of sorts (probably one of those reverse bloom filters that has false negatives but no false positives).
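The "reverse bloom filter" idea can be sketched in a few lines; this is a language-neutral Python illustration, not Bro code.  Items are stored directly in a hash-indexed slot array, so a hit is always genuine (no false positives), but a colliding insert can evict an older item (a later false negative) -- which is fine for a worker-side cache, since a false negative just costs one extra round-trip to the data node.

```python
import hashlib

class ReverseBloomFilter:
    """Fixed-size cache with no false positives but possible false negatives.

    A colliding insert may evict an older item, so "not seen" can be wrong,
    but "seen" is always correct -- exactly what a worker-side cache needs
    before asking the data node.
    """

    def __init__(self, size=1024):
        self.slots = [None] * size

    def _index(self, item):
        digest = hashlib.blake2b(item.encode()).digest()
        return int.from_bytes(digest[:8], "big") % len(self.slots)

    def seen_before_or_add(self, item):
        """Return True if item was definitely seen; otherwise record it."""
        i = self._index(item)
        if self.slots[i] == item:
            return True
        self.slots[i] = item  # may evict a colliding item: false negative later
        return False
```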

So, for the case of tracking things using a set across the cluster all one needs is a simple function that:

1. Check to see if the item is in the local cache or bloom filter
2. Send it over to the data node and inspect the response (new or duplicate)
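The two steps above can be sketched like this (a Python illustration, not the Broker API -- the data node is stubbed as an in-process set, and all names here are hypothetical):

```python
class Worker:
    """Sketch of the two-step check: local cache first, then the data node."""

    def __init__(self, data_node_store):
        self.local_cache = set()           # or a bloom/reverse-bloom filter
        self.data_node_store = data_node_store  # stand-in for a Broker store

    def check_or_add(self, item):
        """Return True only the first time the cluster sees this item."""
        # Step 1: local cache -- avoids a round-trip for repeated items.
        if item in self.local_cache:
            return False
        self.local_cache.add(item)
        # Step 2: ask the data node, which answers new-or-duplicate.
        if item in self.data_node_store:
            return False
        self.data_node_store.add(item)
        return True
```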

Things get a little more complicated in that I want the ability to scale out the data nodes.  So that means a slight variation:

1. Check to see if the item is in the local cache or bloom filter
2. Send it over to the data node that corresponds to the hash of the item and inspect the response (new or duplicate)
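The routing in step 2 is just hash-based partitioning; a minimal Python sketch (again an illustration, not Bro).  Because the hash is deterministic, every worker routes the same item to the same data node, so each set is partitioned across data nodes rather than replicated:

```python
import hashlib

def data_node_for(item, n_data_nodes):
    """Pick the data node responsible for this item.

    A deterministic hash of the item itself means all workers agree on
    which data node owns it, letting the data nodes scale out.
    """
    digest = hashlib.blake2b(item.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n_data_nodes
```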

So from a user's point of view, the Broker part of the function could just be:

if(Broker::check_or_add_to_set("known_hosts", host)) {
    Log::write(Known::HOSTS_LOG, [$ts=network_time(), $host=host]);
}

Another way of writing this, which corresponds to your 'event based' approach, is to have the function instead do:

1. Check to see if the item is in the local cache or bloom filter
2. Send an event over to the data node that says a new host was potentially found

For known hosts purposes, the data node doesn't even need to send anything back to the worker; it can just log it (or not).

It would help me think about this if you could outline some of your use cases for broker stores.  I have a good idea of what needs to be done to fix known hosts/services/certs and sumstats/scan detection.  But I don't know what things you have in mind :-)





-- 
- Justin Azoff



