[Bro-Dev] [Bro-Commits] [git/bro] topic/actor-system: First-pass broker-enabled Cluster scripting API + misc. (07ad06b)

Azoff, Justin S jazoff at illinois.edu
Fri Nov 3 10:07:09 PDT 2017


> On Nov 3, 2017, at 6:51 AM, Jan Grashöfer <jan.grashoefer at gmail.com> wrote:
> 
> At this point, if the manager functionality is distributed across 
> multiple data nodes, we have to make sure, that every data node has the 
> right part of the DataStore to deal with the incoming hit. One could 
> keep the complete DataStore on every data node but I think that would 
> lead to another scheme in which a subset of workers send all their 
> requests to a specific data node, i.e. each data node serves a part of 
> the cluster.

Yeah, this is where the HRW(hashing) vs RR(round robin) pool distribution methods come in.

If all data nodes had a full copy of the data store, then either dsitribution method would work.

Partitioning the intel data set is a little tricky since it supports subnets and hashing 10.10.0.0/16
and 10.10.10.10 won't necessarily give you the same node.  Maybe subnets need to exist on all
nodes but everything else can be partitioned?    There would also need to be a method for
re-distributing the data if the cluster configuration changes due to nodes being added or removed.

'Each data node serving a part of a cluster' is kind of like what we have now with proxies,
but that is statically configured and has no support for failover.  I've seen cluster setups where
there are 4 worker boxes and run one proxy on each box.  The problem is if one box down,
1/4 of the workers on the remaining 3 boxes are configured to use a proxy that no longer exists.

So minimally just having a copy of the data in another process and using RR would be an improvement.

There may be an issue with scaling out data notes to 8+ processes for things like scan detection and sumstats,
if those 8 data nodes would also need to have a full copy of the intel data in memory. I don't know how much
memory a large intel data set is inside a running bro process though.

Things like scan detection,sumstats,known hosts/ports/services/certs are a lot easier to partition because by definition
they are keyed on something.

— 
Justin Azoff




More information about the bro-dev mailing list