[Bro-Dev] [Bro-Commits] [git/bro] topic/actor-system: First-pass broker-enabled Cluster scripting API + misc. (07ad06b)
Azoff, Justin S
jazoff at illinois.edu
Fri Nov 3 10:07:09 PDT 2017
> On Nov 3, 2017, at 6:51 AM, Jan Grashöfer <jan.grashoefer at gmail.com> wrote:
> At this point, if the manager functionality is distributed across
> multiple data nodes, we have to make sure that every data node has the
> right part of the DataStore to deal with the incoming hit. One could
> keep the complete DataStore on every data node but I think that would
> lead to another scheme in which a subset of workers send all their
> requests to a specific data node, i.e. each data node serves a part of
> the cluster.
Yeah, this is where the HRW (highest random weight, i.e. hashing) vs. RR (round robin) pool distribution methods come in.
If all data nodes had a full copy of the data store, then either distribution method would work.
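A minimal Python sketch of the difference, not the Bro Cluster framework's implementation (the node names and the use of SHA-256 for the weights are illustrative assumptions): HRW always sends a given key to the same node, while RR ignores the key entirely, which is why RR only works when every node holds the full data store.

```python
import hashlib
from itertools import cycle


def _weight(node: str, key: str) -> int:
    # Deterministic score for a (node, key) pair; any stable hash would do.
    digest = hashlib.sha256(f"{node}|{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def hrw_pick(nodes: list[str], key: str) -> str:
    # Highest random weight: the node with the top score for this key wins,
    # so every sender independently agrees on the owner of a key.
    return max(nodes, key=lambda n: _weight(n, key))


class RoundRobin:
    """Hands out nodes in turn; the key plays no part in the choice."""

    def __init__(self, nodes: list[str]):
        self._it = cycle(nodes)

    def pick(self, key: str) -> str:
        return next(self._it)


nodes = ["data-1", "data-2", "data-3", "data-4"]
rr = RoundRobin(nodes)
print(hrw_pick(nodes, "10.10.10.10"), hrw_pick(nodes, "10.10.10.10"))  # same node both times
print(rr.pick("10.10.10.10"), rr.pick("10.10.10.10"))  # data-1 data-2
```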
Partitioning the intel data set is a little tricky since it supports subnets: hashing 10.10.0.0/16
and 10.10.10.10 won't necessarily give you the same node, even though the address falls inside the subnet.
Maybe subnets need to exist on all nodes, but everything else can be partitioned? There would also need
to be a method for re-distributing the data if the cluster configuration changes due to nodes being added or removed.
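One way the "subnets on all nodes, everything else partitioned" idea could look, as a hypothetical Python sketch (all function and node names are made up for illustration): subnet indicators are replicated to every node, other indicators go to their HRW owner, and a lookup for an address is routed to that address's owner, which also holds every subnet. The `moved_keys` helper illustrates a property of HRW that matters for re-distribution: when a node is removed, only the keys it owned change owner.

```python
import hashlib
import ipaddress


def hrw_pick(nodes, key):
    # Highest-random-weight owner of `key` among `nodes`.
    def weight(node):
        digest = hashlib.sha256(f"{node}|{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=weight)


def is_subnet(indicator):
    # "10.10.0.0/16" counts; a bare address or a domain/URL does not.
    try:
        ipaddress.ip_network(indicator)
    except ValueError:
        return False
    return "/" in indicator


def place(indicators, nodes):
    """Return {node: set of indicators stored on that node}."""
    stores = {n: set() for n in nodes}
    for ind in indicators:
        if is_subnet(ind):
            for n in nodes:  # replicated: a hashed /16 need not land with its addresses
                stores[n].add(ind)
        else:
            stores[hrw_pick(nodes, ind)].add(ind)  # partitioned by HRW
    return stores


def lookup(stores, nodes, value):
    # Ask the owner of `value`; it has its own partition plus every subnet.
    node = hrw_pick(nodes, value)
    if value in stores[node]:
        return node, value
    try:
        ip = ipaddress.ip_address(value)
    except ValueError:
        return node, None  # not an address, so no subnet can match
    for ind in stores[node]:
        if is_subnet(ind) and ip in ipaddress.ip_network(ind):
            return node, ind
    return node, None


def moved_keys(keys, old_nodes, new_nodes):
    # Keys whose owner changes after a cluster membership change.
    return {k for k in keys if hrw_pick(old_nodes, k) != hrw_pick(new_nodes, k)}


nodes = ["data-1", "data-2", "data-3", "data-4"]
stores = place(["10.10.0.0/16", "10.10.10.10", "192.0.2.7", "evil.example.com"], nodes)
print(lookup(stores, nodes, "10.10.20.30")[1])  # 10.10.0.0/16
print(lookup(stores, nodes, "10.10.10.10")[1])  # 10.10.10.10
```

The cost of this layout is that every node carries the full subnet list, so it only helps if subnets are a small fraction of a typical intel feed.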
'Each data node serving a part of a cluster' is kind of like what we have now with proxies,
but that is statically configured and has no support for failover. I've seen cluster setups with
4 worker boxes that run one proxy on each box. The problem is that if one box goes down,
1/4 of the workers on the remaining 3 boxes are configured to use a proxy that no longer exists.
So minimally just having a copy of the data in another process and using RR would be an improvement.
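Assuming every data node holds a full copy of the data, RR with failover could be as simple as skipping nodes that are known to be down. A hypothetical Python sketch; how a node actually gets marked down or back up (e.g. on a lost or restored peer connection) is left out:

```python
class FailoverRoundRobin:
    """Round robin over whichever data nodes are currently up (hypothetical)."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.up = set(self.nodes)
        self._next = 0

    def mark_down(self, node):
        self.up.discard(node)

    def mark_up(self, node):
        if node in self.nodes:
            self.up.add(node)

    def pick(self):
        # Walk at most one full lap looking for a live node.
        for _ in range(len(self.nodes)):
            node = self.nodes[self._next]
            self._next = (self._next + 1) % len(self.nodes)
            if node in self.up:
                return node
        raise RuntimeError("no data nodes available")


rr = FailoverRoundRobin(["data-1", "data-2", "data-3", "data-4"])
rr.mark_down("data-2")  # e.g. the box running data-2 went away
print([rr.pick() for _ in range(6)])
# ['data-1', 'data-3', 'data-4', 'data-1', 'data-3', 'data-4']
```

Unlike the static worker-to-proxy mapping, no worker is left pointing at the dead node; its share of requests is spread over the survivors.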
There may be an issue with scaling out data nodes to 8+ processes for things like scan detection and sumstats
if those 8 data nodes would also need to have a full copy of the intel data in memory. I don't know how much
memory a large intel data set takes up inside a running bro process, though.
Things like scan detection, sumstats, known hosts/ports/services/certs are a lot easier to partition because by definition
they are keyed on something.
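A hypothetical Python sketch of why keyed state partitions cleanly: if (scanner, port) events are routed by HRW on the scanner address, each scanner's whole history lands on one node and its distinct-port count is complete there, whereas spreading the same events round robin leaves every node with only a partial count. The thresholds and time windows of the real scan detection script are not modeled; names here are illustrative.

```python
import hashlib
from collections import defaultdict


def hrw_pick(nodes, key):
    # Highest-random-weight owner of `key` among `nodes`.
    def weight(node):
        digest = hashlib.sha256(f"{node}|{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=weight)


def distinct_ports_keyed(observations, nodes):
    """Route each (scanner, port) event to the scanner's HRW owner and count there."""
    per_node = {n: defaultdict(set) for n in nodes}
    for scanner, port in observations:
        per_node[hrw_pick(nodes, scanner)][scanner].add(port)
    # No cross-node merge needed: each scanner's full history lives on one node.
    return {s: len(ports) for table in per_node.values() for s, ports in table.items()}


def best_partial_count_round_robin(observations, nodes, scanner):
    """Spread events round robin; return the largest count any single node sees."""
    per_node = [set() for _ in nodes]
    for i, (s, port) in enumerate(observations):
        if s == scanner:
            per_node[i % len(nodes)].add(port)
    return max(len(ports) for ports in per_node)


nodes = ["data-1", "data-2", "data-3", "data-4"]
obs = [("192.0.2.1", port) for port in range(20)] + [("198.51.100.9", 22)]
print(distinct_ports_keyed(obs, nodes))  # 192.0.2.1 -> 20, 198.51.100.9 -> 1
print(best_partial_count_round_robin(obs, nodes, "192.0.2.1"))  # 5
```

With round robin, a hypothetical threshold of 15 distinct ports would never fire on any of the 4 nodes, even though the scanner touched 20.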