[Bro-Dev] [Bro-Commits] [git/bro] topic/actor-system: First-pass broker-enabled Cluster scripting API + misc. (07ad06b)
Azoff, Justin S
jazoff at illinois.edu
Fri Nov 3 13:05:10 PDT 2017
On Nov 3, 2017, at 3:13 PM, Jan Grashöfer <jan.grashoefer at gmail.com> wrote:
> On 03/11/17 18:07, Azoff, Justin S wrote:
>> Partitioning the intel data set is a little tricky since it supports subnets, and hashing 10.10.0.0/16
>> and 10.10.10.10 won't necessarily give you the same node. Maybe subnets need to exist on all
>> nodes but everything else can be partitioned?
>
> Good point! Subnets are stored somewhat separately to allow prefix matches anyway. However, I am a bit hesitant, as it would become a quite complex setup.
Indeed. Replication plus load balancing is probably a good enough first step.
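To make the subnet-hashing problem concrete, here is a minimal sketch of rendezvous (HRW) hashing in plain Python. The node names and the SHA-1 scoring are hypothetical, not Bro's actual implementation; the point is just that a subnet string and an address inside it are unrelated keys to the hash, so they can land on different data nodes.

```python
import hashlib

def hrw_owner(key, nodes):
    """Highest Random Weight (rendezvous) hashing: each node gets a
    score for the key, and the highest-scoring node owns the key."""
    return max(nodes, key=lambda n: hashlib.sha1(f"{n}|{key}".encode()).hexdigest())

nodes = ["data-1", "data-2", "data-3", "data-4"]  # hypothetical data nodes

# "10.10.0.0/16" and "10.10.10.10" are just different byte strings to
# the hash function, so they may well map to different owners:
print(hrw_owner("10.10.0.0/16", nodes))
print(hrw_owner("10.10.10.10", nodes))
```

This is why a prefix lookup can't simply be routed to the node that owns the matching subnet: the node handling 10.10.10.10 has no way to know, from the hash alone, which node holds 10.10.0.0/16.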
>> There would also need to be a method for
>> re-distributing the data if the cluster configuration changes due to nodes being added or removed.
>
> Right, that's exactly what I was thinking of. I guess this also applies to other use cases that will use HRW. I am just not sure whether dynamic layout changes are out of scope at the moment...
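One mitigating property worth noting: HRW keeps redistribution minimal by construction. When a node disappears, only the keys that node owned are remapped; every other key keeps its owner. A quick property check (same hypothetical `hrw_owner` sketch as above, not Bro's implementation):

```python
import hashlib

def hrw_owner(key, nodes):
    # Highest Random Weight: deterministic per (node, key) pair, so
    # removing a node only remaps the keys that node owned.
    return max(nodes, key=lambda n: hashlib.sha1(f"{n}|{key}".encode()).hexdigest())

nodes = ["data-1", "data-2", "data-3", "data-4"]
keys = [f"10.10.10.{i}" for i in range(256)]

before = {k: hrw_owner(k, nodes) for k in keys}
# Suppose data-3 is removed from the cluster:
after = {k: hrw_owner(k, [n for n in nodes if n != "data-3"]) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
# Every key that moved was owned by the node that went away:
assert all(before[k] == "data-3" for k in moved)
```

So even without an explicit redistribution protocol, a node failure only invalidates the state that lived on the failed node; the remaining keys never shuffle around.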
Other use cases are still problematic, but even without replication or redistribution the situation is greatly improved.
Take scan detection for example:
With sumstats/scan-ng/simple-scan, if the current manager host or process dies, all detection comes to a halt
until it is restarted. Once restarted, all state is lost, so everything starts over from zero.
If there were 4 data nodes participating in scan detection and all 4 died, the result would be the same, so this is no better
or worse than the current situation.
If only one node dies though, only 1/4 of the analysis is affected. The remaining analysis can immediately
fail over to the next node. So while it may still have to start from 0, there would only be a small hole in the analysis.
Say the scan threshold is 20 packets, and a scan has just started from 10.10.10.10:
- 10 packets into the scan, the data node that 10.10.10.10 hashes to crashes.
- HRW now routes data for 10.10.10.10 to another node.
- 30 packets into the scan, the counter on the new node crosses the threshold of 20 and a notice is raised.
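The failover scenario above can be simulated in a few lines of plain Python. The node names, threshold, and counter layout are illustrative only (this is not the actual scan-detection code), but it shows why the notice fires at packet 30 rather than 20: the counter restarts on the new owner.

```python
# Toy walk-through: counting scan packets from 10.10.10.10, with the
# owning data node crashing 10 packets into the scan.
THRESHOLD = 20
counters = {}          # per-node scan state; lost when a node dies
owner = "data-2"       # hypothetical node that 10.10.10.10 hashes to

notices = []
for pkt in range(1, 31):            # 30 scan packets arrive in order
    if pkt == 11:                   # owner crashes 10 packets in...
        counters.pop(owner, None)   # ...its counter state is gone
        owner = "data-4"            # HRW fails over to another node
    counters[owner] = counters.get(owner, 0) + 1
    if counters[owner] == THRESHOLD:
        notices.append(pkt)

print(notices)  # [30]: the notice fires at packet 30, not 20
```

The detection is delayed, not lost, which is the point: a single node failure leaves a small hole in the analysis instead of halting it entirely.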
Replication between data nodes could make this even more seamless, but it's not a huge priority, at least for me.
My priority is getting the cluster to a point where things don't grind to a halt just because one component is down.
Ignoring the worker->logger connections, it would look something like the attached layout.png
> Fully agreed! In that case it might be nice if one could define separate special-purpose data nodes, e.g. "intel data nodes".
> But I am not sure whether this is a good idea, as it might lead to complex cluster definitions and poor usability, since users
> would need to know a bit about how the underlying mechanisms work. On the other hand, this would theoretically allow completely
> decoupling the intel data store (e.g. interfacing a "real" database with some pybroker scripts).
I've been thinking the same thing, but I hope it doesn't come to that. Ideally people will be able
to scale their clusters by just increasing the number of data nodes without having to get into
the details about what node is doing what.
Partitioning the data analysis by task has been suggested, i.e., one data node for scan detection,
one data node for spam detection, one data node for sumstats. I think this would be very easy to
implement, but it doesn't do anything to help scale out an individual task once one process can
no longer handle the load. You would just end up with something like the scan detection and spam
data nodes at 20% CPU and the sumstats node at 100%.
(A non-text attachment was scrubbed: 19088 bytes, http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20171103/4b31b911/attachment-0001.bin)