[Bro-Dev] Scaling out bro cluster communication
Azoff, Justin S
jazoff at illinois.edu
Thu Feb 9 12:21:09 PST 2017
I've been thinking about ideas for how to better scale out bro cluster communication and how that would look in scripts.
Scaling out sumstats and known hosts/services/certs detection will require script language or bif changes.
What I want to make possible is client side load balancing and failover for worker -> manager/datanode communication.
I have 2 ideas for how things could work.
## The implicit form, new bifs like:
send_event(dest: string, event: any);
send_event_hashed(dest: string, hash_key: any, event: any);
send_event("datanode", Scan::scan_attempt(scanner, attempt));
send_event_hashed("datanode", scanner, Scan::scan_attempt(scanner, attempt));
## A super magic awesome implicit form
global scan_attempt: event(scanner: addr, attempt: Attempt)
&partition_via=func(scanner: addr, attempt: Attempt) { return scanner; } ;
The implicit form fits better with how bro currently works, but I think the explicit form would ultimately make cluster aware scripts simpler.
The difference hinges on the difference between the implicit and explicit communication.
Currently all bro cluster communication is implicit:
* You send logs to the logger/manager node by calling Log::write
* You send notices to the manager by calling NOTICE
* You can share data between nodes by marking a container as &synchronized.
* You can send data to the manager by redef'ing Cluster::worker2manager_events
The last two are what we need to replace/extend.
As an example, in my scan.bro I want to send scan attempts up to the manager for correlation, so this means:
# define event
global scan_attempt: event(scanner: addr, attempt: Attempt);
# route it to the manager
redef Cluster::worker2manager_events += /Scan::scan_attempt/;
# only handle it on the manager
@if ( Cluster::local_node_type() == Cluster::MANAGER )
event Scan::scan_attempt(scanner: addr, attempt: Attempt)
{
add_scan_attempt(scanner, attempt);
}
@endif
and then later in the worker code, finally
# raise the event to send it down to the manager.
event Scan::scan_attempt(scanner, attempt);
If bro communication was more explicit, the script would just be
# define event and handle on all nodes
global scan_attempt: event(scanner: addr, attempt: Attempt);
event Scan::scan_attempt(scanner: addr, attempt: Attempt)
{
add_scan_attempt(scanner, attempt);
}
# send the event directly to the manager node
send_event("manager", Scan::scan_attempt(scanner, attempt));
Things like scan detection and known hosts/services tracking are easily partitioned, so if you had two datanodes for analysis:
if (hash(scanner) % 2 == 0)
send_event("datanode-0", Scan::scan_attempt(scanner, attempt));
else
send_event("datanode-1", Scan::scan_attempt(scanner, attempt));
Which would be wrapped in a function:
send_event_hashed("datanode", scanner, Scan::scan_attempt(scanner, attempt));
that would handle knowing how many active nodes there are and doing proper consistent hashing/failover, something like this:
function send_event_hashed(dest: string, hash_key: any, event: any) {
data_nodes = |Cluster::active_nodes[dest]|; # or whatever
node = hash(hash_key) % data_nodes;
node_name = Cluster::active_nodes[node]$name;
send_event(node_name, event);
}
--
- Justin Azoff
More information about the bro-dev
mailing list