[Bro-Dev] Scaling out bro cluster communication

Fri Feb 10 08:53:16 PST 2017

> On Feb 10, 2017, at 9:30 AM, Seth Hall <seth at icir.org> wrote:
> 
> 
>> On Feb 9, 2017, at 3:21 PM, Azoff, Justin S <jazoff at illinois.edu> wrote:
>>   # define event and handle on all nodes
>>   global scan_attempt: event(scanner: addr, attempt: Attempt);
>>   event Scan::scan_attempt(scanner: addr, attempt: Attempt)
>>       {
>>       add_scan_attempt(scanner, attempt);
>>       }
>> 
>>   # send the event directly to the manager node
>>   send_event("manager", Scan::scan_attempt(scanner, attempt));
> 
> I do like the look of making this more explicit.  The implicit event sharing behavior makes some stuff that feels like it should be easy end up being really difficult.  Do you have thoughts on how you'd do things like if you want the manager to send an event to all workers or all data nodes?

Hmm, perhaps there would be multiple functions:

* One for sending an event to all nodes of a type
* One for sending an event to a specific node
* One for sending an event to one type of node based on a hash function

Currently bro only does the first one (though having only one manager or data node means that events sent to data nodes only go to one node).
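
To make the three cases above concrete, here is a rough model of the routing semantics in Python rather than bro script, since none of these functions exist in bro today. The Cluster class, the three method names, and the use of CRC32 as the hash are all made up for illustration; a real implementation would sit on top of broker.

```python
import zlib
from collections import defaultdict


class Cluster:
    """Toy in-memory cluster: each node has a type and an inbox (hypothetical)."""

    def __init__(self):
        self.node_types = {}              # node name -> node type
        self.inboxes = defaultdict(list)  # node name -> delivered events

    def add_node(self, name, node_type):
        self.node_types[name] = node_type

    def nodes_of_type(self, node_type):
        # Sorted so that every sender computes the same node ordering.
        return sorted(n for n, t in self.node_types.items() if t == node_type)

    def send_event_to_type(self, node_type, event):
        """Case 1: deliver to every node of the given type."""
        for name in self.nodes_of_type(node_type):
            self.inboxes[name].append(event)

    def send_event_to_node(self, name, event):
        """Case 2: deliver to exactly one named node."""
        if name not in self.node_types:
            raise KeyError("unknown node: %s" % name)
        self.inboxes[name].append(event)

    def send_event_hashed(self, node_type, key, event):
        """Case 3: pick one node of a type from a hash of the key."""
        candidates = self.nodes_of_type(node_type)
        if not candidates:
            raise LookupError("no nodes of type: %s" % node_type)
        # CRC32 is an arbitrary choice here; any stable hash would do.
        target = candidates[zlib.crc32(key.encode()) % len(candidates)]
        self.inboxes[target].append(event)
        return target
```

The point of the third function is that the same key (for example a scanner address) always lands on the same node, so that node can keep all the state for that key. Simple modulo hashing reshuffles keys when the node count changes, so something like consistent hashing might be a better fit in practice.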

Not being able to send events directly to an individual node also prevents bro scripts from doing RPC-style queries.  A worker can send the manager a query, but the manager can only raise a reply event that is sent to all workers.
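
As a sketch of what direct addressing would allow, here is a small Python model of a query and reply between one worker and the manager. Everything in it is hypothetical: the Node and Manager classes, the is_scanner_query and is_scanner_reply event names, and the idea that the receiver is told the sender's name so it can answer only that node.

```python
class Node:
    """Toy node that can send an event to one named peer (hypothetical)."""

    def __init__(self, name, network):
        self.name = name
        self.network = network   # shared dict: node name -> Node
        self.received = []       # events delivered to this node
        network[name] = self

    def send_event(self, target, event, *args):
        # Deliver to exactly one node and tell it who sent the event.
        self.network[target].handle(event, self.name, *args)

    def handle(self, event, sender, *args):
        self.received.append((event, sender) + args)


class Manager(Node):
    """Answers queries by replying to the sender only."""

    def __init__(self, name, network, known_scanners):
        super().__init__(name, network)
        self.known_scanners = known_scanners

    def handle(self, event, sender, *args):
        super().handle(event, sender, *args)
        if event == "is_scanner_query":
            (addr,) = args
            # The reply goes back to the asking node, not to all workers.
            self.send_event(sender, "is_scanner_reply",
                            addr, addr in self.known_scanners)
```

With only broadcast available, the reply in handle() would have to go to every worker, and each worker would have to work out whether the answer was meant for it.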

> Another thing I think we need to address is that this behavior seamlessly falls back if someone isn't running a cluster.  Do you expect your idea to do that?  I know that in the current programming model, making this cluster aware but still work not on a cluster can be painful to create the right abstraction.
> 
>  .Seth

For falling back, if

    send_event("manager", Scan::scan_attempt(scanner, attempt));

was run on the manager node, it could skip broker and just raise the event locally.
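
Roughly, the dispatch could look like the following Python model. The EventRouter class and its transport callback (standing in for broker) are invented for this sketch, and the standalone flag is my own assumption about how a non-cluster process might treat every target as local.

```python
class EventRouter:
    """Toy send_event that short-circuits when the target is this node."""

    def __init__(self, local_name, transport, standalone=False):
        self.local_name = local_name
        self.transport = transport    # callable(target, event, args); stands in for broker
        self.standalone = standalone  # assumption: non-cluster mode handles everything locally
        self.handlers = {}            # event name -> list of handler callables

    def on(self, event, handler):
        self.handlers.setdefault(event, []).append(handler)

    def raise_local(self, event, *args):
        for handler in self.handlers.get(event, []):
            handler(*args)

    def send_event(self, target, event, *args):
        if self.standalone or target == self.local_name:
            # Already on the destination: skip the transport entirely.
            self.raise_local(event, *args)
            return "local"
        self.transport(target, event, args)
        return "remote"
```

The calling script stays the same in all three situations (worker, manager, standalone); only the router decides whether anything goes over the wire.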

Currently bro has cluster-specific code in intel, netcontrol, notice, openflow, packet-filter, and sumstats, so the current event system doesn't always just magically work on a cluster.  I don't think explicit send_event functions would change that at all.

Plus, I'm not even sure if special-casing a non-cluster setup makes sense anymore.

For example, scan detection on a single node doesn't need to do any cluster communication; it can just manage everything locally.  But the code that handles scan detection is extremely simple: it consumes scan_attempt events and raises notices.  What if a dedicated actor thread was started to handle the scan_attempt event?  Then the code could do something like

    send_event("scan_aggregator", Scan::scan_attempt(scanner, attempt));

Even on a single-process instance, this could distribute the event to a thread dedicated to handling this work.
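
To illustrate the actor idea, here is a minimal Python sketch using a thread with a mailbox queue. Nothing here reflects how bro works internally: the Actor and ScanAggregator classes, the actors registry, and the distinct-attempts threshold are all assumptions made up for the example.

```python
import queue
import threading


class Actor:
    """Runs a handler on its own thread, fed through a mailbox queue."""

    def __init__(self, handler):
        self._handler = handler
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is None:   # sentinel pushed by stop()
                break
            self._handler(*message)

    def send(self, *args):
        self._mailbox.put(args)

    def stop(self):
        # Messages are handled in order, so everything sent before
        # stop() is processed before the thread exits.
        self._mailbox.put(None)
        self._thread.join()


class ScanAggregator:
    """Consumes scan_attempt events and records a notice per scanner."""

    def __init__(self, threshold):
        self.threshold = threshold   # hypothetical: distinct attempts before a notice
        self.attempts = {}           # scanner -> set of distinct attempts
        self.notices = []

    def scan_attempt(self, scanner, attempt):
        seen = self.attempts.setdefault(scanner, set())
        if attempt in seen:
            return
        seen.add(attempt)
        if len(seen) == self.threshold:
            self.notices.append(scanner)


actors = {}   # actor name -> Actor


def send_event(actor_name, *args):
    """Hand the event to the named actor instead of handling it inline."""
    actors[actor_name].send(*args)
```

Since only the aggregator thread touches the attempts table, the handler needs no locking, and the sender never blocks on the aggregation work.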

-- 
- Justin Azoff