[Bro-Dev] Broker has landed in master, please test

Azoff, Justin S jazoff at illinois.edu
Fri Jun 1 08:37:17 PDT 2018

On May 22, 2018, at 10:26 AM, Robin Sommer <robin at icir.org> wrote:

With such a large change, I'm sure there'll be some more kinks to iron
out still; that's where we need everybody's help. If you have an
environment where you can test drive new Bro versions, please give
this a try. We're interested in any feedback you have, both specific
issues you encounter (best to file tickets) and general experiences
with the new version, including in particular any observations about
performance (best to send to this list).

Have this running on a few clusters now, and so far it's been really good.  This graph shows how stable it has been on one of our clusters.

[Graph: see attached Screen Shot 2018-06-01 at 10.36.25 AM.png]

On that cluster's manager node we were seeing random CPU, traffic, and memory spikes on one of the proxies, which would eventually be killed by the OOM killer.  It would then restart and get killed again shortly afterwards.  A larger cluster saw the same spikes, but it had 4x the RAM and wouldn't OOM.

Since switching to the Broker version around 5/25, that has completely stopped.  The base CPU usage is a bit higher, but all the random spikes are gone.  The base memory usage is also lower.

I could never figure out what was causing the problem, and it's possible that &synchronized not doing anything anymore is why it's better now.  I'm mostly using &synchronized for syncing input files across all the workers, and one of them does have 300k entries in it.  That file is fairly constant though: only a few thousand changes every 5 minutes, and nothing that should use 20G of RAM.

I still need to replace all of our uses of &synchronized. The config framework may work for most cases once the cluster bits are done, but probably not for syncing the 300k item set.

Another GREAT thing I noticed that we may want to add to NEWS: it looks like the 'file descriptors used per worker' count is down from 5 (1 socket + 2 + 2 from pipes) to just 1 socket (no more flares?).  This means that even though select() is still in use, the limit of ~175 workers in 2.5.x will go away, and people would be able to run 500+ worker clusters if they wanted to.
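
To make the arithmetic behind that limit concrete, here is a rough sketch in Python.  It assumes select() is capped at FD_SETSIZE = 1024 descriptors (the usual value on Linux, not something I measured here) and that the manager needs a fixed number of descriptors for everything else.  The RESERVED_FDS value of 149 is hypothetical, picked only so that the 5-descriptor case reproduces the ~175 figure above; the function name is made up for illustration.

```python
# Back-of-the-envelope: how many workers fit under a select() descriptor cap.
# Assumption: select() can only watch descriptors numbered below FD_SETSIZE.
FD_SETSIZE = 1024  # typical value on Linux; an assumption, not measured

# Hypothetical: descriptors the manager uses for logs, proxies, etc.
# Chosen so the 5-fds-per-worker case matches the ~175 worker limit.
RESERVED_FDS = 149


def max_workers(fds_per_worker, reserved_fds=RESERVED_FDS, fd_limit=FD_SETSIZE):
    """Return the largest worker count whose descriptors fit below fd_limit."""
    if fds_per_worker <= 0:
        raise ValueError("fds_per_worker must be positive")
    usable = max(fd_limit - reserved_fds, 0)
    return usable // fds_per_worker


if __name__ == "__main__":
    print("5 fds per worker:", max_workers(5))  # 1 socket + 2 + 2 from pipes
    print("1 fd per worker: ", max_workers(1))  # just the socket
```

Under those assumptions, dropping from 5 descriptors to 1 per worker raises the ceiling roughly fivefold, which is why 500+ workers becomes plausible even with select() still in place.  The real ceiling depends on how many other descriptors the manager actually holds.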

Justin Azoff

-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screen Shot 2018-06-01 at 10.36.25 AM.png
Type: image/png
Size: 216983 bytes
Desc: Screen Shot 2018-06-01 at 10.36.25 AM.png
Url : http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20180601/2937afef/attachment-0001.bin 