[Bro] Bro cluster requirements and manager logging backlog bug

Azoff, Justin S jazoff at illinois.edu
Tue Dec 20 12:40:37 PST 2016

> On Dec 20, 2016, at 1:56 PM, Hovsep Levi <hovsep.sanjay.levi at gmail.com> wrote:
> [bro at mgr /opt/bro]$ bin/broctl top manager logger
> Name         Type    Host             Pid     Proc    VSize  Rss  Cpu   Cmd
> logger       logger   52852   parent  109G   100G   0%  bro
> logger       logger   52867   child   837M   498M   0%  bro
> manager      manager   52935   child   485M    17M   0%  bro
> manager      manager   52892   parent    2G   557M   0%  bro
> In this condition all the workers are at 100% CPU and the worker nodes have all 128GB RAM used.  The manager node had to be rebooted as "killall -9 bro" had no effect.  This is what happens if Bro isn't restarted every 30 minutes.

This output, with the CPU at 0%, is kind of odd, unless the box was already swapping or something.

> Also, you've never mentioned the actual rate of logs you are seeing at these peak times
> Running this in your log directory would help:
> du -ms;cat *|wc -l;sleep 60;du -ms;cat *|wc -l
> [bro at mgr /opt/bro_data/logs/current]$ du -ms;cat *|wc -l;sleep 60;du -ms;cat *|wc -l
> 56      .
>   789695
> 220     .
>  2801719

So this shows only about 33k logs/sec and roughly 3 MB/sec (2,012,024 new lines and 164 MB in 60 seconds).

> @ Tue Dec 20 18:46:48 UTC 2016 already the logger has 5G memory:
> [bro at mgr /opt/bro]$ bin/broctl top manager logger
> Name         Type    Host             Pid     Proc    VSize  Rss  Cpu   Cmd
> logger       logger   18832   parent    5G     5G 192%  bro
> logger       logger   18874   child     1G     1G  58%  bro
> manager      manager   18947   child   510M   255M  55%  bro
> manager      manager   18905   parent   11G     1G  25%  bro

This shows that your logger process seems to just have issues keeping up with the volume...

> [bro at mgr /opt/bro_data/logs/current]$ du -ms;cat *|wc -l;sleep 60;du -ms;cat *|wc -l
> 593     .
>  7117478
> 809     .
>  9573974

but based on this you are only doing about 41k logs/sec and just under 4 MB/sec (2,456,496 new lines and 216 MB in 60 seconds), so you shouldn't really be having issues.  We have users doing over 200k logs/sec.
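
For reference, the per-second figures for both samples can be re-derived from the quoted du and wc output. This is only a minimal sketch: the helper name `rate` is made up for illustration, and the 60-second interval is assumed from the `sleep 60` in the command.

```shell
#!/bin/sh
# rate BEFORE AFTER SECONDS
# Hypothetical helper: prints the average per-second increase between two
# readings, to one decimal place.
rate() {
    awk -v a="$1" -v b="$2" -v s="$3" 'BEGIN { printf "%.1f\n", (b - a) / s }'
}

# First sample (wc -l totals, then du -ms totals)
rate 789695 2801719 60    # log lines/sec -> 33533.7
rate 56 220 60            # MB/sec        -> 2.7

# Second sample
rate 7117478 9573974 60   # log lines/sec -> 40941.6
rate 593 809 60           # MB/sec        -> 3.6
```

Note that `cat *|wc -l` counts lines across every file in the directory, so the result is the aggregate rate, not a per-log rate.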

Can you check the following:

After Bro has been running for a bit:

    wc -l *.log | sort -n

to show which log files are the largest.

the output of this command:

    top -b -n 1 -H -o TIME |grep bro:|head -n 20

Or just run top and press H.  That should show all the Bro logging threads (it works on Linux at least).  They may show up truncated, but it's enough to tell them apart.

What model and how many CPUs does your manager have?

Are you writing logs in the default ASCII format or using JSON?

- Justin Azoff
