[Bro] Worker System Memory Exhaustion

Azoff, Justin S jazoff at illinois.edu
Fri Apr 6 07:23:40 PDT 2018


> On Apr 6, 2018, at 10:07 AM, Hovsep Levi <hovsep.sanjay.levi at gmail.com> wrote:
> 
> This was a battle we endured for many many moons (12+ months), look to the archives for the pain and suffering. 
> 
> Final solution :  Enable multiple loggers (now part of Bro), disable writing logs to disk and stream logs to Kafka.  (Thank you KafkaLogger author)
> 
> Reasoning  :  At some point Bro's log writing cannot keep up with the volume.  Believed to be a bottleneck with the default architecture using a single "Logger" node.
> 
> Possible alternative  :  Enable multiple loggers, but when writing to disk you might have a possible race condition with filenames and dates.  Also you'll have multiple logs for each rotation interval (ex: 4 loggers means 4 conn.log, 4 http.log, 4 ssh.log, etc...)
> 
> 
> ^ Hovsep
> 

Ah, yeah, it could be that too.  Things got better for the most part once the logger node was introduced, so this hasn't been the problem for people recently.

I think most of the remaining problems with logger node scaling are limited to extremely large log volumes and to people who had AMD systems with many slow cores. I think you had one of those.

In any case, that is easy to check for by looking at broctl top and monitoring the log lag.  If the logs are not behind, the problem is something else.

https://gist.github.com/JustinAzoff/01396a34c8f92d4dda1b

is a script for munin that outputs how old the most recent record in the conn.log is.  You can also just run it manually:

[jazoff@bro-dev ~]$ curl -o log_lag.py https://gist.githubusercontent.com/JustinAzoff/01396a34c8f92d4dda1b/raw/2dba7fdf93915748948b238c20de965b4636cb9e/log_lag.py
[jazoff@bro-dev ~]$ python log_lag.py
lag.value 5.526168


The number is the lag in seconds.  It should be around 5-10s and should not keep growing between runs.
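
For anyone who would rather not fetch the gist, here is a rough sketch of the same idea.  It is not the code from the gist.  It assumes the default tab-separated log format (header lines start with "#", and the first column is the epoch "ts"), and the function names are just made up for this example:

```python
import os
import time


def newest_record_ts(path, tail_bytes=65536):
    """Return the 'ts' of the last complete data record in a Bro TSV log.

    Assumes the default TSV writer: '#'-prefixed header/footer lines and
    the epoch timestamp in the first tab-separated column.
    Returns None if no data record is found in the tail of the file.
    """
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        size = f.tell()
        # Only read the end of the file; conn.log can be very large.
        f.seek(max(0, size - tail_bytes))
        tail = f.read().decode("utf-8", "replace")

    lines = tail.splitlines()
    if size > tail_bytes and lines:
        lines = lines[1:]   # first line is probably cut off by the seek
    if lines and not tail.endswith("\n"):
        lines = lines[:-1]  # last line is still being written

    for line in reversed(lines):
        if not line or line.startswith("#"):
            continue
        try:
            return float(line.split("\t", 1)[0])
        except ValueError:
            continue
    return None


def log_lag(path, now=None):
    """Seconds between 'now' and the newest record, or None if no records."""
    ts = newest_record_ts(path)
    if ts is None:
        return None
    if now is None:
        now = time.time()
    return now - ts
```

Call log_lag() with the path to the current conn.log on the logger (on a stock broctl install that is probably something like /usr/local/bro/logs/current/conn.log, but check your own layout) and print the result.  Note that this only looks at ts, so a long-lived connection that was just logged can make the number look worse than it is.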

-- 
Justin Azoff



