[Bro] Worker System Memory Exhaustion

Greg Grasmehr greg.grasmehr at caltech.edu
Fri Apr 6 10:57:09 PDT 2018


It is a fast Intel CPU, and the live logs write to a RAID 1 virtual disk
built on enterprise SSDs. Logs are archived to a RAID 10 virtual disk
built on 15K SAS spindles.

I will give your debugging a try and see what it says.

Thanks all and have a good weekend.

Greg

On 04/06/18 14:23:40, Azoff, Justin S wrote:
> 
> > On Apr 6, 2018, at 10:07 AM, Hovsep Levi <hovsep.sanjay.levi at gmail.com> wrote:
> > 
> > This was a battle we endured for many many moons (12+ months), look to the archives for the pain and suffering. 
> > 
> > Final solution :  Enable multiple loggers (now part of Bro), disable writing logs to disk and stream logs to Kafka.  (Thank you KafkaLogger author)
> > 
> > Reasoning  :  At some point Bro's log writing cannot keep up with the volume.  We believe the bottleneck is the default architecture, which uses a single "Logger" node.
> > 
> > Possible alternative  :  Enable multiple loggers, but when writing to disk you might hit a race condition with filenames and dates.  You will also get multiple logs for each rotation interval (e.g. 4 loggers means 4 conn.log, 4 http.log, 4 ssh.log, etc.)
> > 
> > 
> > ^ Hovsep
> > 
> 
> Ah, yeah, it could be that too.  Things got better for the most part once the logger node was introduced, so this hasn't been the problem for people recently.
> 
> I think most of the remaining problems with the logger node scaling are limited to extremely large log volumes and people who had AMD systems with many slow cores. I think you had one of those.
> 
> In any case, that is easy to check for by looking at broctl top and monitoring the log lag.  If the logs are not behind, the problem is something else.
> 
> https://gist.github.com/JustinAzoff/01396a34c8f92d4dda1b
> 
> is a script for munin that will output how old the most recent record in the conn.log is.  You can just run it manually though:
> 
> [jazoff@bro-dev ~]$ curl -o log_lag.py https://gist.githubusercontent.com/JustinAzoff/01396a34c8f92d4dda1b/raw/2dba7fdf93915748948b238c20de965b4636cb9e/log_lag.py
> [jazoff@bro-dev ~]$ python log_lag.py
> lag.value 5.526168
> 
> 
> The number should be around 5-10 seconds and should not grow over time.
> 
> Justin Azoff
> 
> 
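
For anyone who cannot fetch the gist, the lag check described above can be approximated with a few lines of Python. This is only a rough sketch and not the contents of log_lag.py: it assumes the default tab-separated ASCII log format, where header lines start with "#" and the first field of each record is the epoch timestamp "ts". The function names and the default path mentioned afterwards are made up for illustration.

```python
import time


def latest_record_ts(lines):
    """Return the ts field of the last data record, or None if there is none.

    Assumes tab-separated Bro ASCII logs: lines starting with '#' are
    headers or footers, and the first field of each record is an epoch ts.
    """
    last = None
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        last = line
    if last is None:
        return None
    return float(last.split("\t", 1)[0])


def log_lag(lines, now=None):
    """Seconds between 'now' and the most recent record, or None."""
    ts = latest_record_ts(lines)
    if ts is None:
        return None
    if now is None:
        now = time.time()
    return now - ts


def is_growing(samples):
    """True if every lag sample is larger than the one before it."""
    return len(samples) >= 2 and all(
        b > a for a, b in zip(samples, samples[1:])
    )


def report(path):
    """Print the lag for one log file in the same 'lag.value N' shape."""
    with open(path) as f:
        lag = log_lag(f)
    if lag is None:
        print("no records found in %s" % path)
    else:
        print("lag.value %f" % lag)
```

Calling report() on the live conn.log (on a stock broctl install that is probably /usr/local/bro/logs/current/conn.log, but check your own layout) prints one sample. Collecting a few samples a minute apart and passing them to is_growing() shows whether the logger is falling behind. The sketch reads the whole file, so on a very busy sensor it is better to read only the tail.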


