[Bro] Worker System Memory Exhaustion
Greg Grasmehr
greg.grasmehr at caltech.edu
Fri Apr 6 10:57:09 PDT 2018
The system has a fast Intel CPU. Live logs are written to a RAID 1
virtual disk built on enterprise SSDs, and logs are archived to a
RAID 10 virtual disk built on 15K SAS spindles.
I will give your debugging a try and see what it says.
Thanks all and have a good weekend.
Greg
On 04/06/18 14:23:40, Azoff, Justin S wrote:
>
> > On Apr 6, 2018, at 10:07 AM, Hovsep Levi <hovsep.sanjay.levi at gmail.com> wrote:
> >
> > This was a battle we endured for many many moons (12+ months), look to the archives for the pain and suffering.
> >
> > Final solution : Enable multiple loggers (now part of Bro), disable writing logs to disk and stream logs to Kafka. (Thank you KafkaLogger author)
> >
> > Reasoning : At some point Bro's log writing cannot keep up with the volume. Believed to be a bottleneck with the default architecture using a single "Logger" node.
> >
> > Possible alternative : Enable multiple loggers, but when writing to disk you might have a possible race condition with filenames and dates. Also you'll have multiple logs for each rotation interval (ex: 4 loggers means 4 conn.log, 4 http.log, 4 ssh.log, etc...)
> >
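For anyone who goes the multiple-logger route and ends up with several files per rotation interval, here is a minimal sketch of merging them back into one time-ordered stream. It assumes the default tab-separated Bro ASCII format with a `#fields` header that includes `ts`; the function name `merge_records` is hypothetical, and how the per-logger files are actually named is not something this sketch knows about.

```python
def merge_records(logs):
    """Merge records from several Bro TSV logs, ordered by their ts field.

    `logs` is an iterable of line iterables (open files or lists of lines).
    Assumes each log carries a '#fields' header line that names a 'ts' column.
    """
    records = []
    for lines in logs:
        ts_index = None
        for line in lines:
            line = line.rstrip("\n")
            if line.startswith("#fields"):
                # First token is the literal '#fields'; the rest are column names.
                ts_index = line.split("\t")[1:].index("ts")
            elif line and not line.startswith("#") and ts_index is not None:
                records.append((float(line.split("\t")[ts_index]), line))
    # Records inside one log are not guaranteed to be sorted by ts,
    # so sort everything rather than doing a streaming merge.
    records.sort(key=lambda r: r[0])
    return [line for _, line in records]
```

This loads every record into memory, so it only suits modest rotation intervals.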
> >
> > ^ Hovsep
> >
>
> Ah, yeah, it could be that too. Things got better for the most part once the logger node was introduced, so this hasn't been the problem for people recently.
>
> I think most of the remaining problems with the logger node scaling are limited to extremely large log volumes and people who had AMD systems with many slow cores. I think you had one of those.
>
> In any case, that is easy to check for by looking at broctl top and monitoring the log lag. If the logs are not behind, the problem is something else.
>
> https://gist.github.com/JustinAzoff/01396a34c8f92d4dda1b
>
> is a script for munin that will output how old the most recent record in the conn.log is. You can just run it manually though:
>
> [jazoff@bro-dev ~]$ curl -o log_lag.py https://gist.githubusercontent.com/JustinAzoff/01396a34c8f92d4dda1b/raw/2dba7fdf93915748948b238c20de965b4636cb9e/log_lag.py
> [jazoff@bro-dev ~]$ python log_lag.py
> lag.value 5.526168
>
>
> The number should be 5-10s and not growing.
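If you would rather not fetch the gist, the same kind of check can be sketched in a few lines. This is not the linked script, just a rough equivalent under stated assumptions: the log is in the default tab-separated Bro format with a `#fields` header, lag is measured from the `ts` field alone, and the name `newest_record_age` and the path in the usage comment are hypothetical.

```python
import time


def newest_record_age(lines, now=None):
    """Return how many seconds old the last record in a Bro TSV log is.

    `lines` is any iterable of log lines (an open file works).
    Returns None if there is no 'ts' column or no record at all.
    """
    now = time.time() if now is None else now
    fields = []
    last = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            # First token is the literal '#fields'; the rest are column names.
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            last = line.split("\t")
    if last is None or "ts" not in fields:
        return None
    return now - float(last[fields.index("ts")])


# Usage (path is an assumption; adjust to your broctl install):
#   with open("/usr/local/bro/logs/current/conn.log") as f:
#       print("lag.value %f" % newest_record_age(f))
```

It reads the whole file, which is fine for a quick manual check. Run it a few times a minute apart: a roughly steady value points away from the logger, a value that keeps climbing means the logs are falling behind.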
>
> --
> Justin Azoff
>
>
> _______________________________________________
> Bro mailing list
> bro at bro-ids.org
> http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/bro