[Bro] Bro's limitations with high worker count and memory exhaustion
jsiwek at illinois.edu
Thu Jul 2 06:55:06 PDT 2015
> On Jul 1, 2015, at 1:39 PM, Baxter Milliwew <baxter.milliwew at gmail.com> wrote:
> Do you think a high worker count with the current implementation of select() would cause high memory usage ?
> I'm trying to figure out why the manager always exhausts all memory
I’d guess the slowness of disk I/O is more of a contributing factor. A general problem is that if the manager can’t consume logs (i.e. write them to disk) at least as fast as the workers produce them, then the manager buffers them up until it runs out of memory. If you enable profiling, prof.log would contain details that indicate whether this is the situation.
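To make the rate argument concrete, here is a toy model in Python. It is only a sketch of the arithmetic, not how Bro's manager is actually implemented; the function name and the rates are made up for illustration. It shows that an unbounded buffer grows by the difference between the produce and consume rates on every tick.

```python
def simulate_backlog(produce_rate, consume_rate, seconds):
    """Return the queue depth after each second for a toy unbounded buffer.

    Hypothetical model: rates are in log entries per second.
    """
    backlog = 0
    history = []
    for _ in range(seconds):
        backlog += produce_rate                # workers hand entries to the manager
        backlog -= min(backlog, consume_rate)  # manager writes what it can to disk
        history.append(backlog)
    return history


# Manager slower than workers: the backlog grows without bound.
print(simulate_backlog(produce_rate=1000, consume_rate=800, seconds=5))
# Manager keeps up: the backlog stays at zero.
print(simulate_backlog(produce_rate=1000, consume_rate=1200, seconds=5))
```

In the first call the backlog grows by 200 entries every second, which is the "buffers them up until out of memory" case if it goes on long enough.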
Any slowness of select() due to a large number of fds probably puts more pressure back on the workers and the remote connection, since select() is involved in actually pulling data off the wire and also in processing some of the chunks pulled off the wire. Those intermediate buffers have a limit, beyond which connections with peers that talk too much get shut down. This could cause thrashing of the actual connection between workers and manager (communication.log may reference stuff “filling up” or something like that), but it’s probably still possible to get into a situation where the manager’s logging thread queues can’t dequeue entries (i.e. write them to disk) fast enough and the manager eventually hits OOM.
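As a rough picture of the "limit before shutting down connections" behavior, here is a small Python sketch. It is not Bro's communication code; the PeerBuffer class, its methods, and the byte counts are all hypothetical. It only shows the general idea: a bounded buffer that drops the peer rather than growing forever.

```python
class PeerBuffer:
    """Toy per-peer intermediate buffer with a hard size limit (hypothetical)."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes
        self.buffered = 0
        self.connected = True

    def feed(self, nbytes):
        """Accept nbytes from the peer; disconnect if the limit would be exceeded."""
        if not self.connected:
            return False
        if self.buffered + nbytes > self.limit_bytes:
            # Peer "talks too much": drop it instead of growing without bound.
            self.connected = False
            self.buffered = 0
            return False
        self.buffered += nbytes
        return True

    def drain(self, nbytes):
        """Process up to nbytes of buffered data; return how much was processed."""
        done = min(nbytes, self.buffered)
        self.buffered -= done
        return done


buf = PeerBuffer(limit_bytes=100)
for _ in range(10):
    if not buf.feed(60):   # peer sends 60 bytes per round
        break
    buf.drain(20)          # slow reader only processes 20 bytes per round
print(buf.connected)
```

A buffer like this protects memory on the receiving side, but the peer would then have to reconnect and resend, which is the kind of thrashing described above. It also does nothing for a queue further downstream that has no such limit.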
And if you haven’t already, take a look at reporter.log for any scripting errors, as certain types of errors may cause memory leaks.