[Bro] Bro's limitations with high worker count and memory exhaustion

Baxter Milliwew baxter.milliwew at gmail.com
Tue Jun 30 11:37:30 PDT 2015


Thanks.  From some limited reading, it's not possible to increase FD_SETSIZE
on Linux, so it's time to migrate to poll().
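
For anyone following along, here is a minimal sketch of what the poll() interface looks like. It is an illustration only, not Bro's code: Bro is C++, and Python is used here just because its select module wraps the same system calls. The helper name readable_fds is made up for this example. The point is that poll() takes a list of descriptors rather than a fixed-size fd_set bitmap, so descriptor numbers at or above FD_SETSIZE are not a problem for it (Python's own select.select() rejects such descriptors with a ValueError).

```python
import os
import select


def readable_fds(fds, timeout_ms=0):
    """Return the descriptors in `fds` that poll() reports as readable.

    Hypothetical helper for illustration. poll() is handed an explicit
    list of descriptors, so it has no FD_SETSIZE-style ceiling on the
    descriptor number.
    """
    poller = select.poll()
    for fd in fds:
        poller.register(fd, select.POLLIN)
    ready = poller.poll(timeout_ms)  # list of (fd, event mask) pairs
    return sorted(fd for fd, events in ready if events & select.POLLIN)


if __name__ == "__main__":
    read_end, write_end = os.pipe()
    print(readable_fds([read_end]))        # nothing written yet
    os.write(write_end, b"x")
    print(readable_fds([read_end], 100))   # the read end is now readable
    os.close(read_end)
    os.close(write_end)
```

A real migration inside Bro would of course have to touch the C++ I/O loop; this only shows the shape of the call.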



On Tue, Jun 30, 2015 at 7:44 AM, Siwek, Jon <jsiwek at illinois.edu> wrote:

> A guess is that you’re bumping into an FD_SETSIZE limit — the way remote
> I/O is currently structured has at least 5 file descriptors per remote
> connection from what I can see at a glance (a pair of pipes, 2 fds each,
> for signaling read/write readiness related to ChunkedIO and one fd for the
> actual socket).  Typically, FD_SETSIZE is 1024, so with ~150-200 remote
> connections and 5 fds per connection plus whatever other descriptors Bro
> may need to have open (e.g. for file I/O), it seems reasonable to guess
> that’s the problem.  But you could easily verify w/ some code modifications
> to check whether the FD_SET call is using a fd >= FD_SETSIZE.
>
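
A back-of-the-envelope version of the estimate above, as a sketch. Assumptions: FD_SETSIZE is the typical 1024 (not read from the system), each worker and each proxy counts as one remote connection to the manager, and the figure of 5 fds per connection is the "at least" number quoted above. The function names are made up for this example.

```python
FD_SETSIZE = 1024  # typical value on Linux; an assumption, not queried


def estimated_fds(remote_connections, fds_per_connection=5, other_fds=0):
    """Lower-bound estimate of descriptors used for remote I/O."""
    return remote_connections * fds_per_connection + other_fds


def headroom(remote_connections, other_fds=0):
    """Descriptors left below FD_SETSIZE under the estimate above."""
    return FD_SETSIZE - estimated_fds(remote_connections, other_fds=other_fds)


if __name__ == "__main__":
    # Configurations mentioned in this thread: proxies + workers.
    for proxies, workers in [(15, 155), (10, 140)]:
        conns = proxies + workers
        print(conns, estimated_fds(conns), headroom(conns))
```

These are lower bounds: the per-connection count is "at least" 5, other open files are not included, and what matters to FD_SET is the highest descriptor number in use, not the total count.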
> Other than making involved code changes to Bro (e.g. to move away from
> select() for I/O event handling), the only suggestions I have are 1)
> reducing the number of remote connections, or 2) seeing if you can increase
> FD_SETSIZE via preprocessor stuff or CFLAGS/CXXFLAGS upon ./configure’ing
> (I’ve never done this myself to know if it works, but I’ve googled around
> before and think the implication was that it may work on Linux).
>
> - Jon
>
> > On Jun 29, 2015, at 6:22 PM, Baxter Milliwew <baxter.milliwew at gmail.com> wrote:
> >
> > The manager still crashes.  Interesting note about a buffer overflow.
> >
> >
> > [manager]
> >
> > Bro 2.4
> > Linux 3.16.0-38-generic
> >
> > core
> > [New LWP 18834]
> > [Thread debugging using libthread_db enabled]
> > Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> > Core was generated by `/usr/local/3rd-party/bro/bin/bro -U .status -p broctl -p broctl-live -p local -'.
> > Program terminated with signal SIGABRT, Aborted.
> > #0  0x00007f163bb46cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> >
> > Thread 1 (Thread 0x............ (LWP 18834)):
> > #0  0x00007f163bb46cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> > #1  0x00007f163bb4a0d8 in __GI_abort () at abort.c:89
> > #2  0x00007f163bb83394 in __libc_message (do_abort=do_abort@entry=2, fmt=fmt@entry=0x............ "*** %s ***: %s terminated\n") at ../sysdeps/posix/libc_fatal.c:175
> > #3  0x00007f163bc1ac9c in __GI___fortify_fail (msg=<optimized out>, msg@entry=0x............ "buffer overflow detected") at fortify_fail.c:37
> > #4  0x00007f163bc19b60 in __GI___chk_fail () at chk_fail.c:28
> > #5  0x00007f163bc1abe7 in __fdelt_chk (d=<optimized out>) at fdelt_chk.c:25
> > #6  0x00000000005e962a in Set (set=0x............, this=0x............) at /home/bro/Bro-IDS/bro-2.4/src/iosource/FD_Set.h:59
> > #7  SocketComm::Run (this=0x............) at /home/bro/Bro-IDS/bro-2.4/src/RemoteSerializer.cc:3406
> > #8  0x00000000005e9c31 in RemoteSerializer::Fork (this=0x............) at /home/bro/Bro-IDS/bro-2.4/src/RemoteSerializer.cc:687
> > #9  0x00000000005e9d4f in RemoteSerializer::Enable (this=0x............) at /home/bro/Bro-IDS/bro-2.4/src/RemoteSerializer.cc:575
> > #10 0x00000000005b6943 in BifFunc::bro_enable_communication (frame=<optimized out>, BiF_ARGS=<optimized out>) at bro.bif:4480
> > #11 0x00000000005b431d in BuiltinFunc::Call (this=0x............, args=0x............, parent=0x............) at /home/bro/Bro-IDS/bro-2.4/src/Func.cc:586
> > #12 0x0000000000599066 in CallExpr::Eval (this=0x............, f=0x............) at /home/bro/Bro-IDS/bro-2.4/src/Expr.cc:4544
> > #13 0x000000000060ceb4 in ExprStmt::Exec (this=0x............, f=0x............, flow=@0x............: FLOW_NEXT) at /home/bro/Bro-IDS/bro-2.4/src/Stmt.cc:352
> > #14 0x000000000060b174 in IfStmt::DoExec (this=0x............, f=0x............, v=<optimized out>, flow=@0x............: FLOW_NEXT) at /home/bro/Bro-IDS/bro-2.4/src/Stmt.cc:456
> > #15 0x000000000060ced1 in ExprStmt::Exec (this=0x............, f=0x............, flow=@0x............: FLOW_NEXT) at /home/bro/Bro-IDS/bro-2.4/src/Stmt.cc:356
> > #16 0x000000000060b211 in StmtList::Exec (this=0x............, f=0x............, flow=@0x............: FLOW_NEXT) at /home/bro/Bro-IDS/bro-2.4/src/Stmt.cc:1696
> > #17 0x000000000060b211 in StmtList::Exec (this=0x............, f=0x............, flow=@0x............: FLOW_NEXT) at /home/bro/Bro-IDS/bro-2.4/src/Stmt.cc:1696
> > #18 0x00000000005c042e in BroFunc::Call (this=0x............, args=<optimized out>, parent=0x0) at /home/bro/Bro-IDS/bro-2.4/src/Func.cc:403
> > #19 0x000000000057ee2a in EventHandler::Call (this=0x............, vl=0x............, no_remote=no_remote@entry=false) at /home/bro/Bro-IDS/bro-2.4/src/EventHandler.cc:130
> > #20 0x000000000057e035 in Dispatch (no_remote=false, this=0x............) at /home/bro/Bro-IDS/bro-2.4/src/Event.h:50
> > #21 EventMgr::Dispatch (this=this@entry=0x...... <mgr>) at /home/bro/Bro-IDS/bro-2.4/src/Event.cc:111
> > #22 0x000000000057e1d0 in EventMgr::Drain (this=0xbbd720 <mgr>) at /home/bro/Bro-IDS/bro-2.4/src/Event.cc:128
> > #23 0x00000000005300ed in main (argc=<optimized out>, argv=<optimized out>) at /home/bro/Bro-IDS/bro-2.4/src/main.cc:1147
> >
> >
> >
> > On Mon, Jun 29, 2015 at 4:09 PM, Baxter Milliwew <baxter.milliwew at gmail.com> wrote:
> > Never mind... new box, default nofile limits.  Thanks for the malloc tip.
> >
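
For reference, the nofile limit mentioned here is the per-process cap on open descriptors (RLIMIT_NOFILE). It is a separate limit from FD_SETSIZE: raising nofile lets a process open more descriptors, but it does not lift the ceiling that select() places on descriptor numbers. A small sketch for inspecting the limit from inside a process, using Python's standard resource module (the helper name nofile_limits is made up for this example):

```python
import resource


def nofile_limits():
    """Return (soft, hard) RLIMIT_NOFILE for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print(f"nofile soft={soft} hard={hard}")
```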
> >
> > On Mon, Jun 29, 2015 at 4:03 PM, Baxter Milliwew <baxter.milliwew at gmail.com> wrote:
> > Switching to jemalloc fixed the stability issue but not the worker count
> > limitation.
> >
> > On Sun, Jun 28, 2015 at 7:18 PM, Baxter Milliwew <baxter.milliwew at gmail.com> wrote:
> > Looks like malloc from glibc, the default on Ubuntu.  I will try jemalloc
> > and others.
> >
> >
> >
> > On Sun, Jun 28, 2015 at 1:03 AM, Jan Grashofer <jan.grashofer at cern.ch> wrote:
> > I experienced similar problems (memory gets eaten up quickly and workers
> > crash with a segfault) using tcmalloc. Which malloc do you use?
> >
> >
> > Regards,
> >
> > Jan
> >
> >
> > From: bro-bounces at bro.org [bro-bounces at bro.org] on behalf of Baxter Milliwew [baxter.milliwew at gmail.com]
> > Sent: Friday, June 26, 2015 23:03
> > To: bro at bro.org
> > Subject: [Bro] Bro's limitations with high worker count and memory exhaustion
> >
> > There's some sort of association between memory exhaustion and a high
> > number of workers.  The poor man's fix would be to purchase new servers
> > with higher CPU speeds, as that would reduce the worker count.  Issues with
> > high worker count and/or memory exhaustion appear to be a well-known
> > problem based on the mailing list archives.
> >
> > In the current version, bro-2.4, my previous configuration (15 proxies,
> > 155 workers) immediately causes the manager to crash.  To work around this
> > I've lowered the count to 10 proxies and 140 workers.  However, even with
> > this configuration the manager process will exhaust all memory and crash
> > within about 2 hours.
> >
> > The manager is threaded; I think this is an issue with the threading
> > behavior between manager, proxies, and workers.  Debugging threading
> > problems is complex and I'm a complete novice.  My current tutorial is
> > a Stack Overflow thread:
> >
> > http://stackoverflow.com/questions/981011/c-programming-debugging-with-pthreads
> >
> > Does anyone else have this problem?  What have you tried and what do
> > you suggest?
> >
> > Thanks
> >
> >
> >
> >
> > 1435347409.458185       worker-2-18     parent  -       -       -       info    [#10000/10.1.1.1:36994] peer sent class "control"
> > 1435347409.458185       worker-2-18     parent  -       -       -       info    [#10000/10.1.1.1:36994] phase: handshake
> > 1435347409.661085       worker-2-18     parent  -       -       -       info    [#10000/10.1.1.1:36994] request for unknown event save_results
> > 1435347409.661085       worker-2-18     parent  -       -       -       info    [#10000/10.1.1.1:36994] registered for event Control::peer_status_response
> > 1435347409.694858       worker-2-18     parent  -       -       -       info    [#10000/10.1.1.1:36994] peer does not support 64bit PIDs; using compatibility mode
> > 1435347409.694858       worker-2-18     parent  -       -       -       info    [#10000/10.1.1.1:36994] peer is a Broccoli
> > 1435347409.694858       worker-2-18     parent  -       -       -       info    [#10000/10.1.1.1:36994] phase: running
> >
> >
> >
> >
> >