[Bro] bro master crashing
Azoff, Justin S
jazoff at illinois.edu
Mon Mar 20 11:42:26 PDT 2017
> On Mar 20, 2017, at 2:34 PM, Matt Clemons <matt.clemons at gmail.com> wrote:
> Just wanted to give an update to show how crazy this has been.
> The segfaults made me think "memory issue", so I ran memtest on the system. It has a lot of memory, so this took many hours to complete, and it finished with 0 errors. I pulled power on the system and upon boot, everything came up fine with a limited set of workers. I added all 200+ worker processes back in, and now it's running like a champ again.
> The only other thing that it could have been was a power outage on one of the 10 gig worker boxes. It kept blipping and coming back up. Bro cron was starting processes, and then that worker system was crashing due to lack of power. This could have caused the manager to fail, but I can't really tell what the root cause was.
> Thanks for the responses.
Ah.. I dropped the ball on this, sorry.
That's really interesting that a full restart fixed things. One thing I thought could have caused it was a stray or hung bro process somehow still listening on the port, but that usually shows up as a much more explicit error in the logs.
It may be possible to use gdb to see where this is in the bro binary:
2017-03-09T18:34:06.409225+00:00 HOSTNAME kernel: bro: segfault at 0 ip 00000000005fcf8d sp 00007fffaf9d2f40 error 6 in bro[400000+624000]
I'm not sure if the usual method would work, but you can try
gdb `which bro`
and then at the (gdb) prompt, see if
info symbol 0x00000000005fcf8d
info symbol 0x00007fffaf9d2f40
show anything useful. There may be a more correct command to get gdb to tell you where in the bro binary the segfault occurred.
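To make the kernel line above a bit easier to read, here is a minimal Python sketch that pulls it apart. It assumes the x86 Linux "segfault at ... ip ... sp ... error ... in name[base+size]" layout seen in the log above; decode_segfault and its field names are made up for illustration and are not part of Bro or gdb. The error value is treated as the x86 page-fault error code (bit 0: page present, bit 1: write, bit 2: user mode).

```python
import re

# Layout of the x86 Linux kernel segfault message; all numbers are hex.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\s\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Hypothetical helper: split a kernel segfault line into its fields."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    err = int(m["err"], 16)
    return {
        "object": m["obj"],
        "fault_addr": int(m["addr"], 16),
        "ip": ip,
        # Offset of the faulting instruction inside the mapped object.
        "offset": ip - base,
        "ip_in_mapping": base <= ip < base + size,
        # x86 page-fault error code bits.
        "page_present": bool(err & 1),
        "access": "write" if err & 2 else "read",
        "mode": "user" if err & 4 else "kernel",
    }


if __name__ == "__main__":
    line = ("kernel: bro: segfault at 0 ip 00000000005fcf8d "
            "sp 00007fffaf9d2f40 error 6 in bro[400000+624000]")
    info = decode_segfault(line)
    print(info["access"], "in", info["mode"], "mode, offset", hex(info["offset"]))
```

For the line above this reports a user-mode write to address 0 (so most likely a write through a NULL pointer) at offset 0x1fcf8d into the bro mapping. Since the mapping starts at 0x400000, the binary is probably not position-independent, so the raw ip value should be usable directly with info symbol. The sp value is a stack address and is unlikely to resolve to a symbol. If binutils is installed, addr2line -f -C -e `which bro` 0x5fcf8d may give the same answer without starting gdb, provided the binary still has symbols.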
- Justin Azoff