[Bro] BROKER + CLUSTER - stuck (Mike Dopheide)

Wed Mar 8 12:39:01 PST 2017

Hmm, looks like the manager is also running with low memory:
$ free -g
              total        used        free      shared  buff/cache   available
Mem:             70           9          46           0          14          60
Swap:             7           4           3

$ top
top - 15:30:04 up 47 days, 22:33,  8 users,  load average: 1.26, 1.25, 1.37
Tasks: 495 total,   2 running, 491 sleeping,   2 stopped,   0 zombie
%Cpu(s):  2.9 us,  1.7 sy,  0.4 ni, 94.9 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem : 73949688 total, 48927592 free,  9836872 used, 15185224 buff/cache
KiB Swap:  8388600 total,  4192868 free,  4195732 used. 63369176 avail Mem

Anyways, not going into that rabbit hole :)
So the correct sequence to deploy any config changes in a cluster would be:
stop -> check -> install -> start
I was looking at the available commands, and it looks like "restart --clean"
would do the trick?
Or I can just script the above sequence in my restart-bro script :)
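
A minimal sketch of what such a restart-bro script could look like, assuming broctl is on the PATH. The function name deploy_sequence and the BROCTL override variable are made up for illustration and are not part of broctl itself:

```shell
#!/bin/sh
# Hypothetical restart-bro helper: stop -> check -> install -> start.
# BROCTL is an assumed override hook; it defaults to "broctl" on the PATH.
BROCTL="${BROCTL:-broctl}"

deploy_sequence() {
    # Stop first so that "check" runs while the manager is idle.
    "$BROCTL" stop || return 1
    # Abort before installing if the configuration does not validate.
    if ! "$BROCTL" check; then
        echo "broctl check failed; cluster left stopped" >&2
        return 1
    fi
    "$BROCTL" install || return 1
    "$BROCTL" start
}
```

Calling deploy_sequence at the end of the script runs the four steps in order. The trade-off is that the cluster stays down while check runs, and if check fails it stays stopped until the config is fixed.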

Thanks,
Fatema.

On Wed, Mar 8, 2017 at 3:20 PM, Azoff, Justin S <jazoff at illinois.edu> wrote:

>
> > On Mar 8, 2017, at 2:59 PM, fatema bannatwala <fatema.bannatwala at gmail.com> wrote:
> >
> > Thanks Justin for the input!
> > Yeah, you are right, tested the deploy cmd on a standalone node, and it
> > does not hang there.
> > I will test out the check.bro suggestions on the prod cluster.
> >
> > The cluster nodes use an average of ~30-35 GB of memory (out of ~125 GB
> > in total),
> > and the capture loss script reports essentially no loss (e.g. 0.025%).
> > Hence I thought the nodes were doing OK; I'm not sure whether they are
> > getting loads of traffic and hence getting overloaded.
> >
> > Also, I have noticed that a restart on the cluster takes
> > longer now (in 2.5) than it did with the old version
> > (2.4.1).
> > Maybe the custom scripts are the culprit, but I had the same scripts in the
> > old version as well.
> >
> Ah, I should have said manager not cluster.
>
> Check actually runs 100% on the manager.  I think the hang is due to a
> race condition of some sort that prevents it from exiting like it is
> supposed to.  It seems to only occur when the load is high, which is why
> deploy has an issue but stop first+check works ok.
>
> --
> - Justin Azoff
>
>