[Bro] BROKER + CLUSTER - stuck (Mike Dopheide)
fatema.bannatwala at gmail.com
Wed Mar 8 12:39:01 PST 2017
Hmm, looks like the manager is also running with low memory:
$ free -g
       total  used  free  shared  buff/cache
Mem:      70     9    46       0          14
Swap:      7     4     3
top - 15:30:04 up 47 days, 22:33, 8 users, load average: 1.26, 1.25, 1.37
Tasks: 495 total, 2 running, 491 sleeping, 2 stopped, 0 zombie
%Cpu(s): 2.9 us, 1.7 sy, 0.4 ni, 94.9 id, 0.1 wa, 0.0 hi, 0.0 si,
KiB Mem : 73949688 total, 48927592 free, 9836872 used, 15185224 buff/cache
KiB Swap: 8388600 total, 4192868 free, 4195732 used. 63369176 avail Mem
Anyways, not going into that rabbit hole :)
So the correct sequence to deploy any config changes in a cluster would be:
stop -> check -> install -> start
I was looking at the available commands, and it looks like "restart --clean"
would do the trick?
Or I can just script the above sequence in my restart-bro script :)
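For reference, a minimal sketch of what that part of a restart-bro script could look like. It assumes broctl is on the PATH; the BROCTL variable and the redeploy function name are made up for illustration, and the only broctl commands used are the four named in the sequence above.

```shell
#!/bin/sh
# Sketch only: run the stop -> check -> install -> start sequence in order.
# BROCTL is an assumed override for the broctl path (not a broctl setting).
BROCTL="${BROCTL:-broctl}"

# redeploy is a hypothetical helper name; it aborts at the first failing step.
redeploy() {
    for step in stop check install start; do
        echo "==> $BROCTL $step"
        if ! "$BROCTL" "$step"; then
            echo "step '$step' failed, aborting" >&2
            return 1
        fi
    done
}
```

Calling redeploy at the end of the script runs the four steps. Note that if check fails, the cluster is left stopped, so the error needs fixing (or a manual "broctl start") before traffic is analyzed again.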
On Wed, Mar 8, 2017 at 3:20 PM, Azoff, Justin S <jazoff at illinois.edu> wrote:
> > On Mar 8, 2017, at 2:59 PM, fatema bannatwala <fatema.bannatwala at gmail.com> wrote:
> > Thanks Justin for the input!
> > Yeah, you are right, I tested the deploy cmd on a standalone node, and it does not hang there.
> > I will test out the check.bro suggestions on the prod cluster.
> > The cluster nodes use an average of ~30-35 GB of memory (having ~125 GB in total),
> > and capture loss doesn't report any real loss either (e.g. 0.025%).
> > Hence I thought that the nodes were doing OK; not sure if they are getting loads of traffic and hence getting overloaded.
> > Also, I have noticed that a restart on the cluster takes longer now (in 2.5) than it did with the old version.
> > Maybe the custom scripts are the culprit, but I had the same scripts in the old version as well.
> Ah, I should have said manager not cluster.
> Check actually runs 100% on the manager. I think the hang is due to a
> race condition of some sort that prevents it from exiting like it is
> supposed to. It seems to only occur when the load is high, which is why
> deploy has an issue but stop first+check works ok.
> - Justin Azoff