[Zeek] Workers occasionally using 102% CPU

Thu Feb 20 15:43:55 PST 2020

Thanks, Justin, but I don't think it's an elephant flow.  I'm running snort
on the same host with the same global BPF rules (it still uses pf-ring for
clustering), which is even more sensitive to elephant flows, and it's not
spiking CPU.  Besides, it's fd16 that's triggering select, not fd11, which
is the raw socket.  The count of read calls on fd11 is pretty similar
between the good and bad strace summaries.

I don't have the workers pinned to a CPU, and there are other heavy-duty
processes on that host, so I'm not sure if perf will work for me.  I'll
read up on it and see if I can get some sort of results from it.  I'll also
strace some more and try to catch the right moment to at least figure out
what's happening around trigger time.  The load on that worker isn't so
high that strace interferes, and streaming its output through gzip sips
disk space.
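
For comparing a good and a bad capture per descriptor, something like the
rough sketch below could tally the gzipped strace output.  It is only a
sketch under assumptions: it expects strace's default line format (e.g.
"read(11, ...) = N" and "select(...) = 1 (in [16], left {...})"), it
ignores "unfinished"/"resumed" lines, and the names tally, tally_file and
FD_FIRST are made up for illustration.

```python
import gzip
import re
from collections import Counter

# Syscalls whose first argument is a file descriptor (illustrative subset).
FD_FIRST = {"read", "write", "recvfrom", "recvmsg", "sendto", "sendmsg"}

# First "name(<integer>" on the line; tolerates "[pid N]" or timestamp prefixes.
CALL_RE = re.compile(r"\b(\w+)\((\d+)")
# Readable fds that strace reports in select()'s result, e.g. "(in [16], left ...".
READY_RE = re.compile(r"\(in \[([\d ]+)\]")


def tally(lines):
    """Return (calls per (syscall, fd), select wakeups per readable fd)."""
    calls, ready = Counter(), Counter()
    for line in lines:
        m = CALL_RE.search(line)
        if not m:
            continue
        name, first_arg = m.group(1), int(m.group(2))
        if name in FD_FIRST:
            calls[(name, first_arg)] += 1
        elif name == "select":
            # select's first argument is nfds, so look at the result instead.
            r = READY_RE.search(line)
            if r:
                for fd in r.group(1).split():
                    ready[int(fd)] += 1
    return calls, ready


def tally_file(path):
    """Tally a gzip-compressed strace log (the path is supplied by the caller)."""
    with gzip.open(path, "rt", errors="replace") as fh:
        return tally(fh)
```

Running tally_file on one capture from a healthy period and one from a
spike, then comparing the counts for fd11 and fd16, would show whether the
extra select wakeups line up with a particular descriptor.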

Also, it's now happening on the other node where I upgraded to v3.0.1.  I
only have 3 workers there, but load is light enough I could get by with 1
or 2.  When I kill the malfunctioning worker, it appears to respawn within
a couple minutes, which is good.  I didn't realize Zeek did that!

Any other ideas?  I can't be the only one seeing this, can I?
--
Pete