[Zeek] Workers dying with "out of memory in new"

Fri Oct 18 07:47:21 PDT 2019

We must have crossed some threshold yesterday. Suddenly we are suffering an
epidemic of workers dying with "out of memory in new" even though we made
no changes. Previously, we would have a few die each day. Now we have had
250 alerts of workers dying and being restarted from 00:00 to 10:00. I have
no idea where to start debugging the problem. Any suggestions?

What causes a worker to die by running out of memory? The sensors have lots
of memory (see below) so I would not expect to have any out of memory
deaths. (To monitor the problem, I am in the process of setting up collectd
and Grafana.)
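
Until a full monitoring stack is in place, per-worker resident memory can be sampled directly from /proc. The following is only a sketch, assuming Linux and assuming the worker processes are named `bro` (the binary name in the 2.6 series). The helper names `parse_vmrss_kb`, `parse_name`, and `sample_rss` are hypothetical and not part of Zeek or broctl.

```python
import os
import re

def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None

def parse_name(status_text):
    """Return the process name from /proc/<pid>/status text, or None."""
    match = re.search(r"^Name:\s+(.+)$", status_text, re.MULTILINE)
    return match.group(1).strip() if match else None

def sample_rss(process_name="bro", proc_root="/proc"):
    """Map pid -> resident set size in kB for processes with the given name."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "status")) as handle:
                text = handle.read()
        except OSError:
            continue  # the process exited between listdir and open
        if parse_name(text) == process_name:
            rss = parse_vmrss_kb(text)
            if rss is not None:
                result[int(entry)] = rss
    return result

if __name__ == "__main__":
    # Assumption: workers show up under the name "bro"; adjust if they do not.
    for pid, rss_kb in sorted(sample_rss().items()):
        print(f"{pid}\t{rss_kb / 1024:.0f} MiB")
```

Run periodically (for example from cron) and logged with a timestamp, this would show whether a worker's memory grows steadily before it dies or jumps suddenly.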

Some details:
- 5 sensors, each with a 16-core AMD EPYC 7351P, 128 GB RAM, Intel X520-T2
- Zeek 2.6.1
- node.cfg: lb_procs=15, pin_cpus=1-15,
  af_packet_buffer_size=1*1024*1024*1024
- broctl.cfg: setcap enabled
- Not shunting any traffic
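
As a rough sanity check on the configuration above, the packet buffers alone can be budgeted against installed RAM. This is a sketch under assumptions not stated in the post: that each of the 15 workers allocates its own AF_Packet ring of af_packet_buffer_size bytes, and that "128 GB" is read as 128 GiB.

```python
GIB = 1024 ** 3

workers = 15                            # lb_procs from node.cfg
buffer_bytes = 1 * 1024 * 1024 * 1024   # af_packet_buffer_size from node.cfg
total_ram = 128 * GIB                   # assumption: 128 GB read as 128 GiB

# Assumption: one ring buffer per worker process.
ring_total = workers * buffer_bytes
remaining_per_worker = (total_ram - ring_total) / workers

print(f"ring buffers: {ring_total / GIB:.0f} GiB")
print(f"left per worker: {remaining_per_worker / GIB:.1f} GiB")
```

Under those assumptions the rings account for about 15 GiB, leaving roughly 7.5 GiB per worker before the manager, proxy, and operating system are counted.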

Mark
-- 
Mark Gardner