[Zeek] Workers dying with "out of memory in new"

Johanna Amann johanna at corelight.com
Fri Oct 18 16:38:24 PDT 2019


Hi,

Both of you are running rather old versions of Zeek.

Both 2.5.5 and 2.6.1 have a number of issues that were fixed in later 
releases.

One of the issues fixed since then could be the cause of your crashes. 
A bug could result in Zeek requesting huge allocations that the 
operating system cannot fulfill; see e.g. 
https://github.com/zeek/zeek/issues/245 for more details. This specific 
issue was fixed in 2.6.3.
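
As a rough illustration of why a bug like this can produce 
out-of-memory crashes even on machines with plenty of RAM: a single 
request for an absurdly large buffer fails immediately, no matter how 
much memory is free. The Python sketch below is only an analogy, not 
Zeek code; the helper name try_allocate and the sizes are made up for 
illustration. In a C++ program the equivalent failure would surface in 
operator new, which is presumably where the "out of memory in new" 
message comes from.

```python
def try_allocate(num_bytes):
    """Return True if a zero-filled buffer of num_bytes can be allocated.

    Hypothetical helper for illustration only; it is not part of Zeek.
    """
    try:
        buf = bytearray(num_bytes)
    except (MemoryError, OverflowError):
        # The allocator (or the size check before it) refused the request.
        return False
    del buf  # release the buffer again right away
    return True


if __name__ == "__main__":
    # A sane 1 MiB request.
    print("1 MiB:", try_allocate(1024 * 1024))
    # A bogus request of 2**62 bytes, far beyond any real address space.
    print("2**62 bytes:", try_allocate(2 ** 62))
```

The second request is refused outright rather than slowly exhausting 
memory, so a worker hitting such a bug can die even while the sensor 
still has most of its RAM free.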

So upgrading to 2.6.4 (or, even better, 3.0.0) might fix those 
problems for you.
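
To check a fleet of sensors against the release that contains the fix, 
a comparison along these lines may help. This is a minimal sketch: the 
function names are hypothetical, and it assumes plain dotted version 
strings such as "2.6.1" (no "-dev" or similar suffixes), for example as 
collected from each sensor's version output.

```python
def parse_version(text):
    """Turn a dotted version string such as '2.6.1' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


# Release that contains the fix for the allocation bug, per this thread.
FIXED_IN = parse_version("2.6.3")


def has_allocation_fix(installed):
    """Return True if the installed version is 2.6.3 or newer."""
    return parse_version(installed) >= FIXED_IN


if __name__ == "__main__":
    # The two versions from this thread, plus the suggested upgrade targets.
    for version in ("2.5.5", "2.6.1", "2.6.4", "3.0.0"):
        print(version, has_allocation_fix(version))
```

Comparing tuples of integers rather than raw strings avoids the 
classic mistake of ranking "2.10.0" below "2.6.3".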

Besides that, both 2.5.5 and 2.6.1 have several vulnerabilities, and 
you really, really, really should upgrade them :).

Johanna

On 18 Oct 2019, at 8:26, Munroe Sollog wrote:

> For additional reference:
>
> Linux snout 4.9.0-9-amd64 #1 SMP Debian 4.9.168-1+deb9u5 (2019-08-11) x86_64 GNU/Linux
>
> On 10-11 I patched libssl and libc.
> On 10-17 I upgraded sudo (about 30 minutes after the first worker 
> crashed).
>
> The "[Bro] Crash report from worker-1-12" email was received at 16:00.
>
> Log output from dpkg for reference:
>
> # less /var/log/dpkg.log |grep "status installed"
>
> 2019-10-11 14:59:23 status installed telegraf:amd64 1.12.3-1
> 2019-10-11 14:59:23 status installed libssl1.0.2:amd64 1.0.2t-1~deb9u1
> 2019-10-11 14:59:23 status installed libc-bin:amd64 2.24-11+deb9u4
> 2019-10-11 14:59:23 status installed libssl1.1:amd64 1.1.0l-1~deb9u1
> 2019-10-11 14:59:23 status installed openssl:amd64 1.1.0l-1~deb9u1
> 2019-10-11 14:59:24 status installed man-db:amd64 2.7.6.1-2
> 2019-10-11 14:59:24 status installed libssl1.0-dev:amd64 1.0.2t-1~deb9u1
> 2019-10-11 14:59:24 status installed libc-bin:amd64 2.24-11+deb9u4
> 2019-10-17 16:25:47 status installed sudo:amd64 1.8.19p1-2.1+deb9u1
> 2019-10-17 16:25:47 status installed apache2-utils:amd64 2.4.25-3+deb9u9
> 2019-10-17 16:25:47 status installed apache2-bin:amd64 2.4.25-3+deb9u9
> 2019-10-17 16:25:47 status installed apache2-data:all 2.4.25-3+deb9u9
> 2019-10-17 16:25:47 status installed systemd:amd64 232-25+deb9u12
> 2019-10-17 16:25:47 status installed man-db:amd64 2.7.6.1-2
> 2019-10-17 16:25:48 status installed apache2:amd64 2.4.25-3+deb9u9
>
>
> On Fri, Oct 18, 2019 at 11:12 AM Munroe Sollog <mus3 at lehigh.edu> 
> wrote:
>
>> Interestingly enough, we started suffering the same problem at the same
>> time.
>>
>> - 1 node with 44 cores, 256GB of RAM
>> - Zeek 2.5.5
>> - node.cfg:
>>   [worker-1]
>>   type=worker
>>   host=localhost
>>   interface=af_packet::ens4f0
>>   lb_method=custom
>>   lb_procs=25
>>   pin_cpus=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24
>>
>> - broctl.cfg:
>>   MemLimit = 100000000 #100GB
>>   setcap.enabled=1
>>
>> On Fri, Oct 18, 2019 at 10:48 AM Mark Gardner <mkg at vt.edu> wrote:
>>
>>> We must have crossed some threshold yesterday. Suddenly we are suffering
>>> an epidemic of workers dying with "out of memory in new" even though we
>>> made no changes. Previously, we would have a few die each day. Now we have
>>> had 250 alerts of workers dying and being restarted from 00:00 to 10:00. I
>>> have no idea where to start debugging the problem. Any suggestions?
>>>
>>> What causes a worker to die by running out of memory? The sensors have
>>> lots of memory (see below), so I would not expect any out-of-memory
>>> deaths. (To monitor the problem, I am in the process of setting up
>>> collectd and Grafana.)
>>>
>>> Some details:
>>> - 5 sensors, each with a 16-core AMD Epyc 7351P, 128 GB RAM, Intel X520-T2
>>> - Zeek 2.6.1
>>> - node.cfg: lb_procs=15, pin_cpus=1-15, af_packet_buffer_size=1*1024*1024*1024
>>> - broctl.cfg: setcap enabled
>>> - Not shunting any traffic
>>>
>>> Mark
>>> --
>>> Mark Gardner
>>> --
>>> _______________________________________________
>>> Zeek mailing list
>>> zeek at zeek.org
>>> http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/zeek
>>
>>
>>
>> --
>> Munroe Sollog
>> Senior Network Engineer
>> munroe at lehigh.edu
>>
>
>
> -- 
> Munroe Sollog
> Senior Network Engineer
> munroe at lehigh.edu

