[Zeek] Worker being "killed nohup"

Mark Gardner mkg at vt.edu
Wed Nov 6 11:10:06 PST 2019


On Wed, Nov 6, 2019 at 1:13 PM Jeff Barber <jsbarber60 at gmail.com> wrote:

> > I'm running Zeek 2.6.4 and I have been seeing occasional error messages
> of the form:
> >
> > run-bro: line 110: 42574 Killed                  nohup ${pin_command}
> $pin_cpu "$mybro" "$@"
> >
> > The workers seem to be restarted fine and other than the error message,
> I haven't noticed any ill behavior. What should I do about the error
> messages?
>
> I would check your syslog. Assuming you are running linux, if your system
> runs out of memory, the kernel will go find the biggest process and kill
> it. This can often be a zeek process as they tend to grow large as more
> connections are tracked (depending on many factors: what scripts you are
> running, what you're logging, what kind of traffic is being seen, etc.). If
> that's happening, you should see something in syslog containing the string "invoked
> oom-killer:" If you look at the surrounding lines, there should be some
> info on process sizes showing why it was selected.
>

Good catch Jeff.  A crash email arrived at 10:10 and the journal entry
starting around 10:07 contained the following:

Nov 06 10:07:08 sensor1 kernel: oom_reaper: reaped process 43437 (bro), now
anon-rss:0kB, file-rss:1048576kB, shmem-rss:0kB
Nov 06 10:07:05 sensor1 kernel: Killed process 43437 (bro)
total-vm:67116888kB, anon-rss:57339476kB, file-rss:1042296kB, shmem-rss:0kB
Nov 06 10:07:05 sensor1 kernel: Out of memory: Kill process 43437 (bro)
score 444 or sacrifice child
Nov 06 10:07:05 sensor1 kernel: [  54824]    47 54824  1516502  1221530
11284480       80             0 bro
Nov 06 10:07:05 sensor1 kernel: [  54817]    47 54817     1661       56
 49152       16             0 run-bro
Nov 06 10:07:05 sensor1 kernel: [  51543]    47 51543  1516502  1281056
11886592     3616             0 bro
Nov 06 10:07:05 sensor1 kernel: [  51542]    47 51542  1516502  1285841
11964416     2296             0 bro
Nov 06 10:07:05 sensor1 kernel: [  51539]    47 51539  1516502  1277957
11890688     2817             0 bro
Nov 06 10:07:05 sensor1 kernel: [  51537]    47 51537  1516502  1282107
11829248     2543             0 bro
Nov 06 10:07:05 sensor1 kernel: [  51533]    47 51533  1516502  1276417
11939840     7591             0 bro
Nov 06 10:07:05 sensor1 kernel: [  51532]    47 51532  1516502  1290264
12042240     8006             0 bro
Nov 06 10:07:05 sensor1 kernel: [  51523]    47 51523  1516502  1281160
11894784     7479             0 bro
Nov 06 10:07:05 sensor1 kernel: [  51521]    47 51521  1516502  1285423
11984896     7957             0 bro
Nov 06 10:07:05 sensor1 kernel: [  51520]    47 51520  1745878  1370574
12668928     6908             0 bro
Nov 06 10:07:05 sensor1 kernel: [  51501]    47 51501  1516502  1286290
11788288     6967             0 bro
Nov 06 10:07:05 sensor1 kernel: [  51499]    47 51499     1661       22
 49152       49             0 run-bro
Nov 06 10:07:05 sensor1 kernel: [  51498]    47 51498  1516502  1276891
11882496     7233             0 bro
Nov 06 10:07:05 sensor1 kernel: [  51496]    47 51496  1516502  1287064
11948032     7685             0 bro
Nov 06 10:07:05 sensor1 kernel: [  51491]    47 51491     1661       18
 53248       55             0 run-bro
Nov 06 10:07:05 sensor1 kernel: [  51487]    47 51487     1661       18
 57344       54             0 run-bro
Nov 06 10:07:05 sensor1 kernel: [  51474]    47 51474     1661       21
 49152       52             0 run-bro
Nov 06 10:07:05 sensor1 kernel: [  51471]    47 51471  1516502  1277820
11890688     7964             0 bro
Nov 06 10:07:05 sensor1 kernel: [  51465]    47 51465     1661       20
 53248       53             0 run-bro
Nov 06 10:07:05 sensor1 kernel: [  51455]    47 51455     1661       23
 57344       49             0 run-bro
Nov 06 10:07:05 sensor1 kernel: [  51440]    47 51440     1661       18
 49152       54             0 run-bro
Nov 06 10:07:05 sensor1 kernel: [  51438]    47 51438     1661       23
 49152       49             0 run-bro
Nov 06 10:07:05 sensor1 kernel: [  51428]    47 51428     1661       18
 49152       54             0 run-bro
Nov 06 10:07:05 sensor1 kernel: [  51414]    47 51414     1661       16
 53248       56             0 run-bro
Nov 06 10:07:05 sensor1 kernel: [  51405]    47 51405     1661       20
 53248       52             0 run-bro
Nov 06 10:07:05 sensor1 kernel: [  51397]    47 51397     1661       14
 45056       59             0 run-bro
Nov 06 10:07:05 sensor1 kernel: [  51375]    47 51375     1661       15
 49152       58             0 run-bro
Nov 06 10:07:05 sensor1 kernel: [  43437]    47 43437 16779222 14595446
122212352   129972             0 bro
Nov 06 10:07:05 sensor1 kernel: [  43343]    47 43343     1661        2
 49152       70             0 run-bro
Nov 06 10:07:05 sensor1 kernel: [  43183]    47 43183    42625        4
106496      608             0 (sd-pam)
Nov 06 10:07:05 sensor1 kernel: [  43181]    47 43181     5290        0
 77824      342             0 systemd
Nov 06 10:07:05 sensor1 kernel: [    996]     0   996     1346        0
 49152       33             0 agetty
Nov 06 10:07:05 sensor1 kernel: [    995]     0   995     1403        0
 45056       32             0 agetty
Nov 06 10:07:05 sensor1 kernel: [    994]     0   994     3963       38
 69632      178         -1000 sshd
Nov 06 10:07:05 sensor1 kernel: [    991]     0   991     2119       20
 53248       35             0 cron
Nov 06 10:07:05 sensor1 kernel: [    990]   105   990     2210       97
 57344       55          -900 dbus-daemon
Nov 06 10:07:05 sensor1 kernel: [    977]     0   977    56456       90
 86016      378             0 rsyslogd
Nov 06 10:07:05 sensor1 kernel: [    968]     0   968     4875       78
 77824      189             0 systemd-logind
Nov 06 10:07:05 sensor1 kernel: [    944]     0   944     2021        1
 57344      799             0 haveged
Nov 06 10:07:05 sensor1 kernel: [    931]   100   931    23270       26
 86016      207             0 systemd-timesyn
Nov 06 10:07:05 sensor1 kernel: [    640]     0   640     5579       14
 69632      309         -1000 systemd-udevd
Nov 06 10:07:05 sensor1 kernel: [    615]     0   615    74225     2624
638976      568             0 systemd-journal
Nov 06 10:07:05 sensor1 kernel: [  pid  ]   uid  tgid total_vm      rss
pgtables_bytes swapents oom_score_adj name
Nov 06 10:07:05 sensor1 kernel: Tasks state (memory values in pages):
Nov 06 10:07:05 sensor1 kernel: 0 pages hwpoisoned
Nov 06 10:07:05 sensor1 kernel: 551979 pages reserved
Nov 06 10:07:05 sensor1 kernel: 0 pages HighMem/MovableOnly
Nov 06 10:07:05 sensor1 kernel: 33528566 pages RAM
Nov 06 10:07:05 sensor1 kernel: Total swap = 976892kB
Nov 06 10:07:05 sensor1 kernel: Free swap  = 0kB
Nov 06 10:07:05 sensor1 kernel: Swap cache stats: add 759161, delete
711025, find 25951/48832
Nov 06 10:07:05 sensor1 kernel: 48128 pages in swap cache
Nov 06 10:07:05 sensor1 kernel: 52774 total pagecache pages
Nov 06 10:07:05 sensor1 kernel: Node 3 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=2048kB
Nov 06 10:07:05 sensor1 kernel: Node 3 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=1048576kB
Nov 06 10:07:05 sensor1 kernel: Node 1 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=2048kB
Nov 06 10:07:05 sensor1 kernel: Node 1 hugepages_total=0 hugepages_free=0
hugepages_surp=0 hugepages_size=1048576kB
Nov 06 10:07:05 sensor1 kernel: Node 3 Normal: 73*4kB (U) 150*8kB (UH)
59*16kB (UH) 1325*32kB (UH) 5*64kB (H) 2*128kB (H) 0*256kB 0*512kB 0*1024kB
0*2048kB 0*4096kB = 45412kB
Nov 06 10:07:05 sensor1 kernel: Node 1 Normal: 78*4kB (UMEH) 213*8kB (UEH)
70*16kB (UH) 1232*32kB (UMH) 7*64kB (H) 0*128kB 0*256kB 0*512kB 0*1024kB
0*2048kB 0*4096kB = 43008kB
Nov 06 10:07:05 sensor1 kernel: Node 1 DMA32: 697*4kB (UME) 647*8kB (UME)
392*16kB (UME) 190*32kB (UME) 94*64kB (UE) 51*128kB (UE) 63*256kB (UME)
240*512kB (UME) 72*1024kB (UME) 0*2048kB 0*4096kB = 245596kB
Nov 06 10:07:05 sensor1 kernel: Node 1 DMA: 1*4kB (U) 1*8kB (U) 2*16kB (U)
1*32kB (U) 3*64kB (U) 0*128kB 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M)
3*4096kB (M) = 15884kB
Nov 06 10:07:05 sensor1 kernel: lowmem_reserve[]: 0 0 0 0 0
Nov 06 10:07:05 sensor1 kernel: Node 3 Normal free:44844kB min:45116kB
low:111144kB high:177172kB active_anon:48629496kB inactive_anon:3250728kB
active_file:64kB inactive_file:668kB unevictable:0kB writepending:0kB
present:67094528kB managed:66031300kB mlocked:0kB kernel_stack:5352kB
pagetables:146976kB bounce
Nov 06 10:07:05 sensor1 kernel: lowmem_reserve[]: 0 0 0 0 0
Nov 06 10:07:05 sensor1 kernel: Node 1 Normal free:41932kB min:42604kB
low:104956kB high:167308kB active_anon:55633904kB inactive_anon:4082748kB
active_file:560kB inactive_file:0kB unevictable:0kB writepending:0kB
present:63438848kB managed:62359848kB mlocked:0kB kernel_stack:4776kB
pagetables:133008kB bounce:
Nov 06 10:07:05 sensor1 kernel: lowmem_reserve[]: 0 0 60894 60894 60894
Nov 06 10:07:05 sensor1 kernel: Node 1 DMA32 free:245356kB min:2372kB
low:5844kB high:9316kB active_anon:3199852kB inactive_anon:44556kB
active_file:100kB inactive_file:0kB unevictable:0kB writepending:0kB
present:3564892kB managed:3499316kB mlocked:0kB kernel_stack:48kB
pagetables:3092kB bounce:0kB free_pcp:4
Nov 06 10:07:05 sensor1 kernel: lowmem_reserve[]: 0 3392 64286 64286 64286
Nov 06 10:07:05 sensor1 kernel: Node 1 DMA free:15884kB min:8kB low:20kB
high:32kB active_anon:0kB inactive_anon:0kB active_file:0kB
inactive_file:0kB unevictable:0kB writepending:0kB present:15996kB
managed:15884kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB
free_pcp:0kB local_pcp:0kB free_cma:0kB
Nov 06 10:07:05 sensor1 kernel: Node 3 active_anon:48629496kB
inactive_anon:3250728kB active_file:108kB inactive_file:700kB
unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:13636088kB
dirty:0kB writeback:0kB shmem:9996kB shmem_thp: 0kB shmem_pmdmapped: 0kB
anon_thp: 1337344kB writeback_tmp:0kB unst
Nov 06 10:07:05 sensor1 kernel: Node 1 active_anon:58833756kB
inactive_anon:4127304kB active_file:548kB inactive_file:0kB unevictable:0kB
isolated(anon):0kB isolated(file):0kB mapped:2101152kB dirty:0kB
writeback:0kB shmem:7828kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp:
13608960kB writeback_tmp:0kB unstab
Nov 06 10:07:05 sensor1 kernel: active_anon:26865813 inactive_anon:1844508
isolated_anon:0
                                 active_file:163 inactive_file:0
isolated_file:0
                                 unevictable:0 dirty:0 writeback:0
unstable:0
                                 slab_reclaimable:25057
slab_unreclaimable:65745
                                 mapped:3934289 shmem:4456 pagetables:70769
bounce:0
                                 free:87073 free_pcp:3080 free_cma:0
Nov 06 10:07:05 sensor1 kernel: Mem-Info:
Nov 06 10:07:05 sensor1 kernel: R13: 0000000000001800 R14: 00007f375bd4f780
R15: 0000000000000022
Nov 06 10:07:05 sensor1 kernel: R10: 00007f3618971000 R11: 00007f36189727e0
R12: 00007f375bd4f940
Nov 06 10:07:05 sensor1 kernel: RBP: 00007f361424b000 R08: 0000000000000000
R09: 00007f36189727e0
Nov 06 10:07:05 sensor1 kernel: RDX: 0000000000001780 RSI: 00007f361424b000
RDI: 00007f3618971000
Nov 06 10:07:05 sensor1 kernel: RAX: 00007f3618971000 RBX: 0000000000001800
RCX: 00007f361424c760
Nov 06 10:07:05 sensor1 kernel: RSP: 002b:00007ffd609fbc58 EFLAGS: 00010202
Nov 06 10:07:05 sensor1 kernel: Code: Bad RIP value.
Nov 06 10:07:05 sensor1 kernel: RIP: 0033:0x7f375bf1f925
Nov 06 10:07:05 sensor1 kernel:  page_fault+0x1e/0x30
Nov 06 10:07:05 sensor1 kernel:  ? page_fault+0x8/0x30
Nov 06 10:07:05 sensor1 kernel:  __do_page_fault+0x249/0x4f0
Nov 06 10:07:05 sensor1 kernel:  handle_mm_fault+0xd6/0x200
Nov 06 10:07:05 sensor1 kernel:  ? __switch_to_asm+0x41/0x70
Nov 06 10:07:05 sensor1 kernel:  __handle_mm_fault+0x958/0x1270
Nov 06 10:07:05 sensor1 kernel:  alloc_pages_vma+0x74/0x1c0
Nov 06 10:07:05 sensor1 kernel:  __alloc_pages_nodemask+0x28b/0x2b0
Nov 06 10:07:05 sensor1 kernel:  __alloc_pages_slowpath+0xbd8/0xcb0
Nov 06 10:07:05 sensor1 kernel:  out_of_memory+0x1a5/0x430
Nov 06 10:07:05 sensor1 kernel:  oom_kill_process.cold.30+0xb/0x1cf
Nov 06 10:07:05 sensor1 kernel:  dump_header+0x6b/0x283
Nov 06 10:07:05 sensor1 kernel:  dump_stack+0x5c/0x80
Nov 06 10:07:05 sensor1 kernel: Call Trace:
Nov 06 10:07:05 sensor1 kernel: Hardware name: Supermicro Super
Server/H11SSL-i, BIOS 1.0b 04/27/2018
Nov 06 10:07:05 sensor1 kernel: CPU: 3 PID: 51496 Comm: bro Not tainted
4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u1
Nov 06 10:07:05 sensor1 kernel: bro cpuset=/ mems_allowed=1,3
Nov 06 10:07:05 sensor1 kernel: bro invoked oom-killer:
gfp_mask=0x6280ca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO), nodemask=(null),
order=0, oom_score_adj=0

It looks like bro was using 64 GB of RAM (summation of even though the
system has 128 GB installed. (It also looks like there was no free swap.)
Searching the Internet for ways to solve OOM, people suggested turning off
over-commit, and configuring the overcommit ratio a bit higher. Currently,
the values are:

# sysctl vm.overcommit_memory
vm.overcommit_memory = 0
# sysctl vm.overcommit_ratio
vm.overcommit_ratio = 50

In one case the suggestion was to change those to vm.overcommit_memory = 1
and vm.overcommit_ratio = 80 which seems reasonable based on the
description at
https://www.kernel.org/doc/Documentation/vm/overcommit-accounting.

Or is there a better way to handle OOM?

Mark
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.ICSI.Berkeley.EDU/pipermail/zeek/attachments/20191106/0410889d/attachment-0001.html 


More information about the Zeek mailing list