[Zeek] Dropping packets
Joseph Fischetti
Joseph.Fischetti at marist.edu
Mon Feb 17 06:41:16 PST 2020
Good morning,
I'm new to the list and am taking over an existing Zeek deployment that
we have here. I'm trying to track down some (to me) excessive packet
dropping.
We had an older version of Zeek (Bro) installed and mostly functional,
though as I recall there were issues with workers occasionally
crashing.
Before I started looking into things, a new version of Zeek was deployed
(from a binary) and is mostly vanilla. We've included the bhr and myricom
plugins, but that's about it.
Zeek master and workers run bare metal on 3 pretty big Intel hosts (192GB
memory, 2x Xeon E5-2690 with 14 cores/socket, Debian 9). The workers have
Myricom interfaces. SPAN ports at the edge feed into Arista switches,
which in turn feed the Myricom interfaces in the workers.
We have a few issues:
1) If I try to start up the workers with any more than ~8 threads,
packet drop and memory usage go through the roof in pretty short order.
If I try to pin them, the first "worker" CPUs get pegged pretty high and
the others stay more or less idle (though that could be due to the amount of
traffic the second worker interface is receiving).
2) If I try to start up "1" worker (per worker node), using the
"myricom::*" interface, the worker node goes unresponsive and needs to be
hardware bounced. (Driver issue?)
3) I can start worker nodes with multiple workers and ~5 threads each
(currently "unpinned"), but after a few days, packet drop is still
excessive.
My current node.cfg is below [1]. Output from 'zeekctl netstats' is also
below [2]. It's been up since Friday ~2:00pm Eastern. Load average is
higher than I would expect (given how much CPU these workers actually
have, and how idle most of the CPUs are). Htop output is included [3].
I understand we should probably be pinning the worker threads, but the
output of 'lstopo-no-graphics --of txt' is very hard to trace with
56 threads available. Also, do I want to use the "P" or the "L" listings?
I can include that output as a follow-up if necessary.
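As a possibly easier alternative to tracing the lstopo text output, here is a minimal Python sketch that groups logical CPUs by socket and core from the output of 'lscpu -p=CPU,CORE,SOCKET'. The function names (group_cpus_by_socket, first_thread_per_core) are made up for illustration, and it assumes that the OS CPU numbers reported by lscpu are the ones a pin_cpus= line in node.cfg expects. I have not verified that against this deployment.

```python
import subprocess
from collections import defaultdict


def group_cpus_by_socket(lscpu_text):
    """Return {socket: {core: [logical CPU ids]}} from `lscpu -p=CPU,CORE,SOCKET` text."""
    sockets = defaultdict(lambda: defaultdict(list))
    for line in lscpu_text.splitlines():
        line = line.strip()
        # lscpu -p prefixes its header lines with '#'
        if not line or line.startswith("#"):
            continue
        cpu, core, socket = (int(field) for field in line.split(",")[:3])
        sockets[socket][core].append(cpu)
    return sockets


def first_thread_per_core(sockets, socket_id):
    """Pick the lowest-numbered logical CPU of every core on one socket."""
    return sorted(min(cpus) for cpus in sockets[socket_id].values())


if __name__ == "__main__":
    out = subprocess.run(
        ["lscpu", "-p=CPU,CORE,SOCKET"],
        capture_output=True, text=True, check=True,
    ).stdout
    topology = group_cpus_by_socket(out)
    for socket_id in sorted(topology):
        cpus = first_thread_per_core(topology, socket_id)
        # Candidate list for a pin_cpus= line (assumption: OS CPU numbering)
        print(f"socket {socket_id}: pin_cpus=" + ",".join(map(str, cpus)))
```

Run on a worker node, this prints one candidate CPU list per socket, using a single hyperthread per physical core. Which socket sits closest to each Myricom card would still have to be checked separately.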
Please help!
[1]
==================
[manager]
type=manager
host=THE MASTER
[logger]
type=logger
host=THE MASTER
[proxy-1]
type=proxy
host=THE MASTER
[worker-1]
type=worker
host=WORKER 1
lb_method=custom
lb_procs=5
interface=myricom::eth4
[worker-2]
type=worker
host=WORKER 2
lb_method=custom
lb_procs=5
interface=myricom::eth4
[worker-3]
type=worker
host=WORKER 1
lb_method=custom
lb_procs=5
interface=myricom::eth5
[worker-4]
type=worker
host=WORKER 2
lb_method=custom
lb_procs=5
interface=myricom::eth5
=================================================
[2]
================
bro@bro-master-1:~$ zeekctl netstats
Warning: ZeekControl plugin uses legacy BroControl API. Use 'import ZeekControl.plugin' instead of 'import BroControl.plugin'
worker-1-1: 1581949346.194441 recvd=2178149468 dropped=2260820124 link=15063051356
worker-1-2: 1581949346.194473 recvd=274557259 dropped=2260820124 link=13159459147
worker-1-3: 1581949346.168558 recvd=1888926901 dropped=2260820124 link=14773828789
worker-1-4: 1581949346.081130 recvd=2110377092 dropped=2260820124 link=14995278980
worker-1-5: 1581949346.234478 recvd=1032618510 dropped=2260820124 link=13917520398
worker-2-1: 1581949346.269794 recvd=1551167612 dropped=640636540 link=14436069500
worker-2-2: 1581949346.271224 recvd=2811566586 dropped=640636540 link=15696468474
worker-2-3: 1581949346.292474 recvd=3295536154 dropped=640636540 link=16180438042
worker-2-4: 1581949346.314556 recvd=2505663441 dropped=640636540 link=15390565329
worker-2-5: 1581949343.011855 recvd=3459004896 dropped=640636540 link=20638874080
worker-3-1: 1581949346.239424 recvd=938819819 dropped=0 link=938819819
worker-3-2: 1581949346.249540 recvd=890104345 dropped=0 link=890104345
worker-3-3: 1581949346.259501 recvd=894787204 dropped=0 link=894787204
worker-3-4: 1581949346.269501 recvd=895479546 dropped=0 link=895479546
worker-3-5: 1581949346.274490 recvd=878546610 dropped=0 link=878546610
worker-4-1: 1581949346.329587 recvd=892356780 dropped=0 link=892356780
worker-4-2: 1581949346.344510 recvd=922981664 dropped=0 link=922981664
worker-4-3: 1581949346.349568 recvd=855515132 dropped=0 link=855515132
worker-4-4: 1581949346.359652 recvd=931447757 dropped=0 link=931447757
worker-4-5: 1581949346.368349 recvd=876976485 dropped=0 link=876976485
===========================================================
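To make the numbers above easier to compare, here is a minimal Python sketch that parses 'zeekctl netstats' output and reports dropped/link per worker process. Treating dropped/link as the drop ratio is my own rough assumption about how to read these counters, and parse_netstats is just an illustrative name. The pattern tolerates a line break before "link=", since long lines may wrap.

```python
import re

# One netstats record; \s+ also matches a line wrap before "link=".
RECORD = re.compile(
    r"(?P<name>[\w-]+):\s+(?P<ts>\d+\.\d+)"
    r"\s+recvd=(?P<recvd>\d+)"
    r"\s+dropped=(?P<dropped>\d+)"
    r"\s+link=(?P<link>\d+)"
)


def parse_netstats(text):
    """Return one dict per worker process found in the netstats text."""
    rows = []
    for m in RECORD.finditer(text):
        recvd = int(m.group("recvd"))
        dropped = int(m.group("dropped"))
        link = int(m.group("link"))
        rows.append({
            "name": m.group("name"),
            "recvd": recvd,
            "dropped": dropped,
            "link": link,
            # Assumption: dropped/link is a usable rough drop ratio
            "drop_ratio": dropped / link if link else 0.0,
        })
    return rows


if __name__ == "__main__":
    import sys
    # Usage: zeekctl netstats | python3 this_script.py
    for row in parse_netstats(sys.stdin.read()):
        print(f"{row['name']:<12} dropped/link = {row['drop_ratio']:.1%}")
```

The dropped value is identical for every process on the same interface in the output above, so the per-process ratios should probably be read per interface rather than per worker process.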
[3]
===================
1 [|| 3.3%] 15 [ 0.0%] 29 [|| 6.1%] 43 [||||||91.6%]
2 [|| 7.9%] 16 [ 0.0%] 30 [||| 14.2%] 44 [ 0.0%]
3 [| 3.3%] 17 [| 1.4%] 31 [|||| 20.2%] 45 [|| 1.9%]
4 [|| 3.3%] 18 [ 0.0%] 32 [|| 4.7%] 46 [| 0.5%]
5 [|| 3.7%] 19 [||||||76.3%] 33 [|| 4.2%] 47 [|| 9.1%]
6 [|| 5.2%] 20 [ 0.0%] 34 [||||||39.5%] 48 [|| 3.3%]
7 [|| 2.8%] 21 [ 0.0%] 35 [|| 5.2%] 49 [ 0.0%]
8 [|| 5.6%] 22 [|| 1.4%] 36 [|| 3.7%] 50 [|| 3.3%]
9 [|| 6.0%] 23 [| 0.5%] 37 [||| 16.7%] 51 [ 0.0%]
10 [|| 1.9%] 24 [| 0.5%] 38 [||||||56.5%] 52 [|| 7.1%]
11 [|| 2.8%] 25 [||||||88.4%] 39 [|| 6.6%] 53 [ 0.0%]
12 [||| 13.6%] 26 [|| 1.4%] 40 [||||| 30.7%] 54 [|| 0.9%]
13 [||| 15.2%] 27 [ 0.0%] 41 [|| 3.3%] 55 [|| 0.9%]
14 [|| 4.8%] 28 [ 0.0%] 42 [|| 8.1%] 56 [|| 2.3%]
Mem[||||||||||||||||||||||119G/188G]
Swp[ 0K/191G]
Tasks: 58, 107 thr; 3 running
Load average: 6.79 6.10 5.83
Uptime: 2 days, 18:59:33