[Zeek] Dropping packets

Joseph Fischetti Joseph.Fischetti at marist.edu
Mon Feb 17 06:41:16 PST 2020


Good morning,

I'm new to the list and am in the process of inheriting an existing Zeek
deployment here.  I'm trying to track down some (to me) excessive packet
dropping.

 

We had an older version of Zeek (Bro) installed and mostly functional,
though as I recall they were having issues with workers occasionally
crashing.

 

Before I started looking into things, a new version of Zeek was deployed
(from a binary) and is mostly vanilla.  We've included the bhr and myricom
plugins, but that's about it.

 

The Zeek master and workers run on bare metal on 3 pretty big Intel hosts
(192GB memory, 2x Xeon E5-2690 with 14 cores/socket, Debian 9).  The workers
have Myricom interfaces.  SPAN ports at the edge feed Arista switches, which
in turn feed the Myricom interfaces in the workers.

 

We have a few issues:

1)      If I try to start up the workers with any more than ~8 threads,
packet drop and memory usage go through the roof in pretty short order.
If I try to pin them, the first "worker" CPUs get pegged pretty high and
the others stay more or less idle (though that could be due to the amount of
traffic the second worker interface is receiving).

2)      If I try to start up "1" worker (per worker node) using the
"myricom::*" interface, the worker node goes unresponsive and needs to be
hardware bounced.  (Driver issue?)

3)      I can start worker nodes with multiple workers and ~5 threads each
(currently "unpinned"), but after a few days, packet drop is still
excessive.

 

My current node.cfg is below [1].  Output from 'zeekctl netstats' is also
below [2].  It's been up since Friday ~2:00pm Eastern.  Load average is
higher than I would expect (given how much CPU these workers actually have,
and how idle most of the CPUs actually are).  Htop output is included [3].

 

I understand we should probably be pinning the worker threads, but the
output of 'lstopo-no-graphics  --of txt' is terrible to try to trace with
56 threads available.  Also, do I want to use the "P" or the "L" listings?
I can include that output as a follow-up if necessary.
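
One way to avoid tracing lstopo output by hand is to read the kernel's CPU
topology files directly. The following is a minimal Python sketch, not a
tested tuning recipe. It assumes the standard Linux sysfs layout
(/sys/devices/system/cpu/cpuN/topology/), and the helper names cpu_topology
and one_cpu_per_core are made up for illustration. The CPU numbers it prints
are the OS numbers, which should correspond to lstopo's "P#" indexes; whether
those are the right values for a pin_cpus line in node.cfg is an assumption
worth checking against the ZeekControl documentation.

```python
from pathlib import Path


def cpu_topology(sysfs_root="/sys/devices/system/cpu"):
    """Map OS CPU number -> (package id, core id) using Linux sysfs."""
    topo = {}
    for cpu_dir in Path(sysfs_root).glob("cpu[0-9]*"):
        topo_dir = cpu_dir / "topology"
        try:
            cpu = int(cpu_dir.name[3:])
            package = int((topo_dir / "physical_package_id").read_text())
            core = int((topo_dir / "core_id").read_text())
        except (OSError, ValueError):
            continue  # offline CPU or unexpected layout; skip it
        topo[cpu] = (package, core)
    return topo


def one_cpu_per_core(topo):
    """Pick the lowest-numbered OS CPU on each physical core, by package."""
    by_package = {}
    seen = set()
    for cpu in sorted(topo):
        key = topo[cpu]  # (package, core) identifies one physical core
        if key in seen:
            continue  # hyperthread sibling of a core already listed
        seen.add(key)
        by_package.setdefault(key[0], []).append(cpu)
    return by_package


if __name__ == "__main__":
    for package, cpus in sorted(one_cpu_per_core(cpu_topology()).items()):
        print(f"package {package}: {','.join(map(str, cpus))}")
```

Each printed line is a comma-separated list of OS CPU numbers, one per
physical core on that socket, which makes it easier to see which CPUs share
a core before deciding on a pinning layout.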

 

Please help!

 

[1]

==================

[manager]
type=manager
host=THE MASTER

[logger]
type=logger
host=THE MASTER

[proxy-1]
type=proxy
host=THE MASTER

[worker-1]
type=worker
host=WORKER 1
lb_method=custom
lb_procs=5
interface=myricom::eth4

[worker-2]
type=worker
host=WORKER 2
lb_method=custom
lb_procs=5
interface=myricom::eth4

[worker-3]
type=worker
host=WORKER 1
lb_method=custom
lb_procs=5
interface=myricom::eth5

[worker-4]
type=worker
host=WORKER 2
lb_method=custom
lb_procs=5
interface=myricom::eth5

=================================================

 

[2]

================

bro@bro-master-1:~$ zeekctl netstats

Warning: ZeekControl plugin uses legacy BroControl API. Use
'import ZeekControl.plugin' instead of 'import BroControl.plugin'

worker-1-1: 1581949346.194441 recvd=2178149468 dropped=2260820124 link=15063051356
worker-1-2: 1581949346.194473 recvd=274557259 dropped=2260820124 link=13159459147
worker-1-3: 1581949346.168558 recvd=1888926901 dropped=2260820124 link=14773828789
worker-1-4: 1581949346.081130 recvd=2110377092 dropped=2260820124 link=14995278980
worker-1-5: 1581949346.234478 recvd=1032618510 dropped=2260820124 link=13917520398
worker-2-1: 1581949346.269794 recvd=1551167612 dropped=640636540 link=14436069500
worker-2-2: 1581949346.271224 recvd=2811566586 dropped=640636540 link=15696468474
worker-2-3: 1581949346.292474 recvd=3295536154 dropped=640636540 link=16180438042
worker-2-4: 1581949346.314556 recvd=2505663441 dropped=640636540 link=15390565329
worker-2-5: 1581949343.011855 recvd=3459004896 dropped=640636540 link=20638874080
worker-3-1: 1581949346.239424 recvd=938819819 dropped=0 link=938819819
worker-3-2: 1581949346.249540 recvd=890104345 dropped=0 link=890104345
worker-3-3: 1581949346.259501 recvd=894787204 dropped=0 link=894787204
worker-3-4: 1581949346.269501 recvd=895479546 dropped=0 link=895479546
worker-3-5: 1581949346.274490 recvd=878546610 dropped=0 link=878546610
worker-4-1: 1581949346.329587 recvd=892356780 dropped=0 link=892356780
worker-4-2: 1581949346.344510 recvd=922981664 dropped=0 link=922981664
worker-4-3: 1581949346.349568 recvd=855515132 dropped=0 link=855515132
worker-4-4: 1581949346.359652 recvd=931447757 dropped=0 link=931447757
worker-4-5: 1581949346.368349 recvd=876976485 dropped=0 link=876976485

===========================================================
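
To put the netstats counters above on a comparable scale, the following
Python sketch parses records in this format and reports drops as a
percentage. It is an illustration only. The helper names parse_netstats and
drop_percent are hypothetical, and using recvd + dropped as the denominator
is an assumption: in the output above the dropped value is identical for
every process on the same interface, so it may be a per-interface counter
rather than a per-process one.

```python
import re

# Matches one `zeekctl netstats` record; \s+ also tolerates a record whose
# "link=" field was wrapped onto the following line.
LINE_RE = re.compile(
    r"(?P<worker>\S+):\s+(?P<ts>\d+\.\d+)\s+recvd=(?P<recvd>\d+)\s+"
    r"dropped=(?P<dropped>\d+)\s+link=(?P<link>\d+)"
)


def parse_netstats(text):
    """Return a list of dicts, one per worker record found in text."""
    records = []
    for m in LINE_RE.finditer(text):
        records.append({
            "worker": m.group("worker"),
            "recvd": int(m.group("recvd")),
            "dropped": int(m.group("dropped")),
            "link": int(m.group("link")),
        })
    return records


def drop_percent(rec):
    """Drops as a percentage of recvd + dropped (one possible definition)."""
    total = rec["recvd"] + rec["dropped"]
    return 100.0 * rec["dropped"] / total if total else 0.0


# Two records copied from the output above.
SAMPLE = """\
worker-1-1: 1581949346.194441 recvd=2178149468 dropped=2260820124 link=15063051356
worker-3-1: 1581949346.239424 recvd=938819819 dropped=0 link=938819819
"""

if __name__ == "__main__":
    for rec in parse_netstats(SAMPLE):
        print(f"{rec['worker']}: {drop_percent(rec):.1f}% dropped")
```

Piping the full 'zeekctl netstats' output into parse_netstats instead of
SAMPLE gives one percentage per worker process, which is easier to compare
across interfaces than the raw counters.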

 

[3]

===================

  1  [||     3.3%]    15 [       0.0%]   29 [||     6.1%]    43 [||||||91.6%]
  2  [||     7.9%]    16 [       0.0%]   30 [|||   14.2%]    44 [       0.0%]
  3  [|      3.3%]    17 [|      1.4%]   31 [||||  20.2%]    45 [||     1.9%]
  4  [||     3.3%]    18 [       0.0%]   32 [||     4.7%]    46 [|      0.5%]
  5  [||     3.7%]    19 [||||||76.3%]   33 [||     4.2%]    47 [||     9.1%]
  6  [||     5.2%]    20 [       0.0%]   34 [||||||39.5%]    48 [||     3.3%]
  7  [||     2.8%]    21 [       0.0%]   35 [||     5.2%]    49 [       0.0%]
  8  [||     5.6%]    22 [||     1.4%]   36 [||     3.7%]    50 [||     3.3%]
  9  [||     6.0%]    23 [|      0.5%]   37 [|||   16.7%]    51 [       0.0%]
  10 [||     1.9%]    24 [|      0.5%]   38 [||||||56.5%]    52 [||     7.1%]
  11 [||     2.8%]    25 [||||||88.4%]   39 [||     6.6%]    53 [       0.0%]
  12 [|||   13.6%]    26 [||     1.4%]   40 [||||| 30.7%]    54 [||     0.9%]
  13 [|||   15.2%]    27 [       0.0%]   41 [||     3.3%]    55 [||     0.9%]
  14 [||     4.8%]    28 [       0.0%]   42 [||     8.1%]    56 [||     2.3%]

  Mem[||||||||||||||||||||||119G/188G]
  Swp[                        0K/191G]

Tasks: 58, 107 thr; 3 running
Load average: 6.79 6.10 5.83
Uptime: 2 days, 18:59:33
