<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=us-ascii"><meta name=Generator content="Microsoft Word 15 (filtered medium)"><style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:#0563C1;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:#954F72;
        text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
        {mso-style-priority:34;
        margin-top:0in;
        margin-right:0in;
        margin-bottom:0in;
        margin-left:.5in;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;}
span.EmailStyle17
        {mso-style-type:personal-compose;
        font-family:"Calibri",sans-serif;
        color:windowtext;}
.MsoChpDefault
        {mso-style-type:export-only;}
@page WordSection1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
        {page:WordSection1;}
/* List Definitions */
@list l0
        {mso-list-id:888494847;
        mso-list-type:hybrid;
        mso-list-template-ids:2137933566 67698705 67698713 67698715 67698703 67698713 67698715 67698703 67698713 67698715;}
@list l0:level1
        {mso-level-text:"%1\)";
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level2
        {mso-level-number-format:alpha-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level3
        {mso-level-number-format:roman-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:right;
        text-indent:-9.0pt;}
@list l0:level4
        {mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level5
        {mso-level-number-format:alpha-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level6
        {mso-level-number-format:roman-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:right;
        text-indent:-9.0pt;}
@list l0:level7
        {mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level8
        {mso-level-number-format:alpha-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:left;
        text-indent:-.25in;}
@list l0:level9
        {mso-level-number-format:roman-lower;
        mso-level-tab-stop:none;
        mso-level-number-position:right;
        text-indent:-9.0pt;}
ol
        {margin-bottom:0in;}
ul
        {margin-bottom:0in;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=EN-US link="#0563C1" vlink="#954F72"><div class=WordSection1><p class=MsoNormal>Good morning,<o:p></o:p></p><p class=MsoNormal>I’m new to the list and have been taking over an existing Zeek deployment that we have here. I’m trying to track down what seems to me like excessive packet dropping.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>We had an older version of Zeek (Bro) installed and mostly functional, though as I recall, workers were occasionally crashing.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Before I started looking into things, a new version of Zeek was deployed (from a binary) and is mostly vanilla. We’ve included the bhr and Myricom plugins, but that’s about it.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>The Zeek master and workers run on bare metal on three fairly large Intel hosts (192GB memory, 2x Xeon E5-2690 with 14 cores/socket, Debian 9). The workers have Myricom interfaces. SPAN ports at the edge feed Arista switches, which in turn feed the Myricom interfaces in the workers.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>We have a few issues:<o:p></o:p></p><p class=MsoListParagraph style='text-indent:-.25in;mso-list:l0 level1 lfo1'><![if !supportLists]><span style='mso-list:Ignore'>1)<span style='font:7.0pt "Times New Roman"'> </span></span><![endif]>If I start the workers with more than ~8 threads, packet drop and memory usage go through the roof in short order. 
If I try to pin them, the first “worker” CPUs get pegged pretty high while the others stay more or less idle (though that could be due to the amount of traffic the second worker interface is receiving).<o:p></o:p></p><p class=MsoListParagraph style='text-indent:-.25in;mso-list:l0 level1 lfo1'><![if !supportLists]><span style='mso-list:Ignore'>2)<span style='font:7.0pt "Times New Roman"'> </span></span><![endif]>If I try to start a single worker per worker node using the “myricom::*” interface, the worker node becomes unresponsive and has to be power-cycled. (Driver issue?)<o:p></o:p></p><p class=MsoListParagraph style='text-indent:-.25in;mso-list:l0 level1 lfo1'><![if !supportLists]><span style='mso-list:Ignore'>3)<span style='font:7.0pt "Times New Roman"'> </span></span><![endif]>I can start worker nodes with multiple workers and ~5 threads each (currently unpinned), but after a few days packet drop is still excessive.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>My current node.cfg is below [1], along with the output of ‘zeekctl netstats’ [2]. It’s been up since Friday at ~2:00pm Eastern. The load average is higher than I would expect, given how much CPU these workers have and how idle most of the CPUs actually are. Htop output is included [3].<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>I understand we should probably be pinning the worker threads, but the output of ‘lstopo-no-graphics --of txt’ is very hard to trace with 56 hardware threads available. Also, should I use the “P” or the “L” listings? 
I can include that as a follow-up if necessary.<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Please help!<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>[1]<o:p></o:p></p><p class=MsoNormal>================== <o:p></o:p></p><p class=MsoNormal>[manager]<o:p></o:p></p><p class=MsoNormal>type=manager<o:p></o:p></p><p class=MsoNormal>host=THE MASTER<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>[logger]<o:p></o:p></p><p class=MsoNormal>type=logger<o:p></o:p></p><p class=MsoNormal>host= THE MASTER<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>[proxy-1]<o:p></o:p></p><p class=MsoNormal>type=proxy<o:p></o:p></p><p class=MsoNormal>host= THE MASTER<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>[worker-1]<o:p></o:p></p><p class=MsoNormal>type=worker<o:p></o:p></p><p class=MsoNormal>host=WORKER 1<o:p></o:p></p><p class=MsoNormal>lb_method=custom<o:p></o:p></p><p class=MsoNormal>lb_procs=5<o:p></o:p></p><p class=MsoNormal>interface=myricom::eth4<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>[worker-2]<o:p></o:p></p><p class=MsoNormal>type=worker<o:p></o:p></p><p class=MsoNormal>host=WORKER 2<o:p></o:p></p><p class=MsoNormal>lb_method=custom<o:p></o:p></p><p class=MsoNormal>lb_procs=5<o:p></o:p></p><p class=MsoNormal>interface=myricom::eth4<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>[worker-3]<o:p></o:p></p><p class=MsoNormal>type=worker<o:p></o:p></p><p class=MsoNormal>host=WORKER 1<o:p></o:p></p><p class=MsoNormal>lb_method=custom<o:p></o:p></p><p class=MsoNormal>lb_procs=5<o:p></o:p></p><p class=MsoNormal>interface=myricom::eth5<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>[worker-4]<o:p></o:p></p><p class=MsoNormal>type=worker<o:p></o:p></p><p class=MsoNormal>host=WORKER 2<o:p></o:p></p><p class=MsoNormal>lb_method=custom<o:p></o:p></p><p class=MsoNormal>lb_procs=5<o:p></o:p></p><p 
class=MsoNormal>interface=myricom::eth5<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>=================================================<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>[2]<o:p></o:p></p><p class=MsoNormal>================<o:p></o:p></p><p class=MsoNormal>bro@bro-master-1:~$ zeekctl netstats<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Warning: ZeekControl plugin uses legacy BroControl API. Use<o:p></o:p></p><p class=MsoNormal>'import ZeekControl.plugin' instead of 'import BroControl.plugin'<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal> worker-1-1: 1581949346.194441 recvd=2178149468 dropped=2260820124 link=15063051356<o:p></o:p></p><p class=MsoNormal> worker-1-2: 1581949346.194473 recvd=274557259 dropped=2260820124 link=13159459147<o:p></o:p></p><p class=MsoNormal> worker-1-3: 1581949346.168558 recvd=1888926901 dropped=2260820124 link=14773828789<o:p></o:p></p><p class=MsoNormal> worker-1-4: 1581949346.081130 recvd=2110377092 dropped=2260820124 link=14995278980<o:p></o:p></p><p class=MsoNormal> worker-1-5: 1581949346.234478 recvd=1032618510 dropped=2260820124 link=13917520398<o:p></o:p></p><p class=MsoNormal> worker-2-1: 1581949346.269794 recvd=1551167612 dropped=640636540 link=14436069500<o:p></o:p></p><p class=MsoNormal> worker-2-2: 1581949346.271224 recvd=2811566586 dropped=640636540 link=15696468474<o:p></o:p></p><p class=MsoNormal> worker-2-3: 1581949346.292474 recvd=3295536154 dropped=640636540 link=16180438042<o:p></o:p></p><p class=MsoNormal> worker-2-4: 1581949346.314556 recvd=2505663441 dropped=640636540 link=15390565329<o:p></o:p></p><p class=MsoNormal> worker-2-5: 1581949343.011855 recvd=3459004896 dropped=640636540 link=20638874080<o:p></o:p></p><p class=MsoNormal> worker-3-1: 1581949346.239424 recvd=938819819 dropped=0 link=938819819<o:p></o:p></p><p class=MsoNormal> worker-3-2: 1581949346.249540 recvd=890104345 dropped=0 
link=890104345<o:p></o:p></p><p class=MsoNormal> worker-3-3: 1581949346.259501 recvd=894787204 dropped=0 link=894787204<o:p></o:p></p><p class=MsoNormal> worker-3-4: 1581949346.269501 recvd=895479546 dropped=0 link=895479546<o:p></o:p></p><p class=MsoNormal> worker-3-5: 1581949346.274490 recvd=878546610 dropped=0 link=878546610<o:p></o:p></p><p class=MsoNormal> worker-4-1: 1581949346.329587 recvd=892356780 dropped=0 link=892356780<o:p></o:p></p><p class=MsoNormal> worker-4-2: 1581949346.344510 recvd=922981664 dropped=0 link=922981664<o:p></o:p></p><p class=MsoNormal> worker-4-3: 1581949346.349568 recvd=855515132 dropped=0 link=855515132<o:p></o:p></p><p class=MsoNormal> worker-4-4: 1581949346.359652 recvd=931447757 dropped=0 link=931447757<o:p></o:p></p><p class=MsoNormal> worker-4-5: 1581949346.368349 recvd=876976485 dropped=0 link=876976485<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>===========================================================<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>[3]<o:p></o:p></p><p class=MsoNormal>===================<o:p></o:p></p><p class=MsoNormal> 1 [|| 3.3%] 15 [ 0.0%] 29 [|| 6.1%] 43 [||||||91.6%]<o:p></o:p></p><p class=MsoNormal> 2 [|| 7.9%] 16 [ 0.0%] 30 [||| 14.2%] 44 [ 0.0%]<o:p></o:p></p><p class=MsoNormal> 3 [| 3.3%] 17 [| 1.4%] 31 [|||| 20.2%] 45 [|| 1.9%]<o:p></o:p></p><p class=MsoNormal> 4 [|| 3.3%] 18 [ 0.0%] 32 [|| 4.7%] 46 [| 0.5%]<o:p></o:p></p><p class=MsoNormal> 5 [|| 3.7%] 19 [||||||76.3%] 33 [|| 4.2%] 47 [|| 9.1%]<o:p></o:p></p><p class=MsoNormal> 6 [|| 5.2%] 20 [ 0.0%] 34 [||||||39.5%] 48 [|| 3.3%]<o:p></o:p></p><p class=MsoNormal> 7 [|| 2.8%] 21 [ 0.0%] 35 [|| 5.2%] 49 [ 0.0%]<o:p></o:p></p><p class=MsoNormal> 8 [|| 5.6%] 22 [|| 1.4%] 36 [|| 3.7%] 50 [|| 3.3%]<o:p></o:p></p><p class=MsoNormal> 9 [|| 6.0%] 23 [| 0.5%] 37 [||| 16.7%] 51 [ 0.0%]<o:p></o:p></p><p class=MsoNormal> 10 [|| 1.9%] 24 [| 0.5%] 38 [||||||56.5%] 52 [|| 7.1%]<o:p></o:p></p><p class=MsoNormal> 11 
[|| 2.8%] 25 [||||||88.4%] 39 [|| 6.6%] 53 [ 0.0%]<o:p></o:p></p><p class=MsoNormal> 12 [||| 13.6%] 26 [|| 1.4%] 40 [||||| 30.7%] 54 [|| 0.9%]<o:p></o:p></p><p class=MsoNormal> 13 [||| 15.2%] 27 [ 0.0%] 41 [|| 3.3%] 55 [|| 0.9%]<o:p></o:p></p><p class=MsoNormal> 14 [|| 4.8%] 28 [ 0.0%] 42 [|| 8.1%] 56 [|| 2.3%]<o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal> Mem[||||||||||||||||||||||119G/188G] <o:p></o:p></p><p class=MsoNormal> Swp[ 0K/191G] <o:p></o:p></p><p class=MsoNormal><o:p> </o:p></p><p class=MsoNormal>Tasks: 58, 107 thr; 3 running<o:p></o:p></p><p class=MsoNormal>Load average: 6.79 6.10 5.83<o:p></o:p></p><p class=MsoNormal>Uptime: 2 days, 18:59:33<o:p></o:p></p></div></body></html>