[Zeek] Dropping packets

Justin Azoff justin at corelight.com
Mon Feb 24 07:55:21 PST 2020


On Mon, Feb 24, 2020 at 8:35 AM Joseph Fischetti <
Joseph.Fischetti at marist.edu> wrote:

> I contacted myricom support and the first thing they suggested was turning
> debug logging on (via the environment variable in the node.cfg) so they
> could get some info. I did that last Wednesday and we’ve had none of the
> memory issues that we were experiencing before that.  So at least that part
> of the situation is in check.
>

I wouldn't have expected that to be the cause or the fix. Did you load
those packages I had suggested?


> Here’s the packet drop stats over the last ~4.5 days:
>
>
>
> worker-1 dropped=534746334 rx=138087967089 0.39%
>
> worker-2 dropped=340526849 rx=147729484064 0.23%
>
> worker-3 dropped=0 rx=9064403485 0.00%
>
> worker-4 dropped=0 rx=9183660589 0.00%
>
>
>
> Totals dropped=875273183 rx=304065515227 0.29%
>
>
>
> Just to refresh.  We’re running unpinned with 10 lb_procs per worker.
>
>
>
> Should I… give it more processes, or try to pin the CPUs? Load average on
> the workers is hovering around 5.5, and since we’re unpinned we’re using all
> 28 cores (but they’re only at around 15-20% load).
>
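(For reference, the percentages quoted above are just dropped/rx; a quick sketch to recompute them from the raw counters:)

```python
# Per-worker drop percentages from the counters quoted above.
stats = {
    "worker-1": (534746334, 138087967089),
    "worker-2": (340526849, 147729484064),
    "worker-3": (0, 9064403485),
    "worker-4": (0, 9183660589),
}

for name, (dropped, rx) in stats.items():
    print(f"{name} dropped={dropped} rx={rx} {100 * dropped / rx:.2f}%")

total_dropped = sum(d for d, _ in stats.values())
total_rx = sum(r for _, r in stats.values())
print(f"Totals dropped={total_dropped} rx={total_rx} "
      f"{100 * total_dropped / total_rx:.2f}%")
```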

Load average isn't a terribly useful metric; what really helps is having
per-CPU utilization graphs.
Those drop numbers are already looking a lot better, but the rx packet
counts per worker are still really skewed, which points to something wrong
with your load balancing: the first 2 NICs are seeing about 15 times as
many packets as the other 2.
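If you do try pinning, zeekctl's node.cfg supports pin_cpus alongside lb_procs. A sketch — the hostname, interface, and core list here are placeholders, not from your setup:

```ini
[worker-1]
type=worker
host=worker1.example.com
interface=eth2
lb_method=myricom
lb_procs=10
# Pin the 10 worker processes to specific cores (placeholder core list):
pin_cpus=0,1,2,3,4,5,6,7,8,9
```

With pinning you trade scheduler flexibility for cache locality, which usually helps Zeek workers under sustained load.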


> I’ve read through a good portion of the Berkeley 100G doc and I’m wondering
> if I should start looking at shunting as well.
>

If you have enough elephant flows, it couldn't hurt, but you'll likely want
to fix the load balancing first.
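Shunting in Zeek typically goes through the NetControl framework. A rough sketch, assuming a backend is configured — the debug plugin below only logs the rules it's given, and shunting on connection_established is purely illustrative (a real deployment would trigger once a flow crosses a byte threshold):

```zeek
@load base/frameworks/netcontrol

event NetControl::init()
	{
	# Debug backend just logs rules; swap in a real backend
	# (e.g. an OpenFlow or acld plugin) for actual shunting.
	local debug = NetControl::create_debug(T);
	NetControl::activate(debug, 0);
	}

event connection_established(c: connection)
	{
	# Illustrative only: shunt every flow for an hour. Real logic
	# would fire only for flows identified as elephants.
	NetControl::shunt_flow([$src_h=c$id$orig_h, $src_p=c$id$orig_p,
	                        $dst_h=c$id$resp_h, $dst_p=c$id$resp_p], 1hr);
	}
```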
-- 
Justin