[Zeek] Dropping packets
Joseph K Fischetti
joseph.fischetti at marist.edu
Mon Feb 24 08:57:38 PST 2020
> I contacted Myricom support and the first thing they suggested was
> turning debug logging on (via the environment variable in the
> node.cfg) so they could get some info. I did that last Wednesday
> and we’ve had none of the memory issues that we were experiencing
> before that. So at least that part of the situation is in check.
>
>
> I wouldn't have expected that to be the cause or the fix. Did you
> load those packages I had suggested?
I didn't. I was trying to keep it as close to a vanilla configuration
as possible (especially after looping support in). I probably will this
week though.
>
>
> Here are the packet drop stats over the last ~4.5 days:
>
> worker-1 dropped=534746334 rx=138087967089 0.39%
> worker-2 dropped=340526849 rx=147729484064 0.23%
> worker-3 dropped=0 rx=9064403485 0.00%
> worker-4 dropped=0 rx=9183660589 0.00%
>
> Totals dropped=875273183 rx=304065515227 0.29%
>
> Just to refresh: we’re running unpinned with 10 lb_procs per worker.
>
> Should I… give it more processes or try to pin the CPUs? Load
> average on the workers is hovering around 5.5 and since we’re
> unpinned we’re using all 28 cores (but they’re only around 15-20%
> load).
>
> Load average isn't a terribly useful metric. What really helps is
> having per-CPU utilization graphs.
> Those drop numbers are already looking a lot better, but the rx
> packets per worker is still really skewed and points to something
> weird with your load balancing. The first 2 NICs are seeing about 15
> times as many packets as the other 2.
>
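
For reference, per-CPU utilization can be sampled without any extra tooling. The following is a minimal Python sketch, assuming a Linux host where /proc/stat is available; the function names (parse_proc_stat, utilization, sample) are made up for illustration and are not part of Zeek or zeekctl.

```python
import time


def parse_proc_stat(text):
    """Return {cpu_name: (busy_jiffies, total_jiffies)} for each per-core line."""
    out = {}
    for line in text.splitlines():
        fields = line.split()
        # Per-core lines look like "cpu0 ..."; skip the aggregate "cpu" line.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        values = [int(v) for v in fields[1:]]
        # Field order: user nice system idle iowait irq softirq steal ...
        idle = values[3] + (values[4] if len(values) > 4 else 0)
        total = sum(values[:8])  # guest time is already counted in user/nice
        out[fields[0]] = (total - idle, total)
    return out


def utilization(before, after):
    """Percent busy per core between two parse_proc_stat() snapshots."""
    result = {}
    for cpu, (busy1, total1) in after.items():
        if cpu not in before:
            continue  # core appeared between samples; no baseline for it
        busy0, total0 = before[cpu]
        dt = total1 - total0
        result[cpu] = 100.0 * (busy1 - busy0) / dt if dt > 0 else 0.0
    return result


def sample(interval=1.0, path="/proc/stat"):
    """Take two snapshots `interval` seconds apart and return per-core busy %."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return utilization(before, after)


if __name__ == "__main__":
    for cpu, pct in sorted(sample().items(), key=lambda kv: int(kv[0][3:])):
        print(f"{cpu}: {pct:5.1f}%")
```

Run on each worker host, this prints one line per core, which shows whether a few cores are saturated while the rest sit idle. A load average cannot show that.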
I would think the load average should at least indicate how the
processors are keeping up with the traffic. That said, I know there's
an imbalance. I need access to the Arista switches. I guess it's
possible that if we can balance them properly I can get away without
doing *any* more CPU/pinning work.
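
To put a number on the imbalance, here is a rough Python sketch that recomputes the drop rate and each worker's share of received packets from the counters quoted above. The dictionary layout and helper names are just for illustration. The drop rate is taken as dropped/rx, the same way the percentages in the stats were calculated.

```python
# Per-worker counters copied from the stats quoted earlier in this message.
STATS = {
    "worker-1": {"dropped": 534_746_334, "rx": 138_087_967_089},
    "worker-2": {"dropped": 340_526_849, "rx": 147_729_484_064},
    "worker-3": {"dropped": 0, "rx": 9_064_403_485},
    "worker-4": {"dropped": 0, "rx": 9_183_660_589},
}


def drop_pct(dropped, rx):
    """Dropped packets as a percentage of received packets."""
    return 100.0 * dropped / rx if rx else 0.0


def rx_share(stats):
    """Each worker's share (in percent) of all received packets."""
    total = sum(s["rx"] for s in stats.values())
    return {name: 100.0 * s["rx"] / total for name, s in stats.items()}


def skew_ratio(stats):
    """Ratio of the busiest worker's rx count to the quietest worker's."""
    rx = [s["rx"] for s in stats.values()]
    return max(rx) / min(rx)


if __name__ == "__main__":
    shares = rx_share(STATS)
    for name, s in STATS.items():
        print(f"{name} drop={drop_pct(s['dropped'], s['rx']):.2f}% "
              f"rx_share={shares[name]:.1f}%")
    print(f"busiest/quietest rx ratio: {skew_ratio(STATS):.1f}x")
```

With even balancing each worker would sit near a 25% share and the ratio would be close to 1, so these two numbers are an easy before/after check once the switch configuration changes.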
> I’ve read through a good portion of the Berkeley 100G doc and I’m
> wondering if I should start looking at shunting as well.
>
>
> If you have enough elephant flows, it couldn't hurt, but you likely
> want to fix the load balancing first.
Will do.
Thank you
Joe