[Zeek] Dropping packets

Joseph K Fischetti joseph.fischetti at marist.edu
Mon Feb 24 08:57:38 PST 2020


>     I contacted myricom support and the first thing they suggested was
>     turning debug logging on (via the environment variable in the
>     node.cfg) so they could get some info. I did that last Wednesday
>     and we’ve had none of the memory issues that we were experiencing
>     before that.  So at least that part of the situation is in check.
>
>
> I wouldn't have expected that to be the cause or the fix.. Did you
> load those packages I had suggested?  

I didn't.  I was trying to keep it as close to a vanilla configuration
as possible (especially after looping Myricom support in).  I probably
will this week though.
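
For reference, the debug logging went in through zeekctl's env_vars
option in node.cfg.  A rough sketch of the worker stanza (the host,
interface, and the SNF_DEBUG_MASK variable/value are placeholders
here; use whatever variable and level Myricom support actually asks
for):

    [worker-1]
    type=worker
    host=10.1.1.1
    interface=p1p1
    lb_method=myricom
    lb_procs=10
    # Per-node environment variables, comma-separated.
    env_vars=SNF_DEBUG_MASK=3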

>  
>
>     Here’s the packet drop stats over the last ~4.5 days:
>
>      
>
>     worker-1 dropped=534746334 rx=138087967089 0.39%
>
>     worker-2 dropped=340526849 rx=147729484064 0.23%
>
>     worker-3 dropped=0 rx=9064403485 0.00%
>
>     worker-4 dropped=0 rx=9183660589 0.00%
>
>      
>
>     Totals dropped=875273183 rx=304065515227 0.29%
>
>      
>
>     Just to refresh.  We’re running unpinned with 10 lb_procs per worker.
>
>      
>
>     Should I… give it more processes or try to pin the CPUs?  Load
>     average on the workers is hovering around 5.5 and since we’re
>     unpinned we’re using all 28 cores (but they’re only around 15-20%
>     load).
>
>
> Load average isn't a terribly useful metric.. what really helps is
> having per-CPU utilization graphs.
> Those drop numbers are already looking a lot better, but the rx
> packet counts per worker are still really skewed and point to
> something weird with your load balancing.  The first 2 NICs are
> seeing 15 times as many packets as the other 2.
>  

I would think the load average should at least indicate how well the
processors are keeping up with the traffic.  That said, I know there's
an imbalance.  I need access to the Arista switches.  I guess it's
possible that if we can balance the flows properly I can get away
without doing *any* more CPU/pinning work.
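
If pinning does turn out to be necessary, my understanding is it's
just the pin_cpus option next to lb_procs in node.cfg; zeekctl hands
the listed CPUs to the lb_procs in order.  A minimal sketch (host,
interface, and the exact CPU list are placeholders, and the list would
need to respect NUMA locality relative to the NIC on our boxes):

    [worker-1]
    type=worker
    host=10.1.1.1
    interface=p1p1
    lb_method=myricom
    lb_procs=10
    # Pin the 10 lb_procs to these cores, one each.
    pin_cpus=0,1,2,3,4,5,6,7,8,9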


>     I’ve read through a good portion of the Berkeley 100G doc and I’m
>     wondering if I should start looking at shunting as well. 
>
>
> If you have enough elephant flows, it couldn't hurt.. but likely want
> to fix the load balancing first.

Will do.
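
For the archives: my reading of shunting, in Zeek-only terms, is a BPF
exclude for identified elephant flows.  A minimal sketch using the
stock PacketFilter framework (the flow below is made up; at 100G the
Berkeley doc pushes the actual drop rule into hardware, e.g. Arista
ACLs, rather than BPF):

    event zeek_init()
        {
        # Stop capturing a known high-volume flow (hypothetical
        # backup traffic); everything else is still analyzed.
        PacketFilter::exclude("shunt-backup",
                              "host 10.0.0.5 and port 873");
        }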

Thank you

Joe
