[Bro] Bro Packet Loss / 10gb ixgbe / pf_ring

Nash, Paul Paul.Nash at tufts.edu
Thu Jan 7 17:03:22 PST 2016


Thanks, Gary. Sorry to top-post - I'm stuck on OWA at the moment. Here are some quick replies to your suggestions:

- capture_loss.bro - I'm running it; every 15 minutes it reports ~70% packet loss (or greater) across all of the workers
- the 'adapters_to_enable' ixgbe.ko argument doesn't exist in the latest driver bundled with pf_ring 6.2.0
- I've enabled multi-queue (MQ=1) on the 2nd interface (MQ=0,2) as well as the 16 HW RSS queues (RSS=1,16)
- I have a license in /etc/pf_ring
- bro is linked against the pf_ring-enabled libpcap
- I've confirmed that the .ko's I'm loading are the latest from pf_ring 6.2.0 (a quick verification sketch is below)
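
For reference, something like the following should confirm the last three items (the bro path is an assumption - adjust for your install prefix):

  # which pf_ring build the running kernel module reports
  cat /proc/net/pf_ring/info

  # which parameters the bundled ixgbe.ko actually accepts
  modinfo /lib/modules/$(uname -r)/kernel/drivers/net/ixgbe/ixgbe.ko | grep parm

  # whether bro is linked against the PF_RING-enabled libpcap
  ldd /usr/local/bro/bin/bro | grep pcap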


Right now, pfcount says that eth3 is receiving 462 Mbit/sec - I left it running for 5 minutes or so and it reported zero dropped packets.  As soon as I start bro, I'm already dropping 50%+ of packets per worker.
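
To narrow down whether the drops happen in the PF_RING rings or inside bro itself, the per-ring counters can be watched while bro is running - a sketch, assuming the usual /proc/net/pf_ring layout where each worker's ring shows up as a <pid>-<device>.<ring_id> file:

  # if the lost / free-slot counters in these files climb while bro runs,
  # the workers aren't draining their rings fast enough
  cat /proc/net/pf_ring/*-eth3.*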

The only other things I can think of are packet duplication from some new taps that we deployed and, potentially, protocols that bro isn't parsing.
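
If it is duplication from the taps, a quick sample should show it (a sketch; assumes tcpdump plus Wireshark's editcap/capinfos are installed on the sensor):

  # grab a short sample off the monitored interface
  tcpdump -i eth3 -c 100000 -w /tmp/sample.pcap

  # editcap -d drops duplicates seen within a small packet window; comparing
  # packet counts before and after gives a rough duplication rate
  editcap -d /tmp/sample.pcap /tmp/sample-dedup.pcap
  capinfos -c /tmp/sample.pcap /tmp/sample-dedup.pcap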

 -Paul

________________________________________
From: Gary Faulkner [gfaulkner.nsm at gmail.com]
Sent: Thursday, January 07, 2016 7:04 PM
To: Nash, Paul
Cc: bro at bro.org
Subject: Re: [Bro] Bro Packet Loss / 10gb ixgbe / pf_ring

Some thoughts inline...

On 1/7/16 3:37 PM, Nash, Paul wrote:
> I’m trying to debug some packet drops that I’m experiencing and am turning to the list for help.   The recorded packet loss is ~50 – 70% at times.   The packet loss is recorded in broctl’s netstats as well as in the notice.log file.
>
> Running netstats at startup – I’m dropping more than I’m receiving from the very start.

Have you tried enabling the bro capture_loss script in your local.bro as
a way to double-check your loss numbers? It will give you per-worker
loss at 15-minute intervals in a separate log file.

In local.bro:
@load policy/misc/capture-loss
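
Once it is loaded, each worker's numbers land in capture_loss.log. Assuming the default field names (ts, ts_delta, peer, gaps, acks, percent_lost), something like this pulls out the per-worker loss percentage:

  bro-cut peer percent_lost < capture_loss.log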

> insmod /lib/modules/2.6.32-431.11.2.el6.x86_64/kernel/net/pf_ring/pf_ring.ko enable_tx_capture=0 min_num_slots=32768 quick_mode=1
>
> insmod  /lib/modules/2.6.32-431.11.2.el6.x86_64/kernel/drivers/net/ixgbe/ixgbe.ko numa_cpu_affinity=0,0 MQ=0,1 RSS=0,0
>
> I checked /proc/sys/pci/devices to confirm that the interface is running on numa_node 0.  ‘lscpu’ shows that cpus 0-7 are on node 0, socket 0, and cpus 8-15 are on node 1, socket 0.  I figured having the 16 RSS queues on the same socket is probably better than having them bounce around.
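
For what it's worth, the NUMA node of the NIC can also be read straight from sysfs (a sketch; adjust the interface name):

  # prints the device's NUMA node; -1 means the kernel has no NUMA info for it
  cat /sys/class/net/eth3/device/numa_node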
>
>
> The node.cfg looks like this:
>
>
> [manager]
>
> type=manager
>
> host=10.99.99.15
>
>
> #
>
> [proxy-1]
>
> type=proxy
>
> host=10.99.99.15
>
>
> #
>
> [worker-1]
>
> type=worker
>
> host=10.99.99.15
>
> interface=eth3
>
> lb_method=pf_ring
>
> lb_procs=16
>
> pin_cpus=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
>
>
> I have a license for ZC, and if I change the interface from eth3 to zc:eth3, it will spawn 16 workers, but only one of them is receiving any traffic.  I’m assuming that it is looking at zc:eth3@0 only.   Netstats proves that out.   If I run pfcount -i zc@eth3, it will show me that I’m receiving ~1 Gbps of traffic on the interface and not dropping anything.

As far as ZC usage, when using ZC mode did you specify which adapters
to enable at the end of your ixgbe insmod statement, like this -->
adapters_to_enable=<insert comma-separated list of licensed MAC
addresses you want to use>? Also, did you try setting RSS to match the
number of workers instead of leaving it up to the NIC? For example, RSS=16
instead of 0 (comma-separated per NIC if more than one NIC). Did you try
pfcount -i zc@eth3@0 (through 15), etc., to test each RSS queue? Did you put
the necessary license files in /etc/pf_ring? Also, just to be certain,
are you using the IXGBE drivers that come with PF_RING, and have you
compiled Bro against the PF_RING libpcap?
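
Roughly what I have in mind, as a sketch only (substitute the real driver path and your licensed MAC address, and check that the bundled driver still accepts these parameters):

  # one RSS queue per planned worker, ZC enabled only for the licensed adapter
  # (the MAC address below is a placeholder)
  insmod ./ixgbe.ko MQ=1 RSS=16 adapters_to_enable=00:11:22:33:44:55

  # then confirm every RSS queue actually sees traffic
  for q in $(seq 0 15); do timeout 10 pfcount -i zc@eth3@$q; done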

> Am I missing something obvious?  I saw many threads about disabling hyperthreading, but that seems specific to Intel processors; I’m running AMD Opterons with their own HyperTransport stuff, which doesn’t create virtual CPUs.

I'm not sure I understand the AMD architecture well enough to know how cores
map to nodes, so I can't comment on your pinning configuration in terms
of workers per core. But assuming each worker is pinned to a physical
core and you truly have 16 physical cores on that socket, have you left
any cores unpinned somewhere else (maybe a processor in another socket)
for the system, bro manager, proxy, etc. to use? If not, you could have
other processes stomping on your workers. If any workers are sharing
physical cores, that could be problematic as well. Do you have htop or
something similar installed where you can easily watch whether processes
seem to be competing for the same physical core?
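
If htop isn't handy, plain ps and taskset can show the same thing (a sketch using standard procps/util-linux tools; <worker pid> is a placeholder):

  # psr = the CPU each process last ran on
  ps -eo pid,psr,pcpu,comm | grep bro

  # the affinity list that was actually applied to a given worker
  taskset -pc <worker pid>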

Have you tried running capstats (broctl capstats if using broctl) to see
what sort of traffic bro thinks it is seeing across all workers when you
are seeing loss? Depending on the clock speed and efficiency of each
core you may be able to process anywhere from 100-300+ Mbps per core, but
if that 1 Gbps of traffic was only representative of a single RSS queue
on your 10G NIC, you could be oversubscribed. If you have free cores on
another socket, it might be worth taking whatever small performance hit
there is over the bus to have more workers running on those other cores.
Also, I tend to leave the first couple of logical cores open for the system,
as Linux at least seems to prefer them for system use. I do find that
pinning workers to specific cores helps overall in the loss department
vs. letting workers bounce between cores, so I think you are on the right
track.
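
As a concrete sketch only (core numbers depend on your actual topology), the worker stanza could leave the first two cores to the system, manager and proxy, and pin the remaining 14:

  [worker-1]
  type=worker
  host=10.99.99.15
  interface=eth3
  lb_method=pf_ring
  lb_procs=14
  pin_cpus=2,3,4,5,6,7,8,9,10,11,12,13,14,15

with the NIC's RSS setting matched to lb_procs (14 here) rather than 16.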

~Gary


