Thanks Mike -
I'm using 16 workers because the ixgbe 10Gb NIC supports hardware receive-side scaling (RSS), and 16 is the maximum number of queues it supports.  While monitoring traffic this afternoon, I was seeing ~700-800 Mb/s based on pfcount stats.

If I disabled the hardware RSS I'd have to switch over to using standard pf_ring or DNA/ZC.  I have a license for ZC, but I've been unable to figure out how to get Bro to monitor all of the zc:eth3 queues. The current Bro load-balancing documentation covers pf_ring+DNA, but not the newer, supported zero-copy functionality.  I can't find the right "interface=" configuration for node.cfg.

"interface=zc:eth3" only monitors one of the queues.
interface="zc:eth3@0,zc:eth3@1,etc.." causes the workers to crash.
interface="zc:eth3@0 -i zc:eth3@1 -i .." didn't work either.

The PF_RING ZC documentation implies using zbalance_ipc to start up a set of queues with a cluster ID, and then pointing the application at zc:##, where ## is the cluster ID.  I also ran into issues with that.

For tonight, I'll disable the hardware RSS and switch over to running straight pf_ring with 24 workers.  I'll pin the first 8 so that they are on the same NUMA node as the NIC. I'm not sure what to do with the other 16 workers. Does anyone have any insight into whether it is better to pin them to the same socket? I'm on AMD, which isn't as well documented as the Intel world.


Change your min_num_slots to be 65535. I would add an additional proxy as well as an additional 8 workers.

I’m trying to debug some packet drops that I’m experiencing and am turning to the list for help.   The recorded packet loss is ~50 – 70% at times.   The packet loss is recorded in broctl’s netstats as well as in the notice.log file.

Running netstats at startup – I’m dropping more than I’m receiving from the very start.

[BroControl] > netstats

worker-1-1: 1452200459.635155 recvd=734100 dropped=1689718 link=2424079
worker-1-10: 1452200451.830143 recvd=718461 dropped=1414234 link=718461
worker-1-11: 1452200460.036766 recvd=481010 dropped=2019289 link=2500560
worker-1-12: 1452200460.239585 recvd=720895 dropped=1805574 link=2526730
worker-1-13: 1452200460.440611 recvd=753365 dropped=1800827 link=2554453
worker-1-14: 1452200460.647368 recvd=784145 dropped=1800831 link=2585237
worker-1-15: 1452200460.844842 recvd=750921 dropped=1868186 link=2619368
worker-1-16: 1452200461.049237 recvd=742718 dropped=1908528 link=2651507
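
To put a number on the loss, the netstats lines above can be turned into a per-worker drop percentage. The following is only a rough Python sketch, not part of broctl: the function name drop_percent is made up for illustration, and it assumes that dropped packets are not counted in recvd, so the denominator is recvd + dropped (which roughly matches the link counter for most workers above).

```python
import re

# Matches netstats lines such as:
#   worker-1-1: 1452200459.635155 recvd=734100 dropped=1689718 link=2424079
NETSTATS_RE = re.compile(
    r"^\s*(?P<worker>\S+):\s+(?P<ts>\d+\.\d+)\s+"
    r"recvd=(?P<recvd>\d+)\s+dropped=(?P<dropped>\d+)\s+link=(?P<link>\d+)"
)


def drop_percent(line):
    """Return (worker, percent dropped) for one netstats line, or None.

    Assumption: 'dropped' is not included in 'recvd', so the total seen
    by the capture layer is recvd + dropped.
    """
    match = NETSTATS_RE.match(line)
    if match is None:
        return None  # prompt lines, blank lines, anything else
    recvd = int(match.group("recvd"))
    dropped = int(match.group("dropped"))
    total = recvd + dropped
    pct = 100.0 * dropped / total if total else 0.0
    return match.group("worker"), pct


if __name__ == "__main__":
    sample = [
        "worker-1-1: 1452200459.635155 recvd=734100 dropped=1689718 link=2424079",
        "worker-1-11: 1452200460.036766 recvd=481010 dropped=2019289 link=2500560",
    ]
    for line in sample:
        result = drop_percent(line)
        if result is not None:
            worker, pct = result
            print(f"{worker}: {pct:.1f}% dropped")
```

Piping the full netstats output through something like this makes it easier to compare runs before and after a tuning change.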


System information:
- 64-core AMD Opteron system
- 128 GB of RAM
- Intel 10Gb ixgbe interface (dual 10Gb interfaces, eth3 is the sniffer)
- Licensed copy of PF_Ring ZC

I’m running Bro 2.4.1 and PF_RING 6.2.0 on CentOS with a 2.6.32-431 kernel.

I have the proxy, manager & 16 workers running on the same system.  16 CPUs are pinned (0-15)

Startup scripts to load the various kernel modules (from PF_RING 6.2.0 src)

insmod /lib/modules/2.6.32-431.11.2.el6.x86_64/kernel/net/pf_ring/pf_ring.ko enable_tx_capture=0 min_num_slots=32768 quick_mode=1

insmod  /lib/modules/2.6.32-431.11.2.el6.x86_64/kernel/drivers/net/ixgbe/ixgbe.ko numa_cpu_affinity=0,0 MQ=0,1 RSS=0,0

I checked /sys/bus/pci/devices to confirm that the interface is on numa_node 0.  ‘lscpu’ shows that CPUs 0-7 are on node 0, socket 0, and CPUs 8-15 are on node 1, socket 0.  I figured having the 16 RSS queues on the same socket is probably better than having them bounce around.
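
The same NUMA check can be scripted. This is a minimal Python sketch under a few assumptions: it reads the standard Linux sysfs files /sys/class/net/<iface>/device/numa_node and /sys/devices/system/node/node<N>/cpulist, and the helper names (parse_cpulist, nic_numa_node, cpus_on_node) are hypothetical, not anything shipped with Bro or PF_RING.

```python
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-7,16' into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # empty input or trailing comma
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def nic_numa_node(iface):
    """NUMA node of the NIC's PCI device; the kernel reports -1 if unknown."""
    path = Path(f"/sys/class/net/{iface}/device/numa_node")
    return int(path.read_text().strip())


def cpus_on_node(node):
    """CPUs that belong to the given NUMA node, according to sysfs."""
    path = Path(f"/sys/devices/system/node/node{node}/cpulist")
    return parse_cpulist(path.read_text())


if __name__ == "__main__":
    iface = "eth3"  # the sniffing interface from this thread
    node = nic_numa_node(iface)
    if node < 0:
        print(f"{iface}: kernel does not report a NUMA node")
    else:
        local = cpus_on_node(node)
        # Candidate value for pin_cpus in node.cfg
        print(f"{iface} is on node {node}; local CPUs: "
              + ",".join(str(c) for c in local))
```

The printed CPU list is only a starting point for pin_cpus; whether the remaining workers do better on the same socket or spread across nodes is still the open question.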

I’ve disabled a bunch of the ixgbe offloading stuff:

ethtool -K eth3 rx off
ethtool -K eth3 tx off
ethtool -K eth3 sg off
ethtool -K eth3 tso off
ethtool -K eth3 gso off
ethtool -K eth3 gro off
ethtool -K eth3 lro off
ethtool -K eth3 rxvlan off
ethtool -K eth3 txvlan off
ethtool -K eth3 ntuple off
ethtool -K eth3 rxhash off
ethtool -G eth3 rx 32768

I’ve also tuned the stack, per recommendations from SANS:

net.ipv4.tcp_timestamps = 0
net.ipv4.tcp_sack = 0
net.ipv4.tcp_rmem = 10000000 10000000 10000000
net.ipv4.tcp_wmem = 10000000 10000000 10000000
net.ipv4.tcp_mem = 10000000 10000000 10000000
net.core.rmem_max = 134217728
net.core.wmem_max = 134217728
net.core.netdev_max_backlog = 250000

The node.cfg looks like this:
















I have a license for ZC, and if I change the interface from eth3 to zc:eth3, it spawns 16 workers, but only one of them receives any traffic.  I’m assuming that it is looking at zc:eth3@0 only; netstats bears that out.  If I run pfcount -i zc:eth3, it shows that I’m receiving ~1 Gb/s of traffic on the interface and not dropping anything.

Am I missing something obvious?  I saw many threads about disabling hyper-threading, but that seems specific to Intel processors. I’m running AMD Opterons, which use HyperTransport and don’t create virtual CPUs.


