[Bro] Unexplained Performance Differences Between Like Servers

Gilbert Clark gc355804 at ohio.edu
Fri Jun 13 06:54:51 PDT 2014


Hi Jason:

Is the type of traffic in the 600 Mbps stream similar to the type of 
traffic in the 700 Mbps stream?

Cheers,
Gilbert Clark

On 6/13/14, 9:01 AM, Jason Batchelor wrote:
> Hello everyone:
> I have Bro installed on two Dell r720s each with the following specs...
> Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz x 32
> 48GB RAM
> Running: CentOs 6.5
> Both have the following PF_RING configuration:
> PF_RING Version          : 6.0.2 ($Revision: 7746$)
> Total rings              : 16
> Standard (non DNA) Options
> Ring slots               : 32768
> Slot version             : 15
> Capture TX               : No [RX only]
> IP Defragment            : No
> Socket Mode              : Standard
> Transparent mode         : Yes [mode 0]
> Total plugins            : 0
> Cluster Fragment Queue   : 1917
> Cluster Fragment Discard : 26648
> The only PF_RING difference is that Server A is running revision 7601,
> while Server B is on revision 7746.
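> A quick way to confirm the loaded revision on each box (a sketch,
> assuming the pf_ring module is loaded and exposes its standard proc
> interface):
> head -1 /proc/net/pf_ring/info   # revision is in the version header
> modinfo pf_ring | grep -i version   # or query the module file itself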
> I've tuned the NIC to the following settings...
> ethtool -K p4p2 tso off
> ethtool -K p4p2 gro off
> ethtool -K p4p2 lro off
> ethtool -K p4p2 gso off
> ethtool -K p4p2 rx off
> ethtool -K p4p2 tx off
> ethtool -K p4p2 sg off
> ethtool -K p4p2 rxvlan off
> ethtool -K p4p2 txvlan off
> ethtool -N p4p2 rx-flow-hash udp4 sdfn
> ethtool -N p4p2 rx-flow-hash udp6 sdfn
> ethtool -n p4p2 rx-flow-hash udp6
> ethtool -n p4p2 rx-flow-hash udp4
> ethtool -C p4p2 rx-usecs 1000
> ethtool -C p4p2 adaptive-rx off
> ethtool -G p4p2 rx 4096
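> To verify those settings actually stuck on both servers (some drivers
> silently refuse certain offload toggles), the read-only ethtool
> queries can be diffed between A and B:
> ethtool -k p4p2   # current offload settings (lowercase -k reads)
> ethtool -g p4p2   # current rx/tx ring sizes
> ethtool -c p4p2   # current interrupt coalescing values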
> I've got the following sysctl settings on each.
> # turn off selective ACK and timestamps
> net.ipv4.tcp_sack = 0
> net.ipv4.tcp_timestamps = 0
> # memory allocation min/pressure/max.
> # read buffer, write buffer, and buffer space
> net.ipv4.tcp_rmem = 10000000 10000000 10000000
> net.ipv4.tcp_wmem = 10000000 10000000 10000000
> net.ipv4.tcp_mem = 10000000 10000000 10000000
> net.core.rmem_max = 524287
> net.core.wmem_max = 524287
> net.core.rmem_default = 524287
> net.core.wmem_default = 524287
> net.core.optmem_max = 524287
> net.core.netdev_max_backlog = 300000
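> Note these tcp_* sysctls shape the host's own TCP stack rather than
> the captured traffic; still, to rule out drift between the two boxes
> the live values can be compared directly (standard sysctl usage):
> # dump the relevant live values on each host and diff the output
> sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem net.ipv4.tcp_mem
> sysctl net.core.netdev_max_backlog
> # persist by adding the lines to /etc/sysctl.conf, then reload:
> sysctl -p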
> Each bro configuration is using the following...
> [manager]
> type=manager
> host=localhost
> [proxy-1]
> type=proxy
> host=localhost
> [worker-1]
> type=worker
> host=localhost
> interface=p4p2
> lb_method=pf_ring
> lb_procs=16
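> With lb_procs=16 it is worth confirming that PF_RING is actually
> spreading flows evenly across the workers on both boxes; BroControl's
> netstats command prints per-worker counters:
> broctl netstats   # one line per worker: recvd / dropped / link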
> Both have the same NIC driver version (ixgbe):
> 3.15.1-k
> Same services installed (min install).
> Slightly different Kernel versions...
> Server A (2.6.32-431.11.2.el6.x86_64)
> Server B (2.6.32-431.17.1.el6.x86_64)
> At the moment Server A is seeing about 700 Mbps and Server B about
> 600 Mbps.
> What I don't understand is why Server A gets markedly better
> performance than Server B.
> TOP from A (showing a few Bro workers):
> top - 12:48:45 up 1 day, 17:03,  2 users,  load average: 5.30, 3.99, 3.13
> Tasks: 706 total,  19 running, 687 sleeping,   0 stopped,   0 zombie
> Cpu(s): 33.9%us,  6.6%sy,  1.1%ni, 57.2%id,  0.0%wa,  0.0%hi, 1.2%si,  0.0%st
> Mem:  49376004k total, 33605828k used, 15770176k free, 93100k buffers
> Swap:  2621432k total,     9760k used,  2611672k free, 9206880k cached
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM TIME+  COMMAND
>  5768 root      20   0 1808m 1.7g 519m R 100.0  3.6  32:24.92 bro
>  5760 root      20   0 1688m 1.6g 519m R 99.7  3.4  34:08.36 bro
>  3314 root      20   0 2160m 269m 4764 R 96.1  0.6  30:14.12 bro
>  5754 root      20   0 1451m 1.4g 519m R 82.8  2.9  36:40.02 bro
> TOP from B (showing a few Bro workers):
> top - 12:49:33 up 14:24,  2 users,  load average: 10.28, 9.31, 8.06
> Tasks: 708 total,  25 running, 683 sleeping,   0 stopped,   0 zombie
> Cpu(s): 41.6%us,  6.0%sy,  1.0%ni, 50.4%id,  0.0%wa,  0.0%hi, 1.1%si,  0.0%st
> Mem:  49376004k total, 31837340k used, 17538664k free, 147212k buffers
> Swap:  2621432k total,        0k used,  2621432k free, 13494332k cached
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM TIME+  COMMAND
>  3178 root      20   0 1073m 1.0g 264m R 100.0  2.1 401:47.31 bro
>  3188 root      20   0  881m 832m 264m R 100.0  1.7 377:48.90 bro
>  3189 root      20   0 1247m 1.2g 264m R 100.0  2.5 403:22.95 bro
>  3193 root      20   0  920m 871m 264m R 100.0  1.8 429:45.98 bro
> Both have the same number of Bro workers. I just do not understand
> why Server A runs at roughly half the CPU utilization of Server B
> while seeing more traffic. The only real and consistent difference
> between the two that I see is that Server A's workers show about
> twice the shared memory (SHR) of Server B's.
> Could this be part of the issue, if not the root cause? How might I go 
> about rectifying the issue?
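> One way to see where the SHR difference comes from is to compare the
> memory map of a single worker on each server (a sketch; <pid> stands
> in for one Bro worker's PID):
> # largest resident mappings last; shared libraries and shm segments
> # are named in the final column
> pmap -x <pid> | sort -n -k3 | tail -20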
> FWIW, neither server is dropping packets and both are doing well.
> However, I want to run other applications on top of this, and the
> poorer performance on Server B is likely to affect them.
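> The per-socket counters under /proc/net/pf_ring/ can confirm the
> zero-drop claim per ring rather than in aggregate (exact file and
> field names vary a little between PF_RING releases):
> # one stats file per ring bound to p4p2; check the loss counters
> grep -H Lost /proc/net/pf_ring/*-p4p2*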
> Thanks in advance for the advice!
> -Jason



