[Bro] Unexplained Performance Differences Between Like Servers

John Hoyt john.h.hoyt at gmail.com
Fri Jun 13 06:49:33 PDT 2014


Hey Jason,

What versions of Bro, and is it the same on both?  I had some serious
resource issues with one of the beta versions recently, and switched back
to the stable version.
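
A quick way to compare on each box, assuming bro is on your PATH (adjust the
path if you installed to a prefix):

bro --version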

-John


On Fri, Jun 13, 2014 at 9:01 AM, Jason Batchelor <jxbatchelor at gmail.com>
wrote:

> Hello everyone:
>
> I have Bro installed on two Dell R720s, each with the following specs...
>
> Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz x 32
> 48GB RAM
>
> Running: CentOS 6.5
>
> Both have the following PF_RING configuration:
>
> PF_RING Version          : 6.0.2 ($Revision: 7746$)
> Total rings              : 16
> Standard (non DNA) Options
> Ring slots               : 32768
> Slot version             : 15
> Capture TX               : No [RX only]
> IP Defragment            : No
> Socket Mode              : Standard
> Transparent mode         : Yes [mode 0]
> Total plugins            : 0
> Cluster Fragment Queue   : 1917
> Cluster Fragment Discard : 26648
>
> The only difference in PF_RING is that Server A is running revision 7601,
> while Server B is on revision 7746.
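>
> For reference, a quick way to confirm which pf_ring module each box has
> actually loaded (assuming the kernel module is installed as pf_ring):
>
> cat /proc/net/pf_ring/info
> modinfo pf_ring | grep -i version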
>
> I've tuned the NIC to the following settings...
>
> ethtool -K p4p2 tso off
> ethtool -K p4p2 gro off
> ethtool -K p4p2 lro off
> ethtool -K p4p2 gso off
> ethtool -K p4p2 rx off
> ethtool -K p4p2 tx off
> ethtool -K p4p2 sg off
> ethtool -K p4p2 rxvlan off
> ethtool -K p4p2 txvlan off
> ethtool -N p4p2 rx-flow-hash udp4 sdfn
> ethtool -N p4p2 rx-flow-hash udp6 sdfn
> ethtool -n p4p2 rx-flow-hash udp6
> ethtool -n p4p2 rx-flow-hash udp4
> ethtool -C p4p2 rx-usecs 1000
> ethtool -C p4p2 adaptive-rx off
> ethtool -G p4p2 rx 4096
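>
> To rule out a setting silently not taking effect on one box, the live NIC
> state can be diffed between A and B (interface name assumed to be p4p2 on
> both):
>
> ethtool -k p4p2   # offload settings
> ethtool -c p4p2   # interrupt coalescing (rx-usecs, adaptive-rx)
> ethtool -g p4p2   # ring sizes
>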
> I've got the following sysctl settings on each.
>
> # turn off selective ACK and timestamps
> net.ipv4.tcp_sack = 0
> net.ipv4.tcp_timestamps = 0
> # memory allocation min/pressure/max.
> # read buffer, write buffer, and buffer space
> net.ipv4.tcp_rmem = 10000000 10000000 10000000
> net.ipv4.tcp_wmem = 10000000 10000000 10000000
> net.ipv4.tcp_mem = 10000000 10000000 10000000
> net.core.rmem_max = 524287
> net.core.wmem_max = 524287
> net.core.rmem_default = 524287
> net.core.wmem_default = 524287
> net.core.optmem_max = 524287
> net.core.netdev_max_backlog = 300000
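>
> To spot-check that both boxes actually carry identical live values
> (sysctl.conf entries can be masked by later runtime changes), something
> like:
>
> sysctl net.ipv4.tcp_sack net.ipv4.tcp_timestamps net.core.netdev_max_backlog net.core.rmem_max
>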
> Each Bro configuration uses the following node.cfg...
>
> [manager]
> type=manager
> host=localhost
> [proxy-1]
> type=proxy
> host=localhost
> [worker-1]
> type=worker
> host=localhost
> interface=p4p2
> lb_method=pf_ring
> lb_procs=16
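>
> One node.cfg knob not set here that could matter for an apples-to-apples
> comparison is CPU pinning; a sketch (pin_cpus is a BroControl node.cfg
> option, the core numbers below are just illustrative):
>
> [worker-1]
> type=worker
> host=localhost
> interface=p4p2
> lb_method=pf_ring
> lb_procs=16
> pin_cpus=0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
>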
> Both have the same NIC driver version (ixgbe):
> 3.15.1-k
>
> Same services installed (min install).
>
> Slightly different Kernel versions...
> Server A (2.6.32-431.11.2.el6.x86_64)
> Server B (2.6.32-431.17.1.el6.x86_64)
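>
> For completeness, those can be pulled on each box with:
>
> uname -r
> ethtool -i p4p2       # driver and version actually bound to the capture interface
> modinfo ixgbe | grep ^version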
>
>
> At the moment Server A is seeing about 700 Mb/s of traffic and Server B
> about 600 Mb/s.
>
> What I don't understand is why Server A is getting so much better
> performance than Server B.
>
> Top output from Server A (a few of the Bro workers shown):
>
> top - 12:48:45 up 1 day, 17:03,  2 users,  load average: 5.30, 3.99, 3.13
> Tasks: 706 total,  19 running, 687 sleeping,   0 stopped,   0 zombie
> Cpu(s): 33.9%us,  6.6%sy,  1.1%ni, 57.2%id,  0.0%wa,  0.0%hi,  1.2%si,
> 0.0%st
> Mem:  49376004k total, 33605828k used, 15770176k free,    93100k buffers
> Swap:  2621432k total,     9760k used,  2611672k free,  9206880k cached
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  5768 root      20   0 1808m 1.7g 519m R 100.0  3.6  32:24.92 bro
>  5760 root      20   0 1688m 1.6g 519m R 99.7  3.4  34:08.36 bro
>  3314 root      20   0 2160m 269m 4764 R 96.1  0.6  30:14.12 bro
>  5754 root      20   0 1451m 1.4g 519m R 82.8  2.9  36:40.02 bro
>
> Top output from Server B (a few of the Bro workers shown):
>
> top - 12:49:33 up 14:24,  2 users,  load average: 10.28, 9.31, 8.06
> Tasks: 708 total,  25 running, 683 sleeping,   0 stopped,   0 zombie
> Cpu(s): 41.6%us,  6.0%sy,  1.0%ni, 50.4%id,  0.0%wa,  0.0%hi,  1.1%si,
> 0.0%st
> Mem:  49376004k total, 31837340k used, 17538664k free,   147212k buffers
> Swap:  2621432k total,        0k used,  2621432k free, 13494332k cached
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  3178 root      20   0 1073m 1.0g 264m R 100.0  2.1 401:47.31 bro
>  3188 root      20   0  881m 832m 264m R 100.0  1.7 377:48.90 bro
>  3189 root      20   0 1247m 1.2g 264m R 100.0  2.5 403:22.95 bro
>  3193 root      20   0  920m 871m 264m R 100.0  1.8 429:45.98 bro
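>
> For comparing the two beyond top, broctl also has a netstats command that
> reports per-worker packets received/dropped, and the capstats tool that
> ships with Bro can watch the raw interface rate; for example:
>
> broctl netstats
> capstats -i p4p2 -I 5     # traffic-rate snapshots every 5 seconds
>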
> Both have the same number of Bro workers. I just do not understand
> why Server A is running at literally half the utilization while seeing
> more traffic. The only real and consistent difference I can see is that
> Server A's workers show about twice the SHR (shared memory) of Server B's.
>
> Could this be part of the issue, if not the root cause? How might I go
> about rectifying it?
>
> FWIW, neither box is dropping packets and both are doing well. However, I
> want to run other applications on top of this, and the poorer performance
> on Server B is likely to affect them.
>
> Thanks in advance for the advice!
>
> -Jason