[Bro] Unexplained Performance Differences Between Like Servers

Sun Jun 15 23:54:07 PDT 2014

Hi Jason:

I believe one way to set a BPF filter is to modify site/local.bro to 
include:

redef cmd_line_bpf_filter = "ip or not ip";

I think there's also a packet filter framework 
(http://www.bro.org/sphinx/scripts/base/frameworks/packet-filter/main.html) 
which supports more elaborate filtering schemes, but I don't really know 
much about it offhand :)
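
For the "tcp/udp/ipv4 only" filter Jason asked about, the expression itself is ordinary libpcap filter syntax. Here is a minimal Python sketch that builds such an expression from a list of protocols and prints the matching local.bro line. The helper names are hypothetical. On the Bro side it assumes only the `cmd_line_bpf_filter` redef shown above.

```python
def bpf_filter(protocols, ipv4_only=True):
    """Join libpcap protocol primitives (e.g. "tcp", "udp") into one BPF expression."""
    protos = [p.strip() for p in protocols if p.strip()]
    if not protos:
        raise ValueError("need at least one protocol primitive")
    alternatives = " or ".join(protos)
    # "ip" restricts the match to IPv4; without it, "tcp"/"udp" also match IPv6.
    return f"ip and ({alternatives})" if ipv4_only else alternatives


def bpf_redef_line(expr):
    """Format the local.bro line that installs expr (hypothetical helper)."""
    return f'redef cmd_line_bpf_filter = "{expr}";'


if __name__ == "__main__":
    print(bpf_redef_line(bpf_filter(["tcp", "udp"])))
```

With `["tcp", "udp"]` this yields `ip and (tcp or udp)`. That filter drops ICMP, IPv6 and every other IP protocol before bro sees the packets. By contrast, "ip or not ip" matches every packet, so the example above only illustrates the syntax.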

Regarding the "other" traffic being the root cause of the issues: I 
think it's pretty difficult to say.  A few ideas:

* check the size of log files for significant differences: if http.log,
reporter.log, weird.log, etc. is much longer on one system than on the
other, that might be a place to start looking
* try setting a filter to only accept a certain type of traffic (e.g. 
HTTP, SSH) to see relative load for that specific traffic type
* try playing with which scripts bro loads (e.g. tweak local.bro and / 
or try running bro in bare mode with a very small set of loaded scripts) 
to see if that has any effect
* bro can be told to dump performance statistics into a human-readable 
ASCII log by including the "misc/profiling.bro" script: some of the 
information included there might be useful to have
* try capturing a trace and playing it back through a standalone bro
process; tools like 'time' and 'perf' can help show how performance
changes with the trace and the set of scripts being loaded.  This also
has the benefit of not dropping packets while scripts are being
tweaked...
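
The trace-replay suggestion lends itself to a small harness. Here is a sketch that assumes a `bro` binary on the PATH and a pcap captured from one of the servers. It replays the same trace under a few configurations and reports wall-clock time for each. It uses bro's `-r` (read a trace file), `-f` (BPF filter) and `-b` (bare mode) options, plus the `local` script argument to load site/local.bro. The variant list and the script name are illustrative only.

```python
import os
import shutil
import subprocess
import sys
import tempfile
import time

# Illustrative configurations to compare against the same trace.
VARIANTS = {
    "local.bro, no filter": {"scripts": ["local"]},
    "local.bro, tcp/udp over IPv4": {"scripts": ["local"], "bpf": "ip and (tcp or udp)"},
    "bare mode, no scripts": {"bare": True},
}


def bro_command(trace, scripts=(), bpf=None, bare=False, bro="bro"):
    """Assemble a bro command line that replays `trace`."""
    cmd = [bro, "-r", trace]
    if bare:
        cmd.append("-b")        # bare mode: do not load the base/ scripts
    if bpf:
        cmd += ["-f", bpf]      # BPF filter applied while reading the trace
    cmd += list(scripts)        # e.g. "local" loads site/local.bro
    return cmd


def time_run(cmd, workdir):
    """Run cmd inside workdir (so its logs land there); return elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, cwd=workdir, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def main(argv):
    if len(argv) != 1 or shutil.which("bro") is None:
        print("usage: replay_timing.py TRACE.pcap  (requires bro on PATH)")
        return
    trace = os.path.abspath(argv[0])
    for name, opts in VARIANTS.items():
        with tempfile.TemporaryDirectory() as workdir:
            elapsed = time_run(bro_command(trace, **opts), workdir)
        print(f"{name:32s} {elapsed:8.2f} s")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Prefixing the command list with `perf stat` or `perf record` gives the deeper view mentioned above. The temporary directories throw away each run's logs. If you also want to compare log sizes across variants, keep those directories instead.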

As some food for thought: in general, bro does a few things every time 
there's a new packet:

* Retrieve the packet from the NIC
* Dissect the packet and generate events
* Spend time in script-land processing events that have been generated
* Spend time handling administrative overhead (e.g. check timers, check 
triggers)

Thus, in general, making bro go faster is probably going to mean making 
one of those things take less time.
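
To put a number on "take less time": if traffic is spread evenly over the workers, each worker has a fixed time budget per packet. All four steps above have to fit inside that budget on average. Below is a back-of-the-envelope sketch. Its inputs are assumptions taken from the thread:

* 700 Mb/s, read as megabits per second
* Server A's iptraf totals, with "M" read as 2**20 bytes
* 16 workers, each with one core to itself

```python
def avg_packet_bytes(total_bytes, total_packets):
    """Average packet size from cumulative byte and packet counters."""
    return total_bytes / total_packets


def per_packet_budget_us(bits_per_sec, avg_bytes, workers):
    """Microseconds each worker may spend per packet and still keep up.

    Assumes perfectly even load balancing and one dedicated core per worker.
    """
    packets_per_sec = bits_per_sec / 8 / avg_bytes
    per_worker_pps = packets_per_sec / workers
    return 1e6 / per_worker_pps


# Server A iptraf totals from the thread; "M" assumed to mean 2**20 bytes.
avg = avg_packet_bytes(51270 * 2**20, 80187229)
budget = per_packet_budget_us(700e6, avg, workers=16)
print(f"avg packet ~{avg:.0f} bytes, budget ~{budget:.0f} us/packet/worker")
```

With those inputs, the average packet is roughly 670 bytes and each worker gets roughly 120 microseconds per packet. Under these assumptions, a worker whose average cost per packet exceeds that budget cannot keep up and will sit at 100% CPU.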

Anyway, hope something in there is useful :)

Cheers,
Gilbert

On 6/13/14, 10:32 AM, Jason Batchelor wrote:
> FWIW:
> I just ran iptraf for a bit on both and one thing really stuck out to me:
> Server A:
> Other IP:      5273    633087      5273    633087         0         0
> Server B:
> Other IP:    952797   445867K    952797   445867K         0         0
> So server A is seeing 633087 bytes of 'other' traffic, while B is
> seeing 445867 kilobytes of 'other' traffic. Do you think this other
> traffic could be the root cause of the issues here? If so, would a bpf 
> filter looking for only tcp/udp/ipv4 traffic be sufficient? How might 
> I apply that within Bro?
> Here is the full view taken some time after the metrics above:
> Server A:
>               Total     Total  Incoming  Incoming  Outgoing  Outgoing
>             Packets     Bytes   Packets     Bytes   Packets     Bytes
> Total:     80187229    51270M  80187229    51270M         0         0
> IPv4:      80187193    50026M  80187193    50026M         0         0
> IPv6:            36      1296        36      1296         0         0
> TCP:       70040618    47342M  70040618    47342M         0         0
> UDP:       10052947     2676M  10052947     2676M         0         0
> ICMP:         85189   6652550     85189   6652550         0         0
> Other IP:      8475   1060993      8475   1060993         0         0
> Server B:
>               Total     Total  Incoming  Incoming  Outgoing  Outgoing
>             Packets     Bytes   Packets     Bytes   Packets     Bytes
> Total:     89718860    53317M  89718860    53317M         0         0
> IPv4:      89712988    51882M  89712988    51882M         0         0
> IPv6:          5872     51778      5872     51778         0         0
> TCP:       79615124    49170M  79615124    49170M         0         0
> UDP:        7627607     1682M   7627607     1682M         0         0
> ICMP:         86620   5619078     86620   5619078         0         0
> Other IP:   2389509     1023M   2389509     1023M         0         0
> Many thanks in advance for the quick and helpful replies!
>
>
> On Fri, Jun 13, 2014 at 9:19 AM, Jason Batchelor
> <jxbatchelor at gmail.com> wrote:
>
>     Wow, thanks for all the quick replies :)
>     > What versions of Bro, and is it the same for both?
>     I am using the same version of Bro for each server (1.2).
>     > Is the type of traffic in the 600 Mbps stream similar to the
>     > type of traffic in the 700 Mbps stream?
>     I'm not 100% sure but I think that is a really good question to
>     ask. Do you know of any good tools that might help inform an
>     answer? I know of iptraf for example, is there one that folks
>     generally prefer the most?
>     > Are you only running 4 workers or did you truncate the output?
>     Yes, I truncated the output to show four workers each (I have 16
>     total).
>     > Are you doing 4 tuple load balancing or 2 tuple load balancing
>     > between the two servers?
>     Sorry I am not sure what you mean by this or the implications of
>     one over the other. Is there an easy way I can find out (I am
>     kinda new to this)? I agree with the likelihood that B may be
>     receiving more flows.
>     Thanks!
>     Jason
>
>
>     On Fri, Jun 13, 2014 at 9:09 AM, Justin Azoff
>     <JAzoff at albany.edu> wrote:
>
>         On Fri, Jun 13, 2014 at 08:01:54AM -0500, Jason Batchelor wrote:
>         > At the moment Server A is getting about 700Mb/s and Server B is
>         > getting about 600Mb/s.
>         >
>         > What I don't understand is why Server A is having several orders
>         > of magnitude better performance than Server B.
>         >
>         > TOP from A (included a few bro workers):
>         >
>         > top - 12:48:45 up 1 day, 17:03,  2 users,  load average: 5.30, 3.99, 3.13
>         > Tasks: 706 total,  19 running, 687 sleeping,   0 stopped,   0 zombie
>         > Cpu(s): 33.9%us,  6.6%sy,  1.1%ni, 57.2%id,  0.0%wa,  0.0%hi,  1.2%si,  0.0%st
>         > Mem:  49376004k total, 33605828k used, 15770176k free,    93100k buffers
>         > Swap:  2621432k total,     9760k used,  2611672k free,  9206880k cached
>         >   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM    TIME+  COMMAND
>         >  5768 root      20   0 1808m 1.7g 519m R 100.0  3.6  32:24.92 bro
>         >  5760 root      20   0 1688m 1.6g 519m R  99.7  3.4  34:08.36 bro
>         >  3314 root      20   0 2160m 269m 4764 R  96.1  0.6  30:14.12 bro
>         >  5754 root      20   0 1451m 1.4g 519m R  82.8  2.9  36:40.02 bro
>
>         Server A Bro cpu utilization = 378.6%
>
>         > TOP from B (included a few bro workers)
>         >
>         > top - 12:49:33 up 14:24,  2 users,  load average: 10.28, 9.31, 8.06
>         > Tasks: 708 total,  25 running, 683 sleeping,   0 stopped,   0 zombie
>         > Cpu(s): 41.6%us,  6.0%sy,  1.0%ni, 50.4%id,  0.0%wa,  0.0%hi,  1.1%si,  0.0%st
>         > Mem:  49376004k total, 31837340k used, 17538664k free,   147212k buffers
>         > Swap:  2621432k total,        0k used,  2621432k free, 13494332k cached
>         >   PID USER      PR  NI  VIRT  RES  SHR S  %CPU %MEM     TIME+  COMMAND
>         >  3178 root      20   0 1073m 1.0g 264m R 100.0  2.1 401:47.31 bro
>         >  3188 root      20   0  881m 832m 264m R 100.0  1.7 377:48.90 bro
>         >  3189 root      20   0 1247m 1.2g 264m R 100.0  2.5 403:22.95 bro
>         >  3193 root      20   0  920m 871m 264m R 100.0  1.8 429:45.98 bro
>
>         > Both have the same number of Bro workers. I just do not understand
>         > why Server A is at literally half the utilization while also
>         > seeing more traffic. The only real and consistent difference I see
>         > between the two is that server A seems to have twice the amount of
>         > SHR (shared memory) compared to server B.
>
>         Server B Bro cpu utilization = 400%
>
>         Are you only running 4 workers or did you truncate the output? Is
>         that running at 100% 24/7 or does it vary with the traffic?
>
>         Are you doing 4 tuple load balancing or 2 tuple load balancing
>         between the two servers?  Most likely Server B is seeing more flows.
>
>
>         --
>         -- Justin Azoff
>
>
>
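
To compare the iptraf numbers quoted above side by side, here is a short Python sketch. It computes each server's "Other IP" share of packets and bytes, and sums the %CPU column of the truncated top output the way Justin did. The figures are copied from the thread. Treating iptraf's K/M suffixes as powers of 1024 is an assumption.

```python
UNITS = {"": 1, "K": 2**10, "M": 2**20}  # assumption: iptraf uses binary units


def to_bytes(field):
    """Convert an iptraf byte field such as "1023M" or "1060993" to bytes."""
    suffix = field[-1] if field[-1] in "KM" else ""
    number = field[:-1] if suffix else field
    return int(number) * UNITS[suffix]


# (packets, bytes) pairs from the iptraf snapshots in the thread.
IPTRAF = {
    "A": {"total": (80187229, "51270M"), "other": (8475, "1060993")},
    "B": {"total": (89718860, "53317M"), "other": (2389509, "1023M")},
}

# %CPU of the four bro workers shown in each top excerpt.
TOP_CPU = {"A": [100.0, 99.7, 96.1, 82.8], "B": [100.0, 100.0, 100.0, 100.0]}


def other_ip_share(stats):
    """Return (packet share, byte share) of 'Other IP' traffic."""
    tot_pkts, tot_bytes = stats["total"]
    oth_pkts, oth_bytes = stats["other"]
    return oth_pkts / tot_pkts, to_bytes(oth_bytes) / to_bytes(tot_bytes)


for server in ("A", "B"):
    pkt, byt = other_ip_share(IPTRAF[server])
    cpu = sum(TOP_CPU[server])
    print(f"{server}: other IP {pkt:.2%} of packets, {byt:.2%} of bytes; "
          f"shown workers {cpu:.1f}% CPU")
```

By packet count, "Other IP" is about 2.7% of Server B's traffic versus about 0.01% on Server A. For that traffic alone to explain the CPU gap, it would have to cost far more than an average packet. This is arithmetic on a single snapshot, not a diagnosis.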
