[Bro] Unexplained Performance Differences Between Like Servers
Gilbert Clark
gc355804 at ohio.edu
Sun Jun 15 23:54:07 PDT 2014
Hi Jason:
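I believe one way to set a BPF filter is to modify site/local.bro to
include the line below ("ip or not ip" matches all traffic, so you would
substitute your own filter expression):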
redef cmd_line_bpf_filter = "ip or not ip";
I think there's also a packet filter framework
(http://www.bro.org/sphinx/scripts/base/frameworks/packet-filter/main.html)
which supports more elaborate filtering schemes, but I don't really know
much about it offhand :)
Regarding the "other" traffic being the root cause of the issues: I
think it's pretty difficult to say. A few ideas:
* check the size of log files for significant differences. If http.log
/ reporter.log / weird.log / etc. is much longer on one system than on
the other, that might be a good place to start looking
* try setting a filter to only accept a certain type of traffic (e.g.
HTTP, SSH) to see relative load for that specific traffic type
* try playing with which scripts bro loads (e.g. tweak local.bro and/or
try running bro in bare mode with a very small set of loaded scripts)
to see if that has any effect
* bro can be told to dump performance statistics into a human-readable
ASCII log by including the "misc/profiling.bro" script: some of the
information included there might be useful to have
* try capturing a trace and playing it back through a standalone bro
process. Tools like 'time' and 'perf' can help show how performance
changes with the trace and with the scripts currently being loaded.
This has the benefit of not dropping live packets while scripts are
being tweaked.
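The first suggestion, comparing log sizes across the two servers, is easy to script. Below is a minimal sketch. It assumes each server's Bro logs are readable from a local directory. The function names, the 2x threshold and the use of file size as a stand-in for "much longer" are illustrative assumptions, not part of Bro:

```python
from pathlib import Path


def log_sizes(log_dir):
    """Map each *.log file name in log_dir to its size in bytes."""
    return {p.name: p.stat().st_size for p in Path(log_dir).glob("*.log")}


def compare_log_sizes(dir_a, dir_b, min_ratio=2.0):
    """Return (name, size_a, size_b) for logs whose sizes differ by >= min_ratio.

    File size is used as a rough proxy for "much longer". A log that exists
    on only one side counts as size 0 on the other side and is always flagged.
    """
    sizes_a, sizes_b = log_sizes(dir_a), log_sizes(dir_b)
    flagged = []
    for name in sorted(set(sizes_a) | set(sizes_b)):
        a, b = sizes_a.get(name, 0), sizes_b.get(name, 0)
        small, large = min(a, b), max(a, b)
        if large > 0 and (small == 0 or large / small >= min_ratio):
            flagged.append((name, a, b))
    return flagged


# Hypothetical usage; point the two paths at each server's log directory:
# for name, a, b in compare_log_sizes("/path/to/serverA/logs", "/path/to/serverB/logs"):
#     print(f"{name}: A={a} bytes, B={b} bytes")
```

Comparing line counts instead of sizes would work just as well. Sizes are simply cheaper to read for large logs.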
As some food for thought: in general, bro does a few things every time
there's a new packet:
* Retrieve the packet from the NIC
* Dissect the packet and generate events
* Spend time in script-land processing events that have been generated
* Spend time handling administrative overhead (e.g. check timers, check
triggers)
Thus, in general, making bro go faster is probably going to mean making
one of those things take less time.
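The four steps above can be read as a per-packet time budget. If they run back to back for every packet, one worker cannot handle more packets per second than one divided by their combined cost. The most expensive step is then the first candidate for tuning. Below is a rough sketch of that arithmetic. The stage costs are made-up placeholders, not measurements, and treating administrative overhead as a fixed per-packet cost is a simplification:

```python
def per_packet_budget(stage_costs_us):
    """Return (total_us, max_packets_per_second, slowest_stage) for one worker.

    Simplified model: every packet passes through each stage in turn, so the
    per-packet cost is the sum of the stage costs (in microseconds).
    """
    total_us = sum(stage_costs_us.values())
    max_pps = 1_000_000 / total_us
    slowest = max(stage_costs_us, key=stage_costs_us.get)
    return total_us, max_pps, slowest


# Hypothetical per-packet costs in microseconds, for illustration only.
costs = {
    "capture": 1.0,   # retrieve the packet from the NIC
    "dissect": 2.0,   # dissect the packet and generate events
    "scripts": 6.0,   # handle the generated events in script-land
    "admin": 1.0,     # timers, triggers and other bookkeeping
}

total_us, max_pps, slowest = per_packet_budget(costs)
print(f"{total_us:.1f} us/packet -> at most {max_pps:,.0f} packets/s per worker; "
      f"largest cost: {slowest}")
```

In this made-up example the ceiling is 100,000 packets/s. Halving the script-land cost from 6.0 to 3.0 us raises it to about 142,857 packets/s. That is the sense in which making bro faster means shrinking one of the steps.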
Anyway, hope something in there is useful :)
Cheers,
Gilbert
On 6/13/14, 10:32 AM, Jason Batchelor wrote:
> FWIW:
> I just ran iptraf for a bit on both and one thing really stuck out to me:
> Server A:
> Other IP: 5273 633087 5273 633087 0 0
> Server B:
> Other IP: 952797 445867K 952797 445867K 0 0
> So server A is seeing 633087 bytes of 'other' traffic, while B is
> seeing 445867 kilobytes of 'other' traffic. Do you think this other
> traffic could be the root cause of the issues here? If so, would a bpf
> filter looking for only tcp/udp/ipv4 traffic be sufficient? How might
> I apply that within Bro?
> Here is the full view taken some time after the metrics above:
> Server A:
>               Total     Total  Incoming  Incoming  Outgoing  Outgoing
>             Packets     Bytes   Packets     Bytes   Packets     Bytes
> Total:     80187229    51270M  80187229    51270M         0         0
> IPv4:      80187193    50026M  80187193    50026M         0         0
> IPv6:            36      1296        36      1296         0         0
> TCP:       70040618    47342M  70040618    47342M         0         0
> UDP:       10052947     2676M  10052947     2676M         0         0
> ICMP:         85189   6652550     85189   6652550         0         0
> Other IP:      8475   1060993      8475   1060993         0         0
> Server B:
>               Total     Total  Incoming  Incoming  Outgoing  Outgoing
>             Packets     Bytes   Packets     Bytes   Packets     Bytes
> Total:     89718860    53317M  89718860    53317M         0         0
> IPv4:      89712988    51882M  89712988    51882M         0         0
> IPv6:          5872     51778      5872     51778         0         0
> TCP:       79615124    49170M  79615124    49170M         0         0
> UDP:        7627607     1682M   7627607     1682M         0         0
> ICMP:         86620   5619078     86620   5619078         0         0
> Other IP:   2389509     1023M   2389509     1023M         0         0
> Many thanks in advance for the quick and helpful replies!
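The iptraf counters above can be turned into shares of total traffic. In both snapshots the TCP, UDP, ICMP and Other IP rows add up exactly to the Total row. So "Other IP" here counts IP packets that are none of those three protocols. Below is a small sketch using the figures as quoted. Reading iptraf's "M" suffix as 10**6 bytes is an assumption, and it only slightly affects the byte percentages:

```python
# Counters copied from the iptraf "full view" snapshots quoted above.
# Assumption: iptraf's "M" suffix means 10**6 bytes (2**20 would barely
# change the ratios).
MB = 10**6

servers = {
    "Server A": {"total_pkts": 80_187_229, "other_pkts": 8_475,
                 "total_bytes": 51_270 * MB, "other_bytes": 1_060_993},
    "Server B": {"total_pkts": 89_718_860, "other_pkts": 2_389_509,
                 "total_bytes": 53_317 * MB, "other_bytes": 1_023 * MB},
}


def other_ip_share(c):
    """Return the "Other IP" share of packets and of bytes, as percentages."""
    return (100 * c["other_pkts"] / c["total_pkts"],
            100 * c["other_bytes"] / c["total_bytes"])


for name, counters in servers.items():
    pkt_pct, byte_pct = other_ip_share(counters)
    print(f"{name}: Other IP = {pkt_pct:.3f}% of packets, {byte_pct:.3f}% of bytes")
```

With these numbers, Other IP works out to roughly 0.01% of Server A's packets. On Server B it is about 2.7% of packets and about 1.9% of bytes.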
>
>
> On Fri, Jun 13, 2014 at 9:19 AM, Jason Batchelor
> <jxbatchelor at gmail.com> wrote:
>
> Wow, thanks for all the quick replies :)
> > What versions of Bro, and is it the same for both?
> I am using the same version of Bro for each server (1.2).
> > Is the type of traffic in the 600 Mbps stream similar to the
> type of traffic in the 700 Mbps stream?
> I'm not 100% sure but I think that is a really good question to
> ask. Do you know of any good tools that might help inform an
> answer? I know of iptraf for example, is there one that folks
> generally prefer the most?
> > Are you only running 4 workers or did you truncate the output?
> Yes, I truncated the output to show four workers each (I have 16
> total).
> > Are you doing 4 tuple load balancing or 2 tuple load balancing
> between the two servers?
> Sorry, I am not sure what you mean by this or what the implications of
> one over the other are. Is there an easy way I can find out (I am
> kinda new to this)? I agree that B is likely receiving more
> flows.
> Thanks!
> Jason
>
>
> On Fri, Jun 13, 2014 at 9:09 AM, Justin Azoff <JAzoff at albany.edu> wrote:
>
> On Fri, Jun 13, 2014 at 08:01:54AM -0500, Jason Batchelor wrote:
> > At the moment Server A is getting about 700Mb/s and Server B is getting about
> > 600Mb/s.
> >
> > What I don't understand is why Server A is getting several orders of
> > magnitude better performance than Server B.
> >
> > TOP from A (included a few bro workers):
> >
> > top - 12:48:45 up 1 day, 17:03, 2 users, load average: 5.30, 3.99, 3.13
> > Tasks: 706 total, 19 running, 687 sleeping, 0 stopped, 0 zombie
> > Cpu(s): 33.9%us, 6.6%sy, 1.1%ni, 57.2%id, 0.0%wa, 0.0%hi, 1.2%si, 0.0%st
> > Mem: 49376004k total, 33605828k used, 15770176k free, 93100k buffers
> > Swap: 2621432k total, 9760k used, 2611672k free, 9206880k cached
> >   PID USER  PR NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
> >  5768 root  20  0 1808m 1.7g 519m R 100.0  3.6  32:24.92 bro
> >  5760 root  20  0 1688m 1.6g 519m R  99.7  3.4  34:08.36 bro
> >  3314 root  20  0 2160m 269m 4764 R  96.1  0.6  30:14.12 bro
> >  5754 root  20  0 1451m 1.4g 519m R  82.8  2.9  36:40.02 bro
>
> Server A Bro cpu utilization = 378.6%
>
> > TOP from B (included a few bro workers)
> >
> > top - 12:49:33 up 14:24, 2 users, load average: 10.28, 9.31, 8.06
> > Tasks: 708 total, 25 running, 683 sleeping, 0 stopped, 0 zombie
> > Cpu(s): 41.6%us, 6.0%sy, 1.0%ni, 50.4%id, 0.0%wa, 0.0%hi, 1.1%si, 0.0%st
> > Mem: 49376004k total, 31837340k used, 17538664k free, 147212k buffers
> > Swap: 2621432k total, 0k used, 2621432k free, 13494332k cached
> >   PID USER  PR NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
> >  3178 root  20  0 1073m 1.0g 264m R 100.0  2.1 401:47.31 bro
> >  3188 root  20  0  881m 832m 264m R 100.0  1.7 377:48.90 bro
> >  3189 root  20  0 1247m 1.2g 264m R 100.0  2.5 403:22.95 bro
> >  3193 root  20  0  920m 871m 264m R 100.0  1.8 429:45.98 bro
>
> > Both have the same number of Bro workers. I just do not understand why
> > Server A has literally half the utilization despite seeing more traffic.
> > The only real and consistent difference I see between the two is that
> > Server A seems to have twice the amount of SHR (shared memory) compared
> > to Server B.
>
> Server B Bro cpu utilization = 400%
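Justin's totals come from adding the %CPU column of the four worker rows shown for each server. In top, 100% corresponds to one fully busy core. The small sketch below reproduces that arithmetic. It also counts workers at or near 100%, because a worker pinned at a full core is a plausible sign that it is falling behind. The 99.5% cutoff is an arbitrary assumption:

```python
# %CPU values of the four bro worker rows in each top snapshot above.
cpu_a = [100.0, 99.7, 96.1, 82.8]
cpu_b = [100.0, 100.0, 100.0, 100.0]


def total_cpu(percentages):
    """Sum per-process %CPU (100% == one fully busy core), to one decimal."""
    return round(sum(percentages), 1)


def saturated_workers(percentages, cutoff=99.5):
    """Count processes at or above an assumed 'pinned to a core' cutoff."""
    return sum(1 for p in percentages if p >= cutoff)


for name, cpu in (("Server A", cpu_a), ("Server B", cpu_b)):
    print(f"{name}: {total_cpu(cpu)}% total, "
          f"{saturated_workers(cpu)} of {len(cpu)} workers near 100%")
```

For these snapshots the sketch gives 378.6% and 400.0%. Two of Server A's four listed workers are at or above the cutoff, versus all four on Server B.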
>
> Are you only running 4 workers or did you truncate the output? Is that
> running at 100% 24/7 or does it vary with the traffic?
>
> Are you doing 4-tuple load balancing or 2-tuple load balancing between
> the two servers? Most likely Server B is seeing more flows.
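Jason asked what 2-tuple versus 4-tuple load balancing means. Suppose traffic is split between sensors by hashing packet headers. A 2-tuple hash uses only the source and destination IP addresses, so every connection between the same two hosts goes to the same server. A 4-tuple hash also includes the ports, so it can spread those connections out.

The sketch below illustrates the difference under stated assumptions. The SHA-256-based hash, the function names and the flows are hypothetical; real load balancers and NICs use their own hash functions. It sorts the two endpoints first so that both directions of a connection land on the same server, which Bro needs in order to see whole connections:

```python
import hashlib
from collections import Counter


def flow_key(src_ip, src_port, dst_ip, dst_port, tuple_size):
    """Direction-independent key from the 2-tuple (IPs) or 4-tuple (IPs + ports)."""
    # Sort the two endpoints so that A->B and B->A produce the same key.
    lo, hi = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    if tuple_size == 2:
        return f"{lo[0]}|{hi[0]}"
    return f"{lo[0]}:{lo[1]}|{hi[0]}:{hi[1]}"


def pick_server(key, n_servers):
    """Deterministically map a flow key onto one of n_servers."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % n_servers


def distribute(flows, tuple_size, n_servers=2):
    """Count how many connections each server receives for a given tuple size."""
    return Counter(pick_server(flow_key(*f, tuple_size), n_servers) for f in flows)


# Hypothetical traffic: 100 connections between one busy host pair
# (differing only in client port), plus 4 connections from other hosts.
flows = [("10.0.0.5", 40000 + i, "192.0.2.10", 443) for i in range(100)]
flows += [(f"10.0.0.{h}", 50000, "198.51.100.7", 80) for h in range(6, 10)]

print("2-tuple:", dict(distribute(flows, 2)))
print("4-tuple:", dict(distribute(flows, 4)))
```

With the 2-tuple key, all 100 connections of the busy host pair end up on one server. With the 4-tuple key they split roughly evenly between the two.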
>
>
> --
> -- Justin Azoff
>
>
>