[Bro] Bro performance & sizing question

Wed Nov 18 13:25:07 PST 2015

Hi Melissa,
I built our cluster with the following specs:

Two Bro worker hosts, plus a separate server that runs the Bro manager and stores the logs, plus the Arista switch.

Each worker server has 20 cores, 128GB of memory, and a 10Gb Intel network card running PF_RING. We have now pinned 18 Bro processes to cores, which leaves 2 cores for OS tasks.

Regarding campus bandwidth, we push about 4Gbps during normal peak times on I1, and up to 2Gbps on I2 at random times. So we can easily be pushing between 4 and 6Gbps of traffic at any one time.

I installed the cluster in a nearby datacenter to test it before moving it downtown next to our edge routers. The edge routers span the traffic to the Bro cluster's Arista switch. While the cluster was in the datacenter it saw live traffic, though much less than the overall campus bandwidth; it was a convenient place near my office to make sure everything was working correctly. Packet loss in the datacenter was very low, the cluster was running great, and the logs looked complete, so we moved it downtown.

We immediately had a problem: capture loss of 60%-90%. The CPUs, though, weren't all pegged at 100% like they were with our old underpowered Bro box. Only 6-7 processors were pegged at 100%, and the rest were down around 30-40%. We added a BPF filter to process only packets to/from our campus subnets. We haven't tried a filter for large flows yet. We double-checked the PF_RING settings and the Arista switch settings, and tuned the network card, but the main thing we did was turn off hyperthreading. That immediately dropped the capture loss to 0.0%. I guess we didn't catch that earlier because the datacenter didn't have as much traffic going through the cluster. In the datacenter test we had pinned Bro to 36 of the 40 hyperthreaded CPUs on each worker, so when we got rid of that, it worked great. We had been running eight Bro proxies on the manager when it was hyperthreaded, and I think I can drop that to four proxies now; I just haven't tried it yet since everything is working at the moment.
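
For anyone who wants to try the same kind of campus-only filter, here is a minimal sketch of how the BPF expression could be built. The subnets shown are placeholder documentation ranges, not our real prefixes, and the helper name campus_bpf is made up for this example. How the resulting string gets loaded (for instance through restrict_filters, as mentioned in the original question) depends on your setup.

```python
import ipaddress

# Hypothetical campus prefixes (RFC 5737 documentation ranges), not real ones.
CAMPUS_SUBNETS = ["192.0.2.0/24", "198.51.100.0/24"]


def campus_bpf(subnets):
    """Return a BPF expression matching packets to or from any listed subnet."""
    if not subnets:
        raise ValueError("at least one subnet is required")
    # ip_network() validates each prefix and rejects entries with host bits set.
    nets = [ipaddress.ip_network(s) for s in subnets]
    # In pcap-filter syntax, "net X" matches X as either source or destination.
    return " or ".join(f"net {n}" for n in nets)


if __name__ == "__main__":
    print(campus_bpf(CAMPUS_SUBNETS))
```

The idea is simply to keep the subnet list in one place and validate it before it reaches the capture layer, since a typo in a hand-written filter can silently drop traffic you meant to keep.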

After we got the packet loss to 0.0%, we actually ran a test where we set up a separate instance of Bro on just one of our worker servers, and it was able to handle the entire load (at least during a short test). CPU usage on all the processors jumped from 20-40% to 40-60% when we ran on a single box, but the capture loss was still basically zero. We are going to run with two boxes, though, because we expect our bandwidth needs to grow, and the cluster should be able to keep up for a while.
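
As a rough sanity check on those numbers, the per-worker load can be estimated with a back-of-the-envelope calculation like the sketch below. It assumes traffic is spread perfectly evenly across workers (which real load balancing will not achieve, as the 6-7 pegged processors earlier showed) and uses decimal units (1Gbps = 1000Mbps). The function name is made up for this example.

```python
def mbps_per_worker(total_gbps, workers_per_host, hosts):
    """Average load per Bro worker, assuming perfectly even load balancing."""
    total_workers = workers_per_host * hosts
    if total_workers <= 0:
        raise ValueError("need at least one worker")
    # Decimal units: 1 Gbps = 1000 Mbps.
    return total_gbps * 1000.0 / total_workers


if __name__ == "__main__":
    # 18 pinned workers per host, peak traffic of 4 to 6 Gbps.
    for hosts in (1, 2):
        for peak_gbps in (4, 6):
            rate = mbps_per_worker(peak_gbps, 18, hosts)
            print(f"{hosts} host(s), {peak_gbps} Gbps peak: {rate:.0f} Mbps per worker")
```

Under those assumptions, a 6Gbps peak on a single 18-worker box comes to roughly 333Mbps per worker, and about 167Mbps per worker when split across both boxes. For comparison, the original question worked out to about 219Mbps/core.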

Hope that helps.

Brian Allen
Information Security Manager
Washington University

From: <bro-bounces at bro.org> on behalf of Melissa Muth <muthm at upenn.edu>
Date: Friday, November 13, 2015 at 1:50 PM
To: Bro-Mailinglist <bro at bro.org>
Subject: [Bro] Bro performance & sizing question

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

We have a Bro cluster currently attempting to process up to 13Gbps
(1.4Mpps) partitioned over two 10Gbps Gigamon network taps.

Capture loss currently averages 44% - but before buying more hardware,
we'd like to sanity-check our plans with folks who have already
successfully sized their own installations.

Currently there are two Bro hosts in the cluster, each with 20 CPU
cores (3.1Ghz), 128GB memory, and Myricom cards with the Sniffer V3
driver. Each host runs a proxy, and 17 workers pinned to CPUs. The
manager is running on one of the worker hosts, and logs are being
written to SSD drives. We're using restrict_filters to ignore (large)
flows generated by four hosts.

The current plan is to buy 2 more worker hosts (same specs), as well
as a NAS for storing logs after each hourly rotation.

If we're capturing 56% of 13Gbps, that's 7454Mbps. Given the 34 cores
used by bro, that works out to 219Mbps/core and about 3.6Gbps/host.

Does that seem like expected performance, or might there be something
broken somewhere? Does it seem reasonable to buy two more worker hosts
(at least to handle current needs)?

Any thoughts or recommendations would be much appreciated.

Cheers,
Melissa
-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - https://gpgtools.org

iEYEARECAAYFAlZGPvwACgkQjGIGZe3KNcl6GgCgijm+F4zbDC0rnuP8VMRa2YSi
Tz8AoIPAvHBeF/R1e/C+HEIkSv2XO//L
=p+4P
-----END PGP SIGNATURE-----
_______________________________________________
Bro mailing list
bro at bro-ids.org
http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/bro
