[Bro] Bro performance & sizing question

Mon Nov 16 14:42:07 PST 2015

> On Nov 13, 2015, at 2:50 PM, Melissa Muth <muthm at upenn.edu> wrote:
> 
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> We have a Bro cluster currently attempting to process up to 13Gbps
> (1.4Mpps) partitioned over two 10Gbps Gigamon network taps.
> 
> Capture loss currently averages 44% - but before buying more hardware,
> we'd like to sanity-check our plans with folks who have already
> successfully sized their own installations.
> 
> Currently there are two Bro hosts in the cluster, each with 20 CPU
> cores (3.1Ghz), 128GB memory, and Myricom cards with the Sniffer V3
> driver. Each host runs a proxy, and 17 workers pinned to CPUs. The
> manager is running on one of the worker hosts, and logs are being
> written to SSD drives. We're using restrict_filters to ignore (large)
> flows generated by four hosts.
> 

Are those 20 real cores, or 10 cores with hyperthreading?  We have some tests planned to look into this further, but I think most people either disable hyperthreading or don't pin workers to the 'extra' cores.  If you are running 17 workers on 10 real cores, that could lead to problems.

> The current plan is to buy 2 more worker hosts (same specs), as well
> as a NAS for storing logs after each hourly rotation.
> 
> If we're capturing 56% of 13Gbps, that's 7454Mbps. Given the 34 cores
> used by bro, that works out to 219Mbps/core and about 3.6Gbps/host.

That's not an extreme amount of traffic, but 44% loss does sound a bit high.
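
For reference, the quoted figures can be reproduced with a short Python sketch. The function name and parameters are made up for illustration. It assumes 1024 Mbps per Gbps, since that is what yields the quoted 7454 Mbps (with 1000 the result is 7280 Mbps), and it assumes the two additional hosts would also run 17 workers each.

```python
def sizing(offered_gbps, loss_fraction, workers, hosts, mbps_per_gbps=1024):
    """Back-of-the-envelope throughput figures (hypothetical helper)."""
    # Traffic that actually reaches Bro after capture loss.
    captured_mbps = offered_gbps * mbps_per_gbps * (1 - loss_fraction)
    return {
        "captured_mbps": captured_mbps,
        "mbps_per_worker": captured_mbps / workers,
        "gbps_per_host": captured_mbps / hosts / mbps_per_gbps,
    }


# Current cluster: 2 hosts x 17 workers, 44% loss on 13 Gbps offered.
current = sizing(13, 0.44, workers=34, hosts=2)

# Assumed expansion: 4 hosts x 17 workers, and no loss at all.
target = sizing(13, 0.0, workers=68, hosts=4)

for label, figures in (("current", current), ("4 hosts, no loss", target)):
    print(label, {name: round(value, 2) for name, value in figures.items()})
```

Under those assumptions, four hosts with no loss would need each worker to handle roughly 196 Mbps, which is in the same range as the 219 Mbps per worker the current hosts appear to be processing. That is only arithmetic and says nothing about whether the loss figure itself is accurate.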

What does broctl netstats report?  One thing to watch out for is that the Myricom driver reports capture loss across the entire ring, so the dropped count needs to be divided by the number of worker processes.

Step one should be to see if netstats reports a similar level of loss.  If netstats is reporting something closer to 1-5% loss, you could have a problem elsewhere.  If netstats agrees with capstats, then the workers are definitely not keeping up.
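
The division described above can be sketched in Python. This is a rough illustration, not a BroControl feature. It assumes netstats prints one line per worker in the form `name: timestamp recvd=N dropped=N link=N`, that every worker on a host shows the same ring-wide dropped count, and that loss is computed as dropped / (recvd + dropped). The function name and the sample numbers are made up.

```python
import re

# Assumed netstats line format: "<worker>: <timestamp> recvd=N dropped=N link=N"
LINE_RE = re.compile(
    r"^\s*(?P<name>[^:\s]+):\s+\S+\s+recvd=(?P<recvd>\d+)\s+dropped=(?P<dropped>\d+)"
)


def adjusted_loss(netstats_text, workers_per_ring):
    """Return {worker: loss percent} after splitting ring-wide drops per worker."""
    results = {}
    for line in netstats_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip anything that does not look like a worker line
        recvd = int(match.group("recvd"))
        # The driver counts drops for the whole ring, so share them out.
        dropped = int(match.group("dropped")) / workers_per_ring
        total = recvd + dropped
        results[match.group("name")] = 100.0 * dropped / total if total else 0.0
    return results


# Made-up sample output for two of the 17 workers on one host.
sample = """\
worker-1-1: 1447713727.000000 recvd=1000000 dropped=170000 link=1170000
worker-1-2: 1447713727.000000 recvd=1000000 dropped=170000 link=1170000
"""

for worker, loss in adjusted_loss(sample, workers_per_ring=17).items():
    print(f"{worker}: {loss:.2f}% loss")
```

In this made-up sample the raw numbers look like about 15% loss per worker, but after dividing by 17 the figure is about 1%.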

> Does that seem like expected performance, or might there be something
> broken somewhere? Does it seem reasonable to buy two more worker hosts
> (at least to handle current needs)?

Hard to say.  More boxes always help, but it can't hurt to see if things can be optimized a bit on your current hardware.

If the Gigamon you have is the kind that does aggregation/load balancing, you may be able to send each box 50% of the traffic it gets now, to see how the boxes would behave if you had 2 other boxes helping out.

-- 
- Justin Azoff