[Bro] building a new bro server

Allen, Brian brianallen at wustl.edu
Mon Dec 8 19:57:45 PST 2014

Good questions and suggestions.

  1.  The manager and the workers are all on the same server.
  2.  We have looked at all of those metrics, but the bro capture loss file is what we use most. That is the one saying we have 30+% packet loss.
  3.  We got a license and tried PF_RING with DNA/zero copy, but it didn't make a noticeable difference.
  4.  We do use the node.cfg file to pin the 14 worker processes to the individual cores.  That leaves 2 free cores for OS/System tasks.

We saw a huge improvement when we went from 16Gig RAM to 128Gig RAM. (That one was pretty obvious, so we did that first.)  We also saw improvement when we pinned the processes to the cores.
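
For reference, the pinning described in item 4 can be sketched as below. This is only an illustration, not the actual config from this server: the host name, the interface name, and the choice of which two cores stay free are assumptions. The option names (lb_method, lb_procs, pin_cpus) are the ones BroControl reads from node.cfg.

```python
# Sketch: build a BroControl node.cfg worker stanza that pins workers to cores.
# The host, interface, and the choice of cores 2-15 are assumptions for
# illustration; only "14 workers, 2 cores left free" comes from the thread.

def worker_stanza(name, host, interface, cores):
    """Return a node.cfg section running one PF_RING worker per listed core."""
    lines = [
        f"[{name}]",
        "type=worker",
        f"host={host}",
        f"interface={interface}",
        "lb_method=pf_ring",
        f"lb_procs={len(cores)}",          # one worker process per pinned core
        "pin_cpus=" + ",".join(str(c) for c in cores),
    ]
    return "\n".join(lines)

# 16 cores total: leave cores 0 and 1 for the OS, pin 14 workers to cores 2..15.
stanza = worker_stanza("worker-1", "localhost", "eth0", list(range(2, 16)))
print(stanza)
```

The printed text would go into node.cfg alongside the manager and proxy sections, which this sketch leaves out.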


From: Gary Faulkner <gfaulkner.nsm at gmail.com>
Date: Monday, December 8, 2014 at 6:52 PM
To: Brian Allen <brianallen at wustl.edu>
Cc: Bro-Mailinglist <bro at bro.org>
Subject: Re: [Bro] building a new bro server

A couple thoughts that might help the list better understand your topology/situation.

  1.  Are the manager and/or proxies on the same host?
  2.  What are you using to determine packet loss? (ex. Bro capture loss script, broctl netstat, pf_ring counters, etc)
  3.  Are you running PF_RING using any of the enhanced drivers (DNA/ZC) and/or zero copy scripts(Libzero/ZC)?
  4.  Are you pinning your worker processes to individual cores (via node.cfg) or are you letting the OS handle things?

I saw a marked improvement in average loss as measured by the bro capture loss script simply by pinning CPU cores on a server very similar to yours with similar traffic per host. Bursty traffic and mega-flows will still cause higher loss levels for individual workers at times, though. Also, if you are running the manager and proxies on the same host, they could be competing for the same cores that one or more workers are running on. Running htop might give you an idea of whether workers are being bounced between cores (if not pinned), as well as whether other processes are clobbering one or more of the cores your workers are on. Either could be an issue with workers running at 100% CPU usage.
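
To see whether loss is spread evenly or concentrated on a few workers, the capture loss log can be averaged per worker. The following is a minimal sketch, assuming the usual tab-separated Bro log layout with a "#fields" header and the columns peer and percent_lost. The sample rows are made up for illustration and are not from any real server.

```python
from collections import defaultdict

def mean_loss_by_peer(lines):
    """Average the percent_lost column per peer in a capture_loss.log."""
    fields = []
    totals = defaultdict(lambda: [0.0, 0])  # peer -> [sum, count]
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            # Read column names from the header rather than hardcoding order.
            fields = line.split("\t")[1:]
            continue
        if not line or line.startswith("#"):
            continue  # skip other header/footer lines
        row = dict(zip(fields, line.split("\t")))
        # Some Bro versions write the value with a trailing "%"; strip it.
        loss = float(row["percent_lost"].rstrip("%"))
        entry = totals[row["peer"]]
        entry[0] += loss
        entry[1] += 1
    return {peer: total / count for peer, (total, count) in totals.items()}

# Made-up sample rows; in practice pass open("capture_loss.log") instead.
sample = [
    "#fields\tts\tts_delta\tpeer\tgaps\tacks\tpercent_lost",
    "1418097465.0\t900.0\tworker-1-1\t300\t1000\t30.0",
    "1418097465.0\t900.0\tworker-1-2\t10\t1000\t1.0",
    "1418098365.0\t900.0\tworker-1-1\t400\t1000\t40.0",
]
result = mean_loss_by_peer(sample)
for peer, loss in sorted(result.items()):
    print(f"{peer}: {loss:.2f}% average loss")
```

A worker whose average sits far above the rest would point to mega-flows or core contention rather than an overall shortage of CPU.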


On 12/8/2014 4:56 PM, Allen, Brian wrote:
Hi All-
I currently have a server running BRO, and we are seeing a lot of packet loss.  I am getting quotes for a new server to replace it, and I wanted to run some of the options by this group to see what would be better.

Current server specs:

-2 Processors, 8 cores each at 2.4GHz, so 16 total.  We run 14 bro processes, one per core.  And they run at 100% utilization all the time.
-128G memory
-Intel IXGBE 10Gig network card with pfring

We are seeing 3-4 Gig of traffic pretty much constantly, and we spike to 5 Gig.  The bro packet-loss file shows 30+% packet loss most of the time, but during the early morning hours, when traffic drops considerably, it falls to 0.01%.

For one test, we used a BPF filter to block all traffic going to bro except for one /24 subnet of campus traffic for about 15 minutes, and the packet loss dropped to 0.01%.

So we think our processors are too few and too slow to handle this amount of bandwidth.

Our question as we get a quote to buy a new box is: which is more important for BRO, having roughly the same number of cores but faster ones, or having more cores at the same or slower speed?

I'm looking at the following two Dell server options, although I can adjust this to add other better possibilities:

-Intel Xeon E5-2699, two processors, 18 cores each at 2.3GHz for 36 total
-256Gig RAM
-Intel IXGBE 10Gig network card with pfring

-Intel Xeon E5-2687 two processors, 10 cores each at 3.1GHz for 20 total
-256Gig RAM
-Intel IXGBE 10Gig network card with pfring

I'm assuming the first option would be much better but I've never researched this to know for sure, or how much better it would actually be.  I think the difference in price is around $2,400.
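
A back-of-the-envelope way to compare the boxes is to look at traffic per worker and traffic per GHz of worker clock. The sketch below is a crude model, not a benchmark: it assumes traffic is spread evenly across workers and that 2 cores are reserved for the OS on every box, and it ignores per-clock differences between CPU generations and mega-flows that cannot be split across workers.

```python
# Rough per-worker load for the current box and the two quoted options.
# Assumptions (not from any measurement): even load balancing across workers,
# and 2 cores held back for the OS on each server.

RESERVED_CORES = 2

# name -> (total cores, clock in GHz), taken from the specs listed above
servers = {
    "current (16 x 2.4GHz)": (16, 2.4),
    "E5-2699 (36 x 2.3GHz)": (36, 2.3),
    "E5-2687 (20 x 3.1GHz)": (20, 3.1),
}

def per_worker_mbps(total_gbps, cores):
    """Traffic each worker must handle if load is split evenly."""
    workers = cores - RESERVED_CORES
    return total_gbps * 1000 / workers

def mbps_per_ghz(total_gbps, cores, ghz):
    """Per-worker traffic normalised by clock speed (lower is easier)."""
    return per_worker_mbps(total_gbps, cores) / ghz

for name, (cores, ghz) in servers.items():
    for gbps in (4, 5):
        print(f"{name} at {gbps} Gig: "
              f"{per_worker_mbps(gbps, cores):.0f} Mbps/worker, "
              f"{mbps_per_ghz(gbps, cores, ghz):.0f} Mbps per GHz")
```

Under these assumptions the 36-core option leaves each worker with less traffic per GHz than the 20-core option, but a single large flow still lands on one worker, where the faster clock helps more.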

I'd like to get one box to handle our bandwidth as it grows over the next couple of years, take the current underpowered box and use it as a BRO test box/Elasticsearch server, and build the infrastructure to move to a BRO cluster in a couple of years.  Right now a single box would be better for space reasons.

I would be really interested to talk to other companies/universities who are running bro in the 3-7 Gig bandwidth range right now so I can see what hardware works for you.

Thanks for your help,
Brian Allen, CISSP
Information Security Manager
Washington University
brianallen at wustl.edu

