[Bro] building a new bro server

Harry Hoffman hhoffman at ip-solutions.net
Tue Dec 9 16:46:39 PST 2014


So, slightly off-topic, but since several of us seem to be going through this: would anyone be willing to collaborate on a paper/presentation to submit for BroCon 2015 that details the various methodologies folks are using to capture at X rate?

We're somewhere around 5-6Gbps on average but burst as high as 9Gbps.

I've been through many iterations to get the "perfect" recipe and it might prove useful to others.

However there are many different options on the network and system side so there are probably a few "perfect" recipes depending upon budget and equipment.

Thoughts?

Cheers,
Harry

On Dec 9, 2014 5:55 PM, Alex Waher <alexwis at gmail.com> wrote:
>
> Bear in mind that there is a 32-application limit on the number of Bro workers/slaves that can attach to a single cluster ID with the pf_ring DNA/ZC drivers. Or you can get really crafty, bounce traffic from one ring to another interface/ring, and have up to 64 workers on a single box, provided you have the cores to work with :)
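>
> For a concrete (untested) sketch: if the traffic has already been bounced/copied into two DNA clusters, the node.cfg side of a >32-worker layout on one box might look roughly like this (cluster IDs and host are made up for illustration):
>
> [worker-1]
> type=worker
> host=10.0.0.1
> interface=dnacluster:10
> lb_procs=32
> lb_method=pf_ring
>
> [worker-2]
> type=worker
> host=10.0.0.1
> interface=dnacluster:11
> lb_procs=32
> lb_method=pf_ring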
>
> Looking at the current Intel chips, I'd say the 8-core, high-clock (3.3GHz+) parts are a good option for a quad-socket system build without breaking the bank. That would give you 32 cores to pin workers on at a nice high clock speed, which Bro seems to greatly appreciate. The E5-2687W v2, E5-2667 v2, or E5-4627 v2 would fit the bill; some of these can turbo up to 4GHz for traffic spikes (if you manage the power modes correctly! https://communities.intel.com/community/itpeernetwork/datastack/blog/2013/08/05/how-to-maximise-cpu-performance-for-the-oracle-database-on-linux )
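>
> As a quick sanity check on the power-mode side, something like this rough Python snippet (untested; standard Linux cpufreq sysfs paths, nothing Bro-specific) can confirm each core is on the "performance" governor and show its max frequency:
>
> import glob
>
> # Walk the per-CPU cpufreq entries and report governor and max frequency.
> for gov in sorted(glob.glob('/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor')):
>     cpu_dir = gov.rsplit('/', 2)[0]
>     governor = open(gov).read().strip()
>     max_khz = int(open(cpu_dir + '/cpufreq/scaling_max_freq').read())
>     print('%s: governor=%s, max=%d MHz' % (cpu_dir.rsplit('/', 1)[1], governor, max_khz // 1000))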
>
> -Alex
>
> On Tue, Dec 9, 2014 at 1:25 PM, Gary Faulkner <gfaulkner.nsm at gmail.com> wrote:
>>
>> For perspective I currently have a bro cluster comprised of 3 physical
>> hosts. The first host runs the manager, proxies, and has storage to
>> handle lots of bro logs and keep them for several months, the other two
>> are dedicated to workers with relatively little storage. We have a
>> hardware load-balancer to distribute traffic as evenly as possible
>> between the worker nodes, and some effort has been made to limit having
>> to process really large uninteresting flows before they reach the
>> cluster. I looked at one of our typically busier blocks of time today
>> (10:00-14:00) and during that time the cluster was seeing an average of
>> 10Gbps of traffic with peaks as high as 15Gbps. Our traffic graphs and
>> capstats showed that each host was typically seeing around 50% of that
>> load, or around 5Gbps on average. During this time we saw an
>> average capture loss of around 0.47%, with a max loss of 22.53%. During
>> that same time-frame I had 18 snapshots where individual workers
>> reported loss over 5%, and 2 over 10%, out of 748 samples. So, I'd say each host
>> is probably seeing about the same amount of traffic as you have
>> described, but loaded scripts etc may vary from your configuration. We
>> have 22 workers per host for a total of 44 workers, and I believe the
>> capture loss script samples traffic over 15-minute intervals by
>> default, so there are roughly 17 time slices per worker in that 4-hour
>> window (44 workers x ~17 intervals is where the ~748 samples come
>> from). Here are some details of how those nodes are configured in
>> terms of hardware and Bro.
>>
>> 2 worker hosts each with:
>> 2x E5-2697 v2 (12 cores / 24 HT), 2.7GHz base / 3.5GHz turbo
>> 256GB RAM (probably overkill, but I used to have the manager and proxies
>> running on one of the hosts and it skewed my memory use quite a bit)
>> Intel X520-DA2 NIC
>> Bro 2.3-7 (git master at the time I last updated)
>> 22 workers
>> PF_RING 5.6.2 using DNA IXGBE drivers, and the pfdnacluster_master
>> script (example invocation below)
>> CPUs pinned (I used the OS to verify which logical core mapped to which
>> physical core, so that no 2 workers share a physical core, and didn't
>> use the 1st core on each CPU; see the sketch after the worker config below)
>> HT is not disabled on these hosts and I'm still using the OS malloc.
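>>
>> For reference, the pfdnacluster_master invocation behind dnacluster:21
>> would be roughly something like the following (flag names from memory
>> of the ntop DNA docs, so double-check with pfdnacluster_master -h):
>>
>> pfdnacluster_master -i dna0 -c 21 -n 22 -d
>>
>> i.e. read from the dna0 interface, create DNA cluster ID 21, and fan
>> the traffic out into 22 consumer rings, one per worker.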
>>
>> Worker configs like this:
>> [worker-1]
>> type=worker
>> host=10.10.10.10
>> interface=dnacluster:21
>> lb_procs=22
>> lb_method=pf_ring
>> pin_cpus=2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23
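>>
>> In case it's useful, here's a rough, untested Python sketch of how one
>> might derive a pin_cpus list like the one above from sysfs. It keeps
>> one logical CPU per physical core and skips the first core on each
>> socket; the paths are standard Linux sysfs and nothing here is
>> Bro-specific:
>>
>> #!/usr/bin/env python
>> # Build a pin_cpus list: one logical CPU per physical core, skipping
>> # the first core on each package (socket). Rough sketch only.
>> import glob, os
>>
>> cores = {}  # (package_id, core_id) -> lowest logical CPU seen
>> for topo in glob.glob('/sys/devices/system/cpu/cpu[0-9]*/topology'):
>>     cpu = int(os.path.basename(os.path.dirname(topo))[3:])
>>     pkg = int(open(os.path.join(topo, 'physical_package_id')).read())
>>     core = int(open(os.path.join(topo, 'core_id')).read())
>>     if (pkg, core) not in cores or cpu < cores[(pkg, core)]:
>>         cores[(pkg, core)] = cpu
>>
>> first_core = {}  # package_id -> lowest core_id on that package
>> for pkg, core in cores:
>>     if pkg not in first_core or core < first_core[pkg]:
>>         first_core[pkg] = core
>>
>> pin = sorted(cpu for (pkg, core), cpu in cores.items()
>>              if core != first_core[pkg])
>> print('pin_cpus=' + ','.join(str(c) for c in pin))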
>>
>> I suspect faster CPUs will handle bursty flows better, such as when a
>> large volume of traffic load-balances to a single worker, while more
>> cores will probably help when you can distribute the workload more
>> evenly. This led me to try to pick something that balanced the 2
>> options (more cores vs. higher clock speed). Naturally YMMV, and your
>> traffic may not look like mine.
>>
>> Hope this helps.
>>
>> Regards,
>> Gary
>>
>> On 12/9/2014 12:00 PM, Seth Hall wrote:
>> >> On Dec 8, 2014, at 10:57 PM, Allen, Brian <BrianAllen at wustl.edu> wrote:
>> >>
>> >> We saw a huge improvement when we went from 16Gig RAM to 128Gig RAM. (That one was pretty obvious so we did that first).  We also saw improvement when we pinned the processes to the cores.
>> > I think I had also suggested that you move to tcmalloc.  Have you tried that yet?  It’s not going to fix your issue with 30% packet loss, but I expect it would cut it down a bit further.
>> >
>> >    .Seth
>> >
>> > --
>> > Seth Hall
>> > International Computer Science Institute
>> > (Bro) because everyone has a network
>> > http://www.bro.org/
>> >
>>
>> _______________________________________________
>> Bro mailing list
>> bro at bro-ids.org
>> http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/bro
>
>



