[Bro] building a new bro server
scampbell at lbl.gov
Tue Dec 9 17:17:12 PST 2014
-----BEGIN PGP SIGNED MESSAGE-----
I second it, and will be able to provide some docs/tuning/configs
regarding what we are currently doing.
On 12/9/14 7:46 PM, Harry Hoffman wrote:
> So, slightly off-topic, but since several of us seem to be going
> through this, would anyone be willing to collaborate on a
> paper/presentation to submit for BroCon 2015 that details the
> various methodologies folks are using to capture at X rate?
> We're somewhere around 5-6Gbps average but burst as high as 9Gbps.
> I've been through many iterations to get the "perfect" recipe and
> it might prove useful to others.
> However there are many different options on the network and system
> side so there are probably a few "perfect" recipes depending upon
> budget and equipment.
> Cheers, Harry
> On Dec 9, 2014 5:55 PM, Alex Waher <alexwis at gmail.com> wrote:
>> Bear in mind that there is a 32 application limit for the number
>> of bro workers/slaves that can attach to a single cluster ID with
>> the pf_ring dna/zc drivers. Or you can get really crafty and
>> bounce traffic from one ring to another interface/ring and have
>> up to 64 workers on a single box, provided you have the cores to
>> work with :)
>> Looking at the current Intel chips, I'd say the 8-core high-clock
>> (3.3GHz+) procs are a good option for a quad-socket system build
>> that won't break the bank. That would give you 32 cores to pin
>> workers on at a nice high clock speed, which bro seems to
>> greatly appreciate: the E5-2687W v2, E5-2667 v2, or E5-4627 v2,
>> some of which can turbo up to 4GHz for traffic spikes (if you
>> manage the power modes correctly!).
>> On Tue, Dec 9, 2014 at 1:25 PM, Gary Faulkner
>> <gfaulkner.nsm at gmail.com> wrote:
>>> For perspective I currently have a bro cluster comprised of 3
>>> physical hosts. The first host runs the manager, proxies, and
>>> has storage to handle lots of bro logs and keep them for
>>> several months, the other two are dedicated to workers with
>>> relatively little storage. We have a hardware load-balancer to
>>> distribute traffic as evenly as possible between the worker
>>> nodes, and some effort has been made to limit having to process
>>> really large uninteresting flows before they reach the cluster.
>>> I looked at one of our typically busier blocks of time today
>>> (10:00-14:00) and during that time the cluster was seeing an
>>> average of 10Gbps of traffic with peaks as high as 15Gbps.
>>> Our traffic graphs and capstats showed each host typically
>>> seeing around 50% of that load, or around 5Gbps on average.
>>> During this time we saw an average capture loss of
>>> around 0.47%, with a max loss of 22.53%. During that same
>>> time-frame I had 18 snapshots where individual workers reported
>>> loss over 5%, and 2 over 10% out of 748. So, I'd say each host
>>> is probably seeing about the same amount of traffic as you
>>> have described, but loaded scripts etc may vary from your
>>> configuration. We have 22 workers per host for a total of 44
>>> workers, and I believe the capture loss script is sampling
>>> traffic over 15 minute intervals by default, so there are
>>> roughly 17 time slices for each worker. Here are some details
>>> of how those nodes are configured in terms of hardware and
>>> 2 worker hosts, each with:
>>>   2x E5-2697 v2 (12 cores / 24 HT), 2.7GHz / 3.5GHz Turbo
>>>   256GB RAM (probably overkill, but I used to have the manager
>>>     and proxies running on one of the hosts and it skewed my
>>>     memory use quite a bit)
>>>   Intel X520-DA2 NIC
>>>   Bro 2.3-7 (git master at the time I last updated)
>>>   22 workers
>>>   PF_RING 5.6.2 using DNA IXGBE drivers, and pfdnacluster_master
>>>     script
>>>   CPUs pinned (used OS to verify which core presented to the OS
>>>     mapped to each physical core to avoid mapping 2 workers to
>>>     the same physical cores, and didn't use the 1st core on each
>>>     CPU)
>>>   HT is not disabled on these hosts and I'm still using the OS
>>> Worker configs like this:
>>>   [worker-1]
>>>   type=worker
>>>   host=10.10.10.10
>>>   interface=dnacluster:21
>>>   lb_procs=22
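
The pinning approach described above (one worker per physical core, skipping the first core on each CPU) can be sketched roughly as follows. This is only an illustration, not the script used on these hosts: the function name pick_pin_cpus and the SAMPLE text are hypothetical, and it assumes the Linux /proc/cpuinfo layout in which each logical CPU is a blank-line-separated block carrying "processor", "physical id" and "core id" fields.

```python
from collections import defaultdict

# Hypothetical sample: 2 sockets x 2 cores with HT, so 8 logical CPUs.
# Logical CPUs 4-7 are the hyperthread siblings of 0-3.
SAMPLE = "\n\n".join(
    "processor\t: %d\nphysical id\t: %d\ncore id\t\t: %d" % (cpu, sock, core)
    for cpu, sock, core in [
        (0, 0, 0), (1, 0, 1), (2, 1, 0), (3, 1, 1),
        (4, 0, 0), (5, 0, 1), (6, 1, 0), (7, 1, 1),
    ]
)


def pick_pin_cpus(cpuinfo_text, skip_first_core=True):
    """Return one logical CPU number per physical core."""
    # Map (physical id, core id) -> logical CPU numbers sharing that core.
    cores = defaultdict(list)
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        core_key = (int(fields.get("physical id", 0)),
                    int(fields.get("core id", 0)))
        cores[core_key].append(int(fields["processor"]))

    pins = []
    for sock in sorted({s for s, _ in cores}):
        core_ids = sorted(c for s, c in cores if s == sock)
        if skip_first_core:
            # Leave the first core of each socket free for the OS.
            core_ids = core_ids[1:]
        for core in core_ids:
            # Use only the lowest-numbered sibling so that two workers
            # never share one physical core.
            pins.append(min(cores[(sock, core)]))
    return pins


print(pick_pin_cpus(SAMPLE))
```

On a real Linux host the input would be open("/proc/cpuinfo").read(). The resulting list could then be given to whatever does the pinning (for example BroControl's pin_cpus node option, if that is the mechanism in use; the message above does not say).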
>>> I suspect the faster CPUs will handle bursty flows better, such
>>> as when a large volume of traffic load balances to a single
>>> worker, while more cores will probably help when you can
>>> distribute the workload more evenly. This led me to try to pick
>>> something that balanced the 2 options (more cores vs higher
>>> clock speed).
>>> Naturally YMMV, and your traffic may not look like mine.
>>> Hope this helps.
>>> Regards, Gary
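
The capture-loss figures in Gary's message (average, max, and counts over 5% and 10% out of 748 snapshots) are the kind of summary sketched below. This is a minimal illustration under assumptions: summarize_loss is a hypothetical helper, the sample values are made up, and in practice the inputs would presumably come from the percent_lost column of capture_loss.log. The only number taken from the thread is the sanity check that 44 workers times 17 intervals gives the 748 snapshots mentioned.

```python
def summarize_loss(percent_losses, thresholds=(5.0, 10.0)):
    """Summarize per-interval capture loss percentages."""
    if not percent_losses:
        raise ValueError("no samples")
    summary = {
        "samples": len(percent_losses),
        "mean": sum(percent_losses) / len(percent_losses),
        "max": max(percent_losses),
    }
    for limit in thresholds:
        # Count the intervals whose loss exceeded this threshold.
        summary["over_%g" % limit] = sum(
            1 for p in percent_losses if p > limit
        )
    return summary


# 44 workers, each reporting roughly 17 fifteen-minute intervals.
expected_snapshots = 44 * 17
print(expected_snapshots)

# Made-up sample values, for illustration only.
example = [0.0, 0.2, 6.0, 12.5, 0.3]
print(summarize_loss(example))
```

Tracking the counts over a threshold alongside the mean is useful because, as in the numbers above, a low average can hide a few intervals where a single worker drops heavily.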
>>> On 12/9/2014 12:00 PM, Seth Hall wrote:
>>>>> On Dec 8, 2014, at 10:57 PM, Allen, Brian
>>>>> <BrianAllen at wustl.edu> wrote:
>>>>> We saw a huge improvement when we went from 16Gig RAM to
>>>>> 128Gig RAM. (That one was pretty obvious so we did that
>>>>> first). We also saw improvement when we pinned the
>>>>> processes to the cores.
>>>> I think I had also suggested that you move to tcmalloc. Have
>>>> you tried that yet? It’s not going to fix your issue with
>>>> 30% packet loss, but I expect it would cut it down a bit.
>>>> -- Seth Hall
>>>> International Computer Science Institute
>>>> (Bro) because everyone has a network
>>>> http://www.bro.org/
>>> _______________________________________________ Bro mailing
>>> list bro at bro-ids.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.22 (Darwin)
Comment: GPGTools - http://gpgtools.org
-----END PGP SIGNATURE-----