[Bro] Sizing a Bro Cluster [was out of memory after a couple days]

Gary Faulkner gary at doit.wisc.edu
Fri Dec 6 16:38:35 PST 2013


I've had some proxy crashes in the past, and it was suggested that I 
increase my number of proxies, which I did until my environment 
appeared stable. After about a week of stable operation I started to 
run out of memory, and after subsequent restarts I have been running 
out of memory after about 24 hours of operation, typically during 
non-peak times (50% of normal traffic). Naturally I'm wondering whether 
I'm just doing it wrong, and whether my set-up is appropriately sized 
and configured to handle the load I'm asking it to deal with.

I think I've seen folks on the list who were running Bro on similar 
hardware and might be able to tell me if my configuration is anything 
close to what works for them. I'm also curious how other folks determine 
how many proxies they need, how many workers per host, and so on.

I'm mostly running Bro 2.2 stock with default scripts, and only minor 
edits to local.bro to test out email notices. I'm only using these 
systems for Bro, although they were originally from another project so 
they weren't necessarily ordered with Bro specs in mind.

Here's how I've got things allocated currently:

Bro-1 Host:
2ea Xeon E5-2670 at 2.6GHz (32 combined Logical Cores / 16 Physical)
64G RAM
manager
2 proxies
20 workers
2-4 Gbps of traffic

Bro-2 Host:
2ea Xeon E5-2670 at 2.6GHz (32 combined Logical Cores / 16 Physical)
64G RAM
2 proxies
20 workers
2-4 Gbps of traffic

The following is a relatively light traffic load (late on a Friday) for 
my install (4Gbps vs 8Gbps):

bro-1 $ ./broctl capstats

Interface                     kpps       mbps       (10s average)
-----------------------------------------------------------------
192.168.0.10/dnacluster:21    338.6      2327.4
192.168.0.11/dnacluster:22    324.8      2264.7

Total                         663.4      4592.1

bro-1 $ ./broctl top
Name         Type     Node         Pid    Proc    VSize  Rss   Cpu   Cmd
manager      manager  192.168.0.10 14816  parent  2G     736M  88%   bro
manager      manager  192.168.0.10 14817  child   169M   93M   44%   bro
proxy-1      proxy    192.168.0.10 14863  child   102M   26M   23%   bro
proxy-1      proxy    192.168.0.10 14860  parent  1G     1G    3%    bro
proxy-2      proxy    192.168.0.10 14862  child   102M   28M   27%   bro
proxy-2      proxy    192.168.0.10 14861  parent  1G     1G    3%    bro
proxy-3      proxy    192.168.0.11 28900  child   102M   46M   20%   bro
proxy-3      proxy    192.168.0.11 28898  parent  1G     1G    1%    bro
proxy-4      proxy    192.168.0.11 28899  child   102M   45M   21%   bro
proxy-4      proxy    192.168.0.11 28897  parent  1G     1G    1%    bro
worker-1-1   worker   192.168.0.10 15228  parent  2G     2G    65%   bro
worker-1-1   worker   192.168.0.10 15398  child   514M   11M   10%   bro
worker-1-10  worker   192.168.0.10 15230  parent  2G     2G    53%   bro
worker-1-10  worker   192.168.0.10 15407  child   514M   12M   8%    bro
worker-1-11  worker   192.168.0.10 15234  parent  2G     2G    78%   bro
worker-1-11  worker   192.168.0.10 15286  child   514M   9M    11%   bro
worker-1-12  worker   192.168.0.10 15235  parent  2G     2G    67%   bro
worker-1-12  worker   192.168.0.10 15267  child   514M   8M    12%   bro
worker-1-13  worker   192.168.0.10 15237  parent  2G     2G    82%   bro
worker-1-13  worker   192.168.0.10 15392  child   514M   9M    12%   bro
worker-1-14  worker   192.168.0.10 15238  parent  2G     2G    43%   bro
worker-1-14  worker   192.168.0.10 15264  child   514M   11M   8%    bro
worker-1-15  worker   192.168.0.10 15240  parent  2G     2G    76%   bro
worker-1-15  worker   192.168.0.10 15300  child   514M   7M    9%    bro
worker-1-16  worker   192.168.0.10 15243  parent  2G     2G    94%   bro
worker-1-16  worker   192.168.0.10 15404  child   514M   11M   9%    bro
worker-1-17  worker   192.168.0.10 15244  parent  2G     2G    67%   bro
worker-1-17  worker   192.168.0.10 15383  child   514M   8M    8%    bro
worker-1-18  worker   192.168.0.10 15246  parent  2G     2G    80%   bro
worker-1-18  worker   192.168.0.10 15372  child   514M   12M   11%   bro
worker-1-19  worker   192.168.0.10 15248  parent  2G     2G    76%   bro
worker-1-19  worker   192.168.0.10 15376  child   514M   8M    8%    bro
worker-1-2   worker   192.168.0.10 15251  parent  2G     2G    83%   bro
worker-1-2   worker   192.168.0.10 15414  child   514M   11M   10%   bro
worker-1-20  worker   192.168.0.10 15254  parent  2G     2G    86%   bro
worker-1-20  worker   192.168.0.10 15417  child   514M   12M   11%   bro
worker-1-3   worker   192.168.0.10 15253  parent  2G     2G    55%   bro
worker-1-3   worker   192.168.0.10 15375  child   514M   8M    12%   bro
worker-1-4   worker   192.168.0.10 15256  parent  2G     2G    87%   bro
worker-1-4   worker   192.168.0.10 15388  child   515M   8M    10%   bro
worker-1-5   worker   192.168.0.10 15257  parent  2G     2G    58%   bro
worker-1-5   worker   192.168.0.10 15395  child   515M   11M   10%   bro
worker-1-6   worker   192.168.0.10 15258  parent  2G     2G    96%   bro
worker-1-6   worker   192.168.0.10 15394  child   514M   11M   8%    bro
worker-1-7   worker   192.168.0.10 15259  parent  2G     2G    65%   bro
worker-1-7   worker   192.168.0.10 15413  child   514M   12M   6%    bro
worker-1-8   worker   192.168.0.10 15260  parent  2G     2G    99%   bro
worker-1-8   worker   192.168.0.10 15401  child   514M   11M   8%    bro
worker-1-9   worker   192.168.0.10 15261  parent  2G     2G    61%   bro
worker-1-9   worker   192.168.0.10 15408  child   514M   11M   8%    bro
worker-2-1   worker   192.168.0.11 29961  parent  2G     2G    85%   bro
worker-2-1   worker   192.168.0.11 29984  child   514M   31M   9%    bro
worker-2-10  worker   192.168.0.11 29959  parent  2G     2G    52%   bro
worker-2-10  worker   192.168.0.11 30085  child   515M   31M   8%    bro
worker-2-11  worker   192.168.0.11 29960  parent  2G     2G    96%   bro
worker-2-11  worker   192.168.0.11 30112  child   514M   31M   10%   bro
worker-2-12  worker   192.168.0.11 29973  parent  2G     2G    54%   bro
worker-2-12  worker   192.168.0.11 30082  child   514M   30M   8%    bro
worker-2-13  worker   192.168.0.11 29967  parent  2G     2G    93%   bro
worker-2-13  worker   192.168.0.11 30111  child   514M   31M   10%   bro
worker-2-14  worker   192.168.0.11 29962  parent  2G     2G    100%  bro
worker-2-14  worker   192.168.0.11 30076  child   514M   30M   8%    bro
worker-2-15  worker   192.168.0.11 29975  parent  2G     2G    55%   bro
worker-2-15  worker   192.168.0.11 30138  child   514M   31M   10%   bro
worker-2-16  worker   192.168.0.11 29965  parent  2G     2G    85%   bro
worker-2-16  worker   192.168.0.11 29994  child   514M   31M   8%    bro
worker-2-17  worker   192.168.0.11 29968  parent  2G     2G    76%   bro
worker-2-17  worker   192.168.0.11 30097  child   514M   31M   8%    bro
worker-2-18  worker   192.168.0.11 29972  parent  2G     2G    95%   bro
worker-2-18  worker   192.168.0.11 30115  child   514M   30M   10%   bro
worker-2-19  worker   192.168.0.11 29964  parent  2G     2G    68%   bro
worker-2-19  worker   192.168.0.11 30092  child   514M   31M   7%    bro
worker-2-2   worker   192.168.0.11 29974  parent  2G     2G    51%   bro
worker-2-2   worker   192.168.0.11 30133  child   514M   31M   7%    bro
worker-2-20  worker   192.168.0.11 29966  parent  2G     2G    59%   bro
worker-2-20  worker   192.168.0.11 29981  child   514M   30M   10%   bro
worker-2-3   worker   192.168.0.11 29969  parent  2G     2G    95%   bro
worker-2-3   worker   192.168.0.11 30095  child   514M   31M   8%    bro
worker-2-4   worker   192.168.0.11 29970  parent  2G     2G    95%   bro
worker-2-4   worker   192.168.0.11 30137  child   514M   30M   8%    bro
worker-2-5   worker   192.168.0.11 29977  parent  2G     2G    84%   bro
worker-2-5   worker   192.168.0.11 30100  child   514M   31M   10%   bro
worker-2-6   worker   192.168.0.11 29978  parent  2G     2G    73%   bro
worker-2-6   worker   192.168.0.11 29990  child   514M   30M   8%    bro
worker-2-7   worker   192.168.0.11 29976  parent  2G     2G    76%   bro
worker-2-7   worker   192.168.0.11 30081  child   514M   31M   10%   bro
worker-2-8   worker   192.168.0.11 29963  parent  2G     2G    57%   bro
worker-2-8   worker   192.168.0.11 29987  child   514M   30M   8%    bro
worker-2-9   worker   192.168.0.11 29971  parent  2G     2G    52%   bro
worker-2-9   worker   192.168.0.11 30096  child   514M   31M   10%   bro

bro-1 $ free -g
             total       used       free     shared    buffers     cached
Mem:            62         62          0          0          0         17
-/+ buffers/cache:         44         17
Swap:            0          0          0

bro-2 $ free -g
             total       used       free     shared    buffers     cached
Mem:            62         45         17          0          0          1
-/+ buffers/cache:         44         18
Swap:            0          0          0
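
As a rough cross-check of those numbers, the resident sizes from the 
`broctl top` output above can be summed per host. This is a minimal 
Python sketch, not anything Bro or broctl provides: the constants are 
approximate values read off the listing (broctl rounds to whole G/M), 
the bro-1 child sizes are assumed for both hosts, and `host_rss_gb` is 
a hypothetical helper name.

```python
# Approximate resident memory per process in GB, read off `broctl top`.
# broctl rounds to whole G/M, so these are coarse estimates (assumption).
WORKER_PARENT_GB = 2.0
WORKER_CHILD_GB = 0.011      # roughly 7-12M per worker child on bro-1
PROXY_PARENT_GB = 1.0
PROXY_CHILD_GB = 0.027       # roughly 26-28M per proxy child on bro-1
MANAGER_GB = 0.736 + 0.093   # manager parent + child


def host_rss_gb(workers, proxies, has_manager):
    """Sum the estimated resident memory of all Bro processes on one host."""
    total = workers * (WORKER_PARENT_GB + WORKER_CHILD_GB)
    total += proxies * (PROXY_PARENT_GB + PROXY_CHILD_GB)
    if has_manager:
        total += MANAGER_GB
    return total


if __name__ == "__main__":
    bro1 = host_rss_gb(workers=20, proxies=2, has_manager=True)
    bro2 = host_rss_gb(workers=20, proxies=2, has_manager=False)
    print(f"bro-1 estimate: {bro1:.1f} GB")
    print(f"bro-2 estimate: {bro2:.1f} GB")
```

With these figures both hosts come out at roughly 42-43 GB, which is 
close to the 44 GB that `free -g` reports as used once buffers/cache 
are excluded, so the Bro processes themselves appear to account for 
nearly all of the used memory.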

What do you guys think?

Regards,
Gary

PS ~

I've been reading the mailing list archives, and it seems that folks 
with the older Xeons with higher clock rates (around 3.4GHz) but fewer 
cores were able to handle upwards of 400-500Mbps per worker process. 
I've also seen it hinted, I think by Vlad G., that he was fitting in 28 
workers on boxes with similar core counts to my own, but slightly faster 
processors. Based on some of those remarks in previous threads I was 
thinking I should be able to handle a little over 300Mbps per process 
with these processors, but I've only had the traffic to push about 
200Mbps per worker so far.
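
The per-worker figures can be worked out directly from the totals in 
this message. The following is a small Python sketch of that 
arithmetic; it assumes traffic is balanced evenly across all 40 workers 
(which the load balancer may not achieve in practice), and the function 
names are made up for illustration.

```python
TOTAL_WORKERS = 40  # 20 workers on each of the two hosts


def mbps_per_worker(total_mbps, workers=TOTAL_WORKERS):
    """Average load per worker, assuming perfectly even balancing."""
    return total_mbps / workers


def cluster_capacity_mbps(per_worker_mbps, workers=TOTAL_WORKERS):
    """Aggregate traffic the cluster could take at a given per-worker rate."""
    return per_worker_mbps * workers


if __name__ == "__main__":
    # Friday-night capstats total from this message.
    print(f"light load: {mbps_per_worker(4592.1):.1f} Mbps per worker")
    # Normal peak of about 8 Gbps.
    print(f"peak load:  {mbps_per_worker(8000):.1f} Mbps per worker")
    # Hoped-for 300 Mbps per worker across the whole cluster.
    print(f"capacity at 300 Mbps/worker: {cluster_capacity_mbps(300)} Mbps")
```

That gives about 115Mbps per worker for the capstats sample, 200Mbps per 
worker at the 8Gbps peak, and 12Gbps of aggregate headroom if each 
worker really can sustain 300Mbps.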

I know some folks also like to put the manager and possibly the proxies 
on separate boxes from the workers, but I haven't gotten a good sense of 
what kind of workload a proxy can handle. As for proxies, I've mostly 
seen comments such as "I probably have way more proxies than I need" or 
"Just keep adding proxies until they stop crashing". I don't currently 
have a spare box for the manager and proxies, but would be curious to 
know if folks feel it is a necessity. My observations on my own setup 
are that my Bro workers typically use 99% of a logical core at peak 
network times, and my manager 150-175% (multi-threaded). My workers 
seem to use about 1-2G of memory normally.
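
To put those CPU observations next to the core counts, here is a rough 
Python sketch of the logical-core budget for bro-1 at peak. The worker 
and manager figures are the ones quoted above; the proxy figure (about 
30% of a core per proxy, parent plus child) is taken from the light-load 
`broctl top` listing and is only assumed to hold at peak. Worker child 
processes are ignored, and `logical_cores_needed` is a hypothetical 
helper name.

```python
LOGICAL_CORES = 32
PHYSICAL_CORES = 16


def logical_cores_needed(workers, worker_util, manager_util,
                         proxies, proxy_util):
    """Sum per-process CPU utilisation, expressed in logical cores."""
    return workers * worker_util + manager_util + proxies * proxy_util


if __name__ == "__main__":
    needed = logical_cores_needed(
        workers=20, worker_util=0.99,   # 99% of a logical core at peak
        manager_util=1.75,              # upper end of 150-175%
        proxies=2, proxy_util=0.30,     # assumption from light-load top
    )
    print(f"needed: {needed:.2f} logical cores")
    print(f"fits in logical cores:  {needed <= LOGICAL_CORES}")
    print(f"fits in physical cores: {needed <= PHYSICAL_CORES}")
```

Under those assumptions bro-1 needs a little over 22 logical cores at 
peak, which fits within the 32 logical cores but exceeds the 16 physical 
ones, so hyperthreaded siblings are necessarily in use.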
