[Bro] Sizing a Bro Cluster [was out of memory after a couple days]
Gary Faulkner
gary at doit.wisc.edu
Fri Dec 6 16:38:35 PST 2013
I've had some proxy crashes in the past, and it was suggested that I
increase my number of proxies, which I did until my environment
appeared stable. After about a week of stable operation I
started to run out of memory, and after subsequent restarts I have been
running out of memory after about 24 hours of operation, typically
during non-peak times (50% of normal traffic). Naturally I'm wondering
whether I'm just doing it wrong, and whether my setup is appropriately
sized and configured to handle the load I'm asking it to deal with.
I think I've seen folks on the list who were running Bro on similar
hardware and might be able to tell me whether my configuration is anything
close to what works for them. I'm also curious how other folks determine
how many proxies they need, how many workers to run per host, etc.
I'm mostly running Bro 2.2 stock with default scripts, and only minor
edits to local.bro to test out email notices. I'm only using these
systems for Bro, although they were originally from another project so
they weren't necessarily ordered with Bro specs in mind.
Here's how I've got things allocated currently:
Bro-1 Host:
2ea Xeon E5-2670 at 2.6GHz (32 combined logical cores / 16 physical)
64G RAM
manager
2 proxies
20 workers
2-4 Gbps of traffic
Bro-2 Host:
2ea Xeon E5-2670 at 2.6GHz (32 combined logical cores / 16 physical)
64G RAM
2 proxies
20 workers
2-4 Gbps of traffic
The following is a relatively light traffic load (late on a Friday) for
my install (about 4Gbps, versus the usual 8Gbps):
bro-1 $ ./broctl capstats
Interface                    kpps    mbps    (10s average)
------------------------------
192.168.0.10/dnacluster:21   338.6   2327.4
192.168.0.11/dnacluster:22   324.8   2264.7
Total                        663.4   4592.1
bro-1 $ ./broctl top
Name        Type    Node         Pid   Proc   VSize Rss  Cpu  Cmd
manager     manager 192.168.0.10 14816 parent 2G    736M 88%  bro
manager     manager 192.168.0.10 14817 child  169M  93M  44%  bro
proxy-1     proxy   192.168.0.10 14863 child  102M  26M  23%  bro
proxy-1     proxy   192.168.0.10 14860 parent 1G    1G   3%   bro
proxy-2     proxy   192.168.0.10 14862 child  102M  28M  27%  bro
proxy-2     proxy   192.168.0.10 14861 parent 1G    1G   3%   bro
proxy-3     proxy   192.168.0.11 28900 child  102M  46M  20%  bro
proxy-3     proxy   192.168.0.11 28898 parent 1G    1G   1%   bro
proxy-4     proxy   192.168.0.11 28899 child  102M  45M  21%  bro
proxy-4     proxy   192.168.0.11 28897 parent 1G    1G   1%   bro
worker-1-1  worker  192.168.0.10 15228 parent 2G    2G   65%  bro
worker-1-1  worker  192.168.0.10 15398 child  514M  11M  10%  bro
worker-1-10 worker  192.168.0.10 15230 parent 2G    2G   53%  bro
worker-1-10 worker  192.168.0.10 15407 child  514M  12M  8%   bro
worker-1-11 worker  192.168.0.10 15234 parent 2G    2G   78%  bro
worker-1-11 worker  192.168.0.10 15286 child  514M  9M   11%  bro
worker-1-12 worker  192.168.0.10 15235 parent 2G    2G   67%  bro
worker-1-12 worker  192.168.0.10 15267 child  514M  8M   12%  bro
worker-1-13 worker  192.168.0.10 15237 parent 2G    2G   82%  bro
worker-1-13 worker  192.168.0.10 15392 child  514M  9M   12%  bro
worker-1-14 worker  192.168.0.10 15238 parent 2G    2G   43%  bro
worker-1-14 worker  192.168.0.10 15264 child  514M  11M  8%   bro
worker-1-15 worker  192.168.0.10 15240 parent 2G    2G   76%  bro
worker-1-15 worker  192.168.0.10 15300 child  514M  7M   9%   bro
worker-1-16 worker  192.168.0.10 15243 parent 2G    2G   94%  bro
worker-1-16 worker  192.168.0.10 15404 child  514M  11M  9%   bro
worker-1-17 worker  192.168.0.10 15244 parent 2G    2G   67%  bro
worker-1-17 worker  192.168.0.10 15383 child  514M  8M   8%   bro
worker-1-18 worker  192.168.0.10 15246 parent 2G    2G   80%  bro
worker-1-18 worker  192.168.0.10 15372 child  514M  12M  11%  bro
worker-1-19 worker  192.168.0.10 15248 parent 2G    2G   76%  bro
worker-1-19 worker  192.168.0.10 15376 child  514M  8M   8%   bro
worker-1-2  worker  192.168.0.10 15251 parent 2G    2G   83%  bro
worker-1-2  worker  192.168.0.10 15414 child  514M  11M  10%  bro
worker-1-20 worker  192.168.0.10 15254 parent 2G    2G   86%  bro
worker-1-20 worker  192.168.0.10 15417 child  514M  12M  11%  bro
worker-1-3  worker  192.168.0.10 15253 parent 2G    2G   55%  bro
worker-1-3  worker  192.168.0.10 15375 child  514M  8M   12%  bro
worker-1-4  worker  192.168.0.10 15256 parent 2G    2G   87%  bro
worker-1-4  worker  192.168.0.10 15388 child  515M  8M   10%  bro
worker-1-5  worker  192.168.0.10 15257 parent 2G    2G   58%  bro
worker-1-5  worker  192.168.0.10 15395 child  515M  11M  10%  bro
worker-1-6  worker  192.168.0.10 15258 parent 2G    2G   96%  bro
worker-1-6  worker  192.168.0.10 15394 child  514M  11M  8%   bro
worker-1-7  worker  192.168.0.10 15259 parent 2G    2G   65%  bro
worker-1-7  worker  192.168.0.10 15413 child  514M  12M  6%   bro
worker-1-8  worker  192.168.0.10 15260 parent 2G    2G   99%  bro
worker-1-8  worker  192.168.0.10 15401 child  514M  11M  8%   bro
worker-1-9  worker  192.168.0.10 15261 parent 2G    2G   61%  bro
worker-1-9  worker  192.168.0.10 15408 child  514M  11M  8%   bro
worker-2-1  worker  192.168.0.11 29961 parent 2G    2G   85%  bro
worker-2-1  worker  192.168.0.11 29984 child  514M  31M  9%   bro
worker-2-10 worker  192.168.0.11 29959 parent 2G    2G   52%  bro
worker-2-10 worker  192.168.0.11 30085 child  515M  31M  8%   bro
worker-2-11 worker  192.168.0.11 29960 parent 2G    2G   96%  bro
worker-2-11 worker  192.168.0.11 30112 child  514M  31M  10%  bro
worker-2-12 worker  192.168.0.11 29973 parent 2G    2G   54%  bro
worker-2-12 worker  192.168.0.11 30082 child  514M  30M  8%   bro
worker-2-13 worker  192.168.0.11 29967 parent 2G    2G   93%  bro
worker-2-13 worker  192.168.0.11 30111 child  514M  31M  10%  bro
worker-2-14 worker  192.168.0.11 29962 parent 2G    2G   100% bro
worker-2-14 worker  192.168.0.11 30076 child  514M  30M  8%   bro
worker-2-15 worker  192.168.0.11 29975 parent 2G    2G   55%  bro
worker-2-15 worker  192.168.0.11 30138 child  514M  31M  10%  bro
worker-2-16 worker  192.168.0.11 29965 parent 2G    2G   85%  bro
worker-2-16 worker  192.168.0.11 29994 child  514M  31M  8%   bro
worker-2-17 worker  192.168.0.11 29968 parent 2G    2G   76%  bro
worker-2-17 worker  192.168.0.11 30097 child  514M  31M  8%   bro
worker-2-18 worker  192.168.0.11 29972 parent 2G    2G   95%  bro
worker-2-18 worker  192.168.0.11 30115 child  514M  30M  10%  bro
worker-2-19 worker  192.168.0.11 29964 parent 2G    2G   68%  bro
worker-2-19 worker  192.168.0.11 30092 child  514M  31M  7%   bro
worker-2-2  worker  192.168.0.11 29974 parent 2G    2G   51%  bro
worker-2-2  worker  192.168.0.11 30133 child  514M  31M  7%   bro
worker-2-20 worker  192.168.0.11 29966 parent 2G    2G   59%  bro
worker-2-20 worker  192.168.0.11 29981 child  514M  30M  10%  bro
worker-2-3  worker  192.168.0.11 29969 parent 2G    2G   95%  bro
worker-2-3  worker  192.168.0.11 30095 child  514M  31M  8%   bro
worker-2-4  worker  192.168.0.11 29970 parent 2G    2G   95%  bro
worker-2-4  worker  192.168.0.11 30137 child  514M  30M  8%   bro
worker-2-5  worker  192.168.0.11 29977 parent 2G    2G   84%  bro
worker-2-5  worker  192.168.0.11 30100 child  514M  31M  10%  bro
worker-2-6  worker  192.168.0.11 29978 parent 2G    2G   73%  bro
worker-2-6  worker  192.168.0.11 29990 child  514M  30M  8%   bro
worker-2-7  worker  192.168.0.11 29976 parent 2G    2G   76%  bro
worker-2-7  worker  192.168.0.11 30081 child  514M  31M  10%  bro
worker-2-8  worker  192.168.0.11 29963 parent 2G    2G   57%  bro
worker-2-8  worker  192.168.0.11 29987 child  514M  30M  8%   bro
worker-2-9  worker  192.168.0.11 29971 parent 2G    2G   52%  bro
worker-2-9  worker  192.168.0.11 30096 child  514M  31M  10%  bro
bro-1 $ free -g
             total       used       free     shared    buffers     cached
Mem:            62         62          0          0          0         17
-/+ buffers/cache:         44         17
Swap:            0          0          0
bro-2 $ free -g
             total       used       free     shared    buffers     cached
Mem:            62         45         17          0          0          1
-/+ buffers/cache:         44         18
Swap:            0          0          0
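As a rough cross-check of the memory figures, the per-process Rss values from `broctl top` can be added up and compared with what `free -g` reports for bro-1. The sketch below is a hypothetical helper, not part of Bro or broctl. It assumes the rounded sizes shown in the output above (broctl rounds to whole G/M, and 1G = 1024M is an assumption here) and uses representative child sizes where they vary.

```python
# Rough Rss budget for bro-1 built from the rounded `broctl top` figures.
# Because broctl rounds to whole G/M, the result is an estimate only.

def to_mib(value: str) -> int:
    """Convert a broctl-style size such as '2G' or '736M' to MiB (assumes 1G = 1024M)."""
    units = {"M": 1, "G": 1024}
    return int(value[:-1]) * units[value[-1]]

# node type -> (process count, parent Rss, representative child Rss) on bro-1
bro1_nodes = {
    "manager": (1, "736M", "93M"),
    "proxy": (2, "1G", "27M"),    # children were 26M and 28M
    "worker": (20, "2G", "11M"),  # children ranged from 7M to 12M
}

def total_rss_mib(nodes: dict) -> int:
    """Sum parent + child Rss over all nodes of every type."""
    return sum(count * (to_mib(parent) + to_mib(child))
               for count, parent, child in nodes.values())

total = total_rss_mib(bro1_nodes)
print(f"estimated Bro Rss on bro-1: {total / 1024:.1f} GiB")
```

With these inputs the estimate comes to about 43G, in line with the 44G that `free -g` shows as used (minus buffers/cache) on bro-1. Nearly all of it is held by the 20 worker parent processes.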
What do you guys think?
Regards,
Gary
PS ~
I've been reading the mailing list archives, and it seems that folks with
older Xeons that have higher clock rates (around 3.4GHz) but fewer cores
were able to handle upwards of 400-500Mbps per worker process. I've also
seen it hinted, I think by Vlad G., that he was fitting 28 workers on
boxes with core counts similar to my own but slightly faster
processors. Based on some of those remarks in previous threads, I was
thinking I should be able to handle a little over 300Mbps per process
with these processors, but I've only had the traffic to push about
200Mbps per worker so far.
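The per-worker figures can be sanity-checked with a little arithmetic. The sketch below is illustrative only: the function names are made up, it assumes traffic is spread evenly over all 40 workers (20 per host on two hosts), and it takes 8000 Mbps as the normal load and 300 Mbps as the hoped-for per-worker rate mentioned above.

```python
import math

WORKERS_PER_HOST = 20
HOSTS = 2

def mbps_per_worker(total_mbps: float, workers: int) -> float:
    """Average load per worker, assuming perfectly even load balancing."""
    return total_mbps / workers

def workers_needed(total_mbps: float, per_worker_mbps: float) -> int:
    """Smallest worker count that covers total_mbps at the given per-worker rate."""
    return math.ceil(total_mbps / per_worker_mbps)

workers = WORKERS_PER_HOST * HOSTS

# Light load measured by `broctl capstats` (Total row, in Mbps).
print(f"light load:  {mbps_per_worker(4592.1, workers):.1f} Mbps per worker")
# Normal load of roughly 8 Gbps.
print(f"normal load: {mbps_per_worker(8000, workers):.1f} Mbps per worker")
# Workers required if each one could sustain 300 Mbps (an assumption).
print(f"workers needed at 300 Mbps each: {workers_needed(8000, 300)}")
```

Under those assumptions the cluster sees about 115 Mbps per worker at the light load measured by capstats and 200 Mbps per worker at 8Gbps, and 27 workers at 300 Mbps each would cover 8Gbps.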
I know some folks also like to put the manager and possibly the proxies
on separate boxes from the workers, but I haven't gotten a good sense of
what kind of workload a proxy can handle. As for proxies, I've
mostly seen comments such as "I probably have way more proxies than I
need" or "Just keep adding proxies until they stop crashing". I don't
currently have a spare box for the manager and proxies, but would be
curious to know whether folks feel it is a necessity. My observations on
my own setup are that my Bro workers typically use 99% of a logical
core at peak network times, and my manager 150-175% (multi-threaded). My
workers normally seem to use about 1-2G of memory.
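To put the CPU observations side by side with the core count, here is a minimal sketch of a per-host CPU budget. It is only an estimate under stated assumptions: the names are hypothetical, the worker and manager figures are the peak values quoted above, and the proxy figure (about 30% of a core each, parent plus child) is taken from the light-load `broctl top` output because no peak number is given.

```python
LOGICAL_CORES = 32  # per host: 16 physical cores with hyper-threading

def cores_used(workers: int, worker_load: float,
               proxies: int, proxy_load: float,
               manager_load: float = 0.0) -> float:
    """Estimated number of logical cores kept busy on one host."""
    return workers * worker_load + proxies * proxy_load + manager_load

# bro-1 runs the manager (up to 175% at peak); bro-2 does not.
bro1 = cores_used(20, 0.99, 2, 0.30, manager_load=1.75)
bro2 = cores_used(20, 0.99, 2, 0.30)

for host, used in (("bro-1", bro1), ("bro-2", bro2)):
    spare = LOGICAL_CORES - used
    print(f"{host}: about {used:.0f} of {LOGICAL_CORES} logical cores busy, "
          f"about {spare:.0f} spare")
```

This puts bro-1 at roughly 22 of 32 logical cores and bro-2 at roughly 20 at peak. Because the 32 logical cores share 16 physical cores, the real headroom is likely smaller than the raw numbers suggest.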