[Bro] ES cluster and logstash

erik clark philosnef at gmail.com
Thu May 3 08:11:35 PDT 2018


Thank you for your response Patrick! To be honest, I am not sure just how
large our data set is. One of the problems we have is that we just don't
have enough disk space to unpack our gzip'd logs to see what a year would
look like. Do you happen to have a good document on how you are interfacing
kafka with logstash?

Erik

On Thu, May 3, 2018 at 11:05 AM, Patrick Cain <pcain at coopercain.com> wrote:

> Hi,
>
> From my adventures with two ES clusters receiving bro logs:
> The big “sizing” issue relates to how many bro events are being inserted
> into ES. One of my ES clusters does about 25k events/sec from bro. We found
> this is more than a simple bro -> logstash -> ES pipeline can take. We ended
> up putting a kafka buffer between bro and logstash, so now it’s bro -> kafka
> -> logstash -> ES.  We tried other buffer-type things, like syslog-ng
> and rabbitmq, but ended up on kafka since there is a bro-pkg for it, too.
> I hear some people like redis for this function.  So bro can now burst
> event logs out, and kafka makes the logstash-to-ES process run at a more
> sustained pace. Kafka and logstash run on some old box we had lying around,
> so the bro box doesn't have to do anything but slop logs out.
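A minimal logstash pipeline for this bro -> kafka -> logstash -> ES chain might look like the fragment below. The broker host, topic name, and index pattern are placeholders for illustration, not Pat's actual config; the `kafka` input and `elasticsearch` output are standard logstash plugins.

```
input {
  kafka {
    bootstrap_servers => "kafkabox:9092"   # hypothetical broker address
    topics            => ["bro_logs"]      # hypothetical topic name
    codec             => "json"
  }
}
output {
  elasticsearch {
    hosts => ["http://es-insert-1:9200"]   # point at the "insertion" nodes
    index => "bro-%{+YYYY.MM.dd}"          # daily indices, assumed naming
  }
}
```

Pointing `hosts` only at the insertion nodes is what keeps the bulk-indexing load off the data nodes, as described below.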
> The other issue was the rate at which ES can insert events when it’s doing
> other work. We ended up making data nodes just do data, letting the masters
> just be masters, and specifically crafting “insertion” nodes that are the
> only ones logstash talks to. This takes the index loading and computational
> work off the data nodes.  A couple of our systems run a master node and a
> data node together, since the masters use very little in the way of
> resources. Note that when you search in ES it pauses its other activities,
> since ES thinks search is its primary function in life, so the insertion
> rate drops.  What we found is that having dedicated insertion nodes lets ES
> keep taking inserts even when you run heavy searches. Keeps my OPS people
> happy. ☺
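In Elasticsearch 6.x terms (current when this thread was written), the role split described above maps to per-node settings like these in each node's `elasticsearch.yml`; the "insertion" node is what ES docs call a coordinating-only node:

```
# master-only node: coordinates the cluster, holds no data
node.master: true
node.data: false
node.ingest: false

# data-only node: holds shards, does the storage work
node.master: false
node.data: true
node.ingest: false

# "insertion" (coordinating-only) node that logstash points at
node.master: false
node.data: false
node.ingest: false
```

A node with all three set to false still accepts bulk requests and routes documents to the right shards, which is what absorbs the indexing overhead.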
>
> When I saw your "25TB of data per year" I chuckled...  My *small* ES
> cluster is three Dell R380s (each with 20C, 64GB, 35TB disk). (I should
> have gotten more memory.) But our 95TB of disk lets us keep about 120 days
> of bro logs before curator deletes the old indices to free up disk space.
> This cluster has been continuously up for about 4 months since I last
> played with the configs, so I'm content. I think your 4 data nodes are fine.
> If your insertion rate is high, make an insertion node on one of your
> masters.
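The curator behavior described (delete bro indices past 120 days) can be expressed as a Curator 5 action file roughly like this; the `bro-` index prefix and the daily `timestring` are assumptions about the index naming, not taken from Pat's setup:

```
actions:
  1:
    action: delete_indices
    description: "Delete bro indices older than 120 days"
    options:
      ignore_empty_list: True
    filters:
    - filtertype: pattern
      kind: prefix
      value: bro-              # hypothetical index prefix
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'   # assumes daily indices named bro-YYYY.MM.dd
      unit: days
      unit_count: 120
```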
>
> Pat
> p.s. Keep your java heap under 26GB; 23GB if you hate compressed java
> pointers.
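Concretely, that heap advice corresponds to something like the following in Elasticsearch's `config/jvm.options`; 25g is just an example value under the 26GB line, and min/max are set equal per standard ES practice:

```
# config/jvm.options: keep the heap below the compressed-oops
# cutoff (~32GB; Pat suggests staying under 26GB to be safe)
-Xms25g
-Xmx25g
```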
>
> From: bro-bounces at bro.org <bro-bounces at bro.org> On Behalf Of erik clark
> Sent: Friday, April 27, 2018 8:48 AM
> To: Bro-IDS <bro at bro.org>
> Subject: Re: [Bro] ES cluster and logstash
>
> We are looking to set up a proper ES cluster and dumping bro logs into it
> via logstash. The thought is to have 6 ES nodes (2 dedicated masters, 4 data
> nodes). If we are dumping 15 TB of data into the cluster per year (possibly
> as high as 20 or 25TB) from Bro, are 4 data nodes sufficient? The boxen will
> only have 64 gigs of RAM (30 for the java heap, 34 for system use) and
> probably 16 discrete cores. I have a feeling that this is horribly
> insufficient for a data cluster of that size.
>
>
>
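The sizing question above reduces to simple arithmetic; a back-of-the-envelope sketch using the worst-case 25TB/year figure from the post (node count and ingest figure are from the thread, nothing else is):

```python
# Back-of-the-envelope sizing for the proposed cluster.
TB = 1e12

yearly_ingest = 25 * TB   # worst-case estimate from the post
data_nodes = 4            # proposed data node count

daily_ingest_tb = yearly_ingest / 365 / TB
per_node_tb_year = yearly_ingest / data_nodes / TB

print(f"daily ingest: {daily_ingest_tb:.2f} TB/day")
print(f"per data node over a year: {per_node_tb_year:.2f} TB")
```

Note this counts raw ingest only; replica shards and ES indexing overhead multiply the on-disk footprint, which is why Pat's 95TB cluster holds only ~120 days.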