[Bro] elastic search / bro questions
nick.turley at ucr.edu
Thu Nov 6 18:59:36 PST 2014
We tried the ElasticSearch writer with mixed results. Our Bro cluster (two workers, one manager/proxy) processes roughly the same number of events you're dealing with, and we write all Bro data to a RAID10 array. A Logstash shipper then grabs the logs as fast as they come in and ships them off to a couple of Redis systems. A Logstash indexer pulls the data from Redis and mutates it in various ways (renaming attributes, adding geolocation) before shipping it off to our ElasticSearch cluster. We have limited hardware, but we've been able to pump data fast enough for a 10Gbps link that peaks around 3.5Gbps. The architecture is more complicated, but it's scalable.
Having Redis in place is nice when you have traffic bursts and Logstash needs time to catch up; it acts as a buffer. Logstash can also be tuned to run multiple worker processes and filter threads, which took us some time to get right. It's a bit of a balancing act.
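For anyone wanting to replicate this, the indexer stage might look roughly like the Logstash config below. This is a minimal sketch, not our actual config -- the Redis host and key, the renamed field, and the geoip source field are all assumptions for illustration:

```
input {
  redis {
    host      => "redis01.example.com"   # assumed Redis host
    data_type => "list"
    key       => "bro"                   # assumed list key the shipper pushes to
  }
}
filter {
  mutate {
    # example attribute rename
    rename => [ "id.orig_h", "src_ip" ]
  }
  geoip {
    # example geolocation enrichment on the renamed field
    source => "src_ip"
  }
}
output {
  elasticsearch {
    host => "es01.example.com"           # assumed ES node
  }
}
```

The worker/thread tuning mentioned above happens outside this file (e.g. Logstash's `-w` flag for filter workers).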
.. And I just saw your response. Sounds like Logstash is not a good option for you.
From: M K <mkhan04 at gmail.com>
Date: Thursday, November 6, 2014 at 6:25 PM
To: Joe Blow <blackhole.em at gmail.com>
Cc: "bro at bro-ids.org List" <bro at bro-ids.org>
Subject: Re: [Bro] elastic search / bro questions
Unless it's changed within the past month or so, the ElasticSearch writer that comes with Bro is very alpha-level code. For the most part it fires and forgets, and it can be prone to losing messages if your cluster can't keep up or some other situation prevents it from ingesting the data properly.
Your best bet, as of now, is to write the logs out to disk and use some intermediary program to process them and ingest them into ES. Logstash can help, but it can't properly parse the default custom format Bro uses. If you're using Bro 2.3, you can switch the output format of the ASCII writer to JSON instead and then use Logstash to feed the data into ES relatively easily. Beyond that, I'd recommend using a RabbitMQ river so ES can ingest the data at its leisure.
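For what it's worth, switching the ASCII writer to JSON in Bro 2.3 is a one-line redef in your local.bro (the timestamp option is shown as an extra -- check that your version supports it):

```
# local.bro -- make the ASCII writer emit JSON instead of tab-separated logs
redef LogAscii::use_json = T;

# optional: ISO8601 timestamps instead of epoch floats, if your version has it
redef LogAscii::json_timestamps = JSON::TS_ISO8601;
```

On the Logstash side you can then read the files with the json codec instead of parsing them field by field.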
If you're stuck with the non-JSON format, well, your options are kinda limited. You can write a crazy custom Logstash conf using grok (which is super inefficient) or figure out some other mechanism.
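As one sketch of a non-grok route: since the logs are tab-separated with a fixed column order per log type, a csv filter with a tab separator is much cheaper than grok. The column list below is illustrative (a few conn.log fields) -- take the real names from the #fields header of your own logs:

```
filter {
  # the "#"-prefixed header/footer lines aren't data; drop them
  if [message] =~ /^#/ {
    drop { }
  }
  csv {
    separator => "	"    # literal tab character
    columns   => [ "ts", "uid", "id.orig_h", "id.orig_p",
                   "id.resp_h", "id.resp_p", "proto", "service" ]
  }
}
```

This still leaves everything typed as strings, which is part of what the filter described below tries to fix.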
As an aside, I've written a custom Logstash filter that processes the custom Bro format and is, to a limited extent, Bro-type-aware, so it can take old-style Bro logs and make them more usable (numbers are turned into numbers, and sets, vectors, and tables are turned into arrays -- the same as how I've seen the ES writer output data). There are some caveats in its usage, though. I'm putting the finishing touches on it and plan to release it when I get a chance (hopefully within the next week or two).
On Thu, Nov 6, 2014 at 7:54 PM, Joe Blow <blackhole.em at gmail.com> wrote:
Just going to throw this out there and hope some people are willing to share some learning experiences if they have any.
We have a system which generates around 15k-30k Bro events/sec, and we are trying to ingest these logs into a fairly beefy ElasticSearch cluster: total cluster memory ~300GB, storage ~300TB.
Long story short, we're having some problems keeping up with this feed. Does anyone have any performance-tuning experience with this module? I've played a lot with rsyslog batch sizes with ElasticSearch and was hoping there would be some simple directive I could apply to Bro.
Does anyone have this experience here? Does this module batch anything?
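For context on batching: whatever sits between Bro and ES, the standard way to batch writes into ElasticSearch is its _bulk endpoint, which takes newline-delimited JSON with an action line before each document. A minimal sketch of building such a payload -- the index name, type, and fields are made up for illustration:

```python
import json

def build_bulk_payload(events, index="bro-conn", doc_type="conn"):
    """Build an Elasticsearch _bulk request body (NDJSON):
    one action line, then one document line, per event."""
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(event))
    # the bulk API requires a trailing newline
    return "\n".join(lines) + "\n"

payload = build_bulk_payload([
    {"ts": 1415323176.0, "id.orig_h": "10.0.0.1", "id.resp_h": "10.0.0.2"},
])
```

Sending a few thousand events per request this way, instead of one document per request, is usually the single biggest throughput win.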
Thanks in advance.
Bro mailing list
bro at bro-ids.org