[Bro] Elasticsearch Writer vs logstash
seth at icir.org
Fri Jan 30 09:00:42 PST 2015
> On Jan 30, 2015, at 2:29 AM, anthony kasza <anthony.kasza at gmail.com> wrote:
> I thought the ES writer had some issues it needed worked out around indexes or something. Seth?
I’ll go ahead and do the long response. :)
This has been an area of confusion for people for quite some time. That’s been my fault to a great degree: I’ve been looking to provide guidance on the topic and offer easy configurations, but that has been difficult to put together. I’ll do a breakdown of the current status of a number of different methods.
== Bro -> ES (with ES log writer) ==
This was the original output method and has been in Bro since 2.1, I think. It was written pretty quickly because we assumed that we would just be able to shove logs at ES as fast as we could and that it would accept them. This method absolutely works on many small deployments and is very easy. Just load one script and away you go.
The problem with this method is that on larger deployments the log messages get queued up in Bro as the main Bro thread shuttles them over to the thread that actually transfers the logs to ES. Typically people think this is a memory leak, but it’s just that too many logs are being held in memory without getting a chance to be flushed because ES is taking so long to respond. It’s not a very fun result. We’ve been stuck on this dilemma for quite some time.
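To make the "it looks like a memory leak" effect concrete, here is a toy model in Python. It is only a sketch and not Bro's actual threading code: the function name and the per-tick rates are made up for illustration. It shows that when the writer flushes fewer entries per tick than the main thread produces, the in-memory queue grows without bound.

```python
from collections import deque

def simulate_backlog(produced_per_tick, flushed_per_tick, ticks):
    """Return the queue depth after each tick for a producer/consumer pair."""
    queue = deque()
    depths = []
    for tick in range(ticks):
        # The main thread hands over new log entries without waiting.
        for i in range(produced_per_tick):
            queue.append((tick, i))
        # The writer thread can only flush as many entries as ES accepts.
        for _ in range(min(flushed_per_tick, len(queue))):
            queue.popleft()
        depths.append(len(queue))
    return depths

# Hypothetical rates: ES keeping up vs. ES responding slowly.
fast_es = simulate_backlog(produced_per_tick=100, flushed_per_tick=100, ticks=5)
slow_es = simulate_backlog(produced_per_tick=100, flushed_per_tick=60, ticks=5)
print(fast_es)  # [0, 0, 0, 0, 0]
print(slow_es)  # [40, 80, 120, 160, 200]
```

Nothing is leaked in the second case. The entries are all still reachable and would be flushed eventually, but the backlog keeps growing for as long as ES stays slower than the log rate.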
== Bro -> NSQ -> Forwarding tool -> ES ==
This seems to be the most promising mechanism right now. We take advantage of the fact that NSQ spools to disk to deal with any memory overload issues, and it always accepts logs from Bro quickly, which lets Bro keep its log queues flushed nicely. There is a prototype of a tool for forwarding, but it’s still pretty rough. I haven’t had time to get back to it, clean it up, and write documentation. (It’s written in Go. If anyone’s interested in taking this on, get in touch with me!)
It looks like this method works well and can cope with ES becoming overloaded without causing anything to crash. There are still some larger questions that we need to answer relating to ES tuning because the default template that ES uses for Bro logs does a lot of stuff that we don’t need and causes a lot of unnecessary overhead. Vlad Grigorescu has done some work in this area, but in my opinion we still need to explore ways to automate this process.
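For anyone wondering what the forwarding step has to do, here is a minimal sketch in Python. It is not the Go prototype mentioned above, and it leaves NSQ out entirely: the spool is just a list, `send` is a hypothetical callable that returns True when ES accepted a batch, and the use of the Elasticsearch bulk format (an action line followed by a source line, newline-delimited) is my assumption about how a forwarder would talk to ES. Depending on the ES version, the action line may need more fields than shown.

```python
import json

def build_bulk_body(messages, index):
    """Render log records as an Elasticsearch bulk body (newline-delimited JSON)."""
    lines = []
    for record in messages:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(record))                        # source line
    return "\n".join(lines) + "\n"

def forward(spool, send, index, batch_size=500):
    """Send batches from the spool and return whatever ES did not accept.

    `send` is a hypothetical callable: it takes a bulk body and returns
    True on success. On the first failure we stop, so unsent records stay
    in the spool instead of piling up in the producer's memory.
    """
    position = 0
    while position < len(spool):
        batch = spool[position:position + batch_size]
        if not send(build_bulk_body(batch, index)):
            break
        position += len(batch)
    return spool[position:]

# Made-up records and a sender that accepts only the first batch.
records = [{"uid": "C1", "proto": "tcp"}, {"uid": "C2", "proto": "udp"},
           {"uid": "C3", "proto": "tcp"}]
accepted = []

def flaky_send(body):
    if accepted:
        return False  # pretend ES is overloaded after the first batch
    accepted.append(body)
    return True

remaining = forward(records, flaky_send, index="bro-conn", batch_size=2)
print(len(accepted), remaining)
```

The point of the design is that an overloaded ES only makes the spool grow on disk. Bro itself never has to wait.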
== Bro -> JSON logs -> logstash -> ES ==
Some people are using this because it’s really easy to set up and somewhat resilient. At least Bro doesn’t get overwhelmed, because it’s just writing logs to disk. I do recommend that people write JSON logs and try to avoid creating logstash filters to parse the Bro logs. Filters just increase your work for this one small part of your overall architecture. The logstash config with JSON logs should be *very* short (I don’t have it offhand).
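As a rough illustration of why JSON logs save work downstream, compare what a consumer has to do in each case. This is a Python sketch, not logstash config, and the sample lines are made up. A JSON line is self-describing, while the default tab-separated format needs the `#fields` header to name the columns and still leaves every value as a string.

```python
import json

def parse_json_log(lines):
    """Each JSON log line is already a complete, typed record."""
    return [json.loads(line) for line in lines if line.strip()]

def parse_tsv_log(lines):
    """Tab-separated logs need the #fields header to name each column."""
    fields = []
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            fields = line.split("\t")[1:]
        elif line and not line.startswith("#"):
            records.append(dict(zip(fields, line.split("\t"))))
    return records

# Made-up sample lines for illustration.
json_lines = ['{"uid": "CHhAvVGS1DHFjwGM9", "id.resp_p": 80, "proto": "tcp"}']
tsv_lines = ["#fields\tuid\tid.resp_p\tproto", "CHhAvVGS1DHFjwGM9\t80\ttcp"]

json_records = parse_json_log(json_lines)
tsv_records = parse_tsv_log(tsv_lines)
print(json_records[0]["id.resp_p"])  # 80 (an integer)
print(tsv_records[0]["id.resp_p"])   # 80 (still a string)
```

With the tab-separated format, type conversion and per-log field handling become your problem, and that is exactly the filter work I’d suggest avoiding.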
To get Bro to output JSON logs, it just takes putting this in local.bro (or some other loaded script)…
I really don’t know about the performance of logstash with really high log volume, but I don’t have high hopes for it either.
I hope this helps with some of the background. :)
International Computer Science Institute
(Bro) because everyone has a network