[Bro] Elasticsearch Writer vs logstash

Luis Miguel Silva luismiguelferreirasilva at gmail.com
Fri Jan 30 11:03:45 PST 2015


Everyone, since Seth's answer was the most complete one, I'll just reply to
this one and talk about the other options you guys kindly pointed me to!

#######################################################

Seth,

Maybe you can give me some advice on what will probably work best for my
usage scenario...
- I have an ARM 7 appliance (dual-core 1 GHz processor, 1 GB of RAM)
- I want to MOSTLY run the default bro config, just to learn more about the
network and see what is going on (e.g. identify hosts, software and some
more relevant connection info about what is going on)
- It will be capturing my home traffic (e.g. 4 users total and a total of
about 10-15 devices)
- the appliance isn't super powerful, so I'm thinking about:
-- *option 1*: sending the resulting logs to a remote database of some sort
(I include "elasticsearch" in my definition of a database, though in the
end I might use mongo or a relational database of some sort) and
querying the information offline (so as not to overload the appliance)
-- *option 2*: (and I don't even know if this is possible) setting up some
sort of bro cluster, where the appliance sniffs and pre-filters the
connections but offloads the actual processing work / log writing to one
or more remote machines
--- is this even possible?

The reason I'm asking is that, if I can get the log files to land and
the processing to occur on a remote machine, then maybe I do not
necessarily care about how I then convert the logs into a format I can
easily query...

Given that I do not even think that option 2 is a real option, here are all
the options that have been presented thus far:
*option 1*- Bro [at appliance] -> ES [at remote location]
*option 2*- Bro [at appliance] -> NSQ [at appliance] -> Forwarding tool [at
remote location] -> ES [at remote location]
*option 3*- Bro -> JSON logs [at appliance] -> logstash [at appliance] ->
ES [at remote location]
*option 4*- Bro -> nxlog [at appliance] -> logstash [at remote location] ->
ES [at remote location] (does it even make sense to consider this option if
Bro can output json?)
*option 5*- Bro -> Heka [at appliance] -> ES [at remote location] (how
heavy is Heka and how well does it perform?)
*option 6*- Bro -> rsyslog [at appliance] -> logstash [at remote location]
-> ES [at remote location]

*Note*: "[at appliance]" means the processing would happen on the appliance
and "[at remote location]" means the location where the data ends up at.

So the question is, given that I want to minimize the amount of processing
/ memory, I want to offload the "log data" anyway AND I'll be monitoring a
typical home connection, what option do you guys think will work best for
me?

At a glance, I think *option 2* and *options 4 and 6* (which are very
similar options; only the local log forwarding service changes) are the
ones that will perform best.
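
To make the reasoning behind *option 2* concrete, here is a minimal Python
sketch of the disk-spooling idea Seth describes for NSQ in the quoted message
below. It is not NSQ and not the Go forwarding tool: `DiskSpool`, `put` and
`drain` are hypothetical names, and the one-JSON-record-per-line file format
is my own assumption. It only shows that the producer's write never waits on
the remote side, and that a record stays on disk until the sink accepts it.

```python
import json
import os
import tempfile


class DiskSpool:
    """Hypothetical append-only spool file with an in-memory read offset."""

    def __init__(self, path):
        self.path = path
        self.offset = 0  # byte position of the next undelivered record
        open(path, "ab").close()  # make sure the spool file exists

    def put(self, record):
        # Producer side: a local append that does not depend on the sink.
        with open(self.path, "ab") as f:
            f.write(json.dumps(record).encode("utf-8") + b"\n")

    def drain(self, send, max_records=100):
        # Forwarder side: pass records to send() and advance the offset
        # only when send() reports success (returns a true value).
        sent = 0
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            while sent < max_records:
                line = f.readline()
                if not line:
                    break  # nothing left to deliver
                if not send(json.loads(line)):
                    break  # sink is busy; the record stays on disk
                self.offset = f.tell()
                sent += 1
        return sent


if __name__ == "__main__":
    spool = DiskSpool(os.path.join(tempfile.mkdtemp(), "conn.spool"))
    for i in range(5):
        spool.put({"uid": "C%d" % i})

    delivered = []

    def stalling_sink(record):
        # Simulated remote side that accepts two records and then stalls.
        if len(delivered) >= 2:
            return False
        delivered.append(record)
        return True

    print(spool.drain(stalling_sink), "records delivered before the stall")
```

A real forwarder would also have to persist the offset and rotate the spool
file; both are left out here to keep the sketch short.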

Thank you,
Luis

On Fri, Jan 30, 2015 at 10:00 AM, Seth Hall <seth at icir.org> wrote:

>
> > On Jan 30, 2015, at 2:29 AM, anthony kasza <anthony.kasza at gmail.com>
> wrote:
> >
> > I thought the ES writer had some issues it needed worked out around
> indexes or something. Seth?
>
> I’ll go ahead and do the long response. :)
>
> This has been an area of confusion for people for quite some time.  That’s
> been my fault to a great degree: I’ve been looking to provide guidance on
> the topic and offer easy configurations, but it’s been difficult to create
> that.  I’ll do a breakdown of the current status of a number of different
> methods.
>
> == Bro -> ES (with ES log writer) ==
> This was the original output method and has been in Bro since 2.1 I
> think.  It was written pretty quickly because we assumed that we would just
> be able to shove logs at ES as fast as we could and it could accept them.
> This method absolutely works on many small deployments and is very easy.
> Just load one script and away you go.
>
> The problem with this method is that on larger deployments this causes the
> log messages to get queued up in Bro as the main Bro thread shuttles them
> over to the thread that will actually transfer the logs to ES.  Typically
> people think this is a memory leak, but it’s just that too many logs are
> being held in memory and not getting a chance to be flushed because ES is
> taking so long to respond.  It’s not a very fun result.  We’ve been stuck
> for quite some time at this dilemma.
>
> == Bro -> NSQ -> Forwarding tool -> ES ==
> This seems to be the most promising mechanism right now.  We take
> advantage of the fact that NSQ spools to disk to deal with any memory
> overload issues and it always quickly accepts logs from Bro which causes
> Bro to keep its log queues flushed nicely.  There is a prototype of a tool
> for forwarding, but it’s still pretty rough.  I haven’t had time to get
> back to it and clean it up and write documentation. (it’s written in Go if
> anyone’s interested in taking this on, get in touch with me!).
>
> It looks like this method works well and can cope with ES becoming
> overloaded without causing anything to crash.  There are still some larger
> questions that we need to answer relating to ES tuning because the default
> template that ES uses for Bro logs does a lot of stuff that we don’t need
> and causes a lot of unnecessary overhead.  Vlad Grigorescu has done some
> work in this area, but in my opinion we still need to explore ways to
> automate this process.
>
> == Bro -> JSON logs -> logstash -> ES ==
> Some people are using this because it’s really easy to set up and somewhat
> resilient.  At least Bro doesn’t get overwhelmed because it’s just writing
> logs to disk.  I do recommend that people write JSON logs and try to avoid
> creating filters with logstash to parse the Bro logs.  It’s just going to
> increase your work for this one small part of your overall architecture.
> The logstash config with JSON logs should be *very* short (I don’t have it
> offhand).
>
> To get Bro to output JSON logs, it just takes putting this in local.bro
> (or some other loaded script)…
> redef LogAscii::use_json=T;
>
> I really don’t know about the performance of logstash with really high log
> volume, but I don’t have high hopes for it either.
>
> I hope this helps with some of the background. :)
>
>   .Seth
>
> --
> Seth Hall
> International Computer Science Institute
> (Bro) because everyone has a network
> http://www.bro.org/
>
>
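
As a rough illustration of why JSON logs keep the downstream side short, the
Python sketch below turns Bro JSON log lines into the newline-delimited body
that the Elasticsearch bulk API accepts (an action line followed by the
document, each on its own line). Several things here are assumptions and not
taken from the thread: the function name `bulk_body`, the daily `bro-YYYY.MM.DD`
index naming, and that each record carries a `ts` field in seconds since the
epoch. Older Elasticsearch versions may also expect a document type in the
action line or the request URL, which this sketch omits.

```python
import json
from datetime import datetime, timezone


def bulk_body(lines, index_prefix="bro"):
    """Build an Elasticsearch _bulk request body from Bro JSON log lines."""
    out = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)
        # Assumption: "ts" is seconds since the epoch; it picks the daily index.
        when = datetime.fromtimestamp(float(doc["ts"]), tz=timezone.utc)
        index = "%s-%s" % (index_prefix, when.strftime("%Y.%m.%d"))
        out.append(json.dumps({"index": {"_index": index}}))  # action line
        out.append(json.dumps(doc))                           # document line
    # The bulk body is newline-delimited and must end with a newline.
    return "\n".join(out) + "\n" if out else ""


if __name__ == "__main__":
    sample = ['{"ts": 1422644625.0, "uid": "C1", "id.orig_h": "192.168.1.10"}']
    print(bulk_body(sample), end="")
```

No field parsing is needed because each log line is already a JSON document,
which is the point Seth makes about avoiding logstash filters.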