[Bro] Use of GPUs for signature matching?

Martin Holste mcholste at gmail.com
Tue Oct 26 06:54:18 PDT 2010


Ok, this makes a lot of sense now.  So you're saying that for the few
true pattern-matching tasks Bro has to do, there's plenty of CPU to
spare, but for script-driven work such as querying the Time Machine,
extracting files from pcaps, etc., you're running out of CPU.

So if you're running into a performance challenge with the scripting
language, would you consider switching from the native Bro scripting
language to an embedded interpreter such as Perl, Python, or Lua?
That in and of itself would probably hurt performance, but my guess
is that embedding an existing interpreter and then multi-threading it
would take a lot less time than rolling your own from scratch.  With
the number of CPU cores climbing exponentially, a small performance
hit would probably be acceptable if it can be offset by running on
multiple cores.  I think a well-known scripting language would also
be a lot less scary for newcomers to Bro and really increase its
user base.
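
Just to make that concrete, here's a rough sketch of what handing an
"event" from a C core to an embedded Lua interpreter could look like
(the event name and addresses are made up for illustration; this is
obviously not anything Bro provides today):

    /* Hypothetical sketch: raise an event into an embedded Lua state.
     * Build with something like: cc sketch.c -llua -lm */
    #include <lua.h>
    #include <lualib.h>
    #include <lauxlib.h>

    int main(void)
    {
        lua_State *L = luaL_newstate();   /* one interpreter state */
        luaL_openlibs(L);                 /* load the standard libraries */

        /* The "policy script" -- in real life loaded from a file. */
        luaL_dostring(L,
            "function connection_established(orig, resp)\n"
            "  print('new connection: ' .. orig .. ' -> ' .. resp)\n"
            "end");

        /* The core raises an event by calling the Lua function. */
        lua_getglobal(L, "connection_established");
        lua_pushstring(L, "10.0.0.1");
        lua_pushstring(L, "192.168.1.5");
        lua_pcall(L, 2, 0, 0);            /* 2 args, 0 results */

        lua_close(L);
        return 0;
    }

Each worker thread could then hold its own lua_State, so as long as
script-level state isn't shared across threads there's hardly any
locking to worry about.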

On Mon, Oct 25, 2010 at 11:47 PM, Robin Sommer <robin at icir.org> wrote:
>
> On Fri, Oct 22, 2010 at 12:46 -0500, you wrote:
>
>> Robin, can you elaborate on this a bit?  I'm very surprised that
>> pattern matching would not be the first bottleneck.
>
> The answer is quite simple, actually: Bro just doesn't do that much
> pattern matching. While it has a pattern engine similar to what
> Snort/Suricata rely on, a typical Bro setup doesn't use it very much
> at all: typically there are just a few signatures configured, often
> just for doing dynamic protocol detection.
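>
> Just to give a flavor of what such a DPD signature checks (an
> illustration only, not our actual engine and not a real signature):
> essentially an anchored regexp over the first payload bytes of a
> flow, something like:
>
>     /* Hypothetical sketch: does the first payload of a flow look
>      * like an HTTP request line? */
>     #include <regex.h>
>     #include <stdio.h>
>
>     static int looks_like_http(const char *payload)
>     {
>         regex_t re;
>         regcomp(&re, "^[ \t]*(GET|HEAD|POST|PUT) [^ ]+ HTTP/[0-9]",
>                 REG_EXTENDED | REG_NOSUB);
>         int match = (regexec(&re, payload, 0, NULL, 0) == 0);
>         regfree(&re);
>         return match;
>     }
>
>     int main(void)
>     {
>         printf("%d\n", looks_like_http("GET /index.html HTTP/1.1\r\n")); /* 1 */
>         printf("%d\n", looks_like_http("SSH-2.0-OpenSSH_5.3\r\n"));      /* 0 */
>         return 0;
>     }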
>
> Bro is doing a lot of other things instead, in particular deep
> stateful protocol analysis and execution of its analysis scripts.
> The latter in particular is getting more and more expensive relative
> to Bro's other components: scripts are becoming larger and more
> complex, they track more state, and they have more traffic to
> analyze. The script interpreter is a piece we haven't spent much
> time optimizing yet (it's indeed still an *interpreter* ...), and
> it actually accounts for a large share of Bro's CPU (and also
> memory) footprint these days.
>
> Note that executing scripts written in Bro's language is very
> different from doing pattern matching; improving regexp performance
> is not going to help much at all with the scripts. That's quite
> different from Snort/Suricata obviously, which don't do much else
> than pattern matching.
>
>> Marty's point was that multithreading leads to CPU cache
>> inefficiency which incurs a penalty greater than the gain from
>> parallel pattern matching, so he suggests flow-pinned
>> load-balancing for scaling.  Do you have an opinion on the matter?
>
> It's hard to answer that in a few sentences, but generally I agree
> that a flow-based load-balancing scheme is a reasonable approach for
> the lowest layer of the system. Many NIDS (including Snort and Bro)
> do much of their work on a per-flow basis, so parallelizing at that
> granularity certainly makes a lot of sense and avoids communication
> overhead (and hence also cache issues). Generally, such a flow-based
> scheme can then be implemented either at the system/process level
> (i.e., running more than one instance of the NIDS, with a suitable
> frontend load-balancer splitting up the work, either externally or
> internally), or at the thread level (multiple threads fed by a
> master thread). Conceptually, that doesn't make much of a
> difference, and the former is what we're doing with the Bro Cluster.
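>
> (A sketch of the flow-pinning idea only, not any particular
> frontend: hash the 5-tuple symmetrically so that both directions of
> a connection end up at the same instance/thread.)
>
>     /* Illustration: pick a worker for a packet via a symmetric flow hash. */
>     #include <stdint.h>
>     #include <stdio.h>
>
>     static unsigned pick_worker(uint32_t src_ip, uint32_t dst_ip,
>                                 uint16_t src_port, uint16_t dst_port,
>                                 uint8_t proto, unsigned n_workers)
>     {
>         /* XOR is order-independent, so (A->B) and (B->A) hash the same. */
>         uint32_t h = (src_ip ^ dst_ip)
>                      ^ ((uint32_t)(src_port ^ dst_port) << 16)
>                      ^ proto;
>         h *= 2654435761u;           /* multiplicative mixing */
>         return h % n_workers;
>     }
>
>     int main(void)
>     {
>         /* Both directions of the same flow map to the same worker. */
>         printf("%u\n", pick_worker(0x0A000001, 0xC0A80105, 51515, 80, 6, 8));
>         printf("%u\n", pick_worker(0xC0A80105, 0x0A000001, 80, 51515, 6, 8));
>         return 0;
>     }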
>
> Now, Snort has the "advantage" that such a simple flow-based scheme
> is pretty much all it needs for parallelizing. Because there's not
> much happening after the pattern-matching step, there's also no
> need for further coordination between the instances/threads. For
> Bro, however, this is where things actually start to get
> interesting: since much of its CPU time is spent on the scripts,
> Amdahl's Law tells us that we need to parallelize the interpreter
> if we want to scale.  Unfortunately, parallelizing the execution of
> a free-form, Turing-complete language isn't exactly trivial ...
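>
> (To put a made-up number on that: with a serial fraction s, Amdahl's
> Law bounds the speedup on N cores by 1 / (s + (1 - s)/N). If the
> scripts were, say, half of the total CPU time and we parallelized
> only everything else across N = 8 cores, we'd get at most
> 1 / (0.5 + 0.5/8), about 1.8x, and never more than 2x no matter how
> many cores we add.)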
>
> Robin
>
> --
> Robin Sommer * Phone +1 (510) 722-6541 * robin at icir.org
> ICSI/LBNL    * Fax   +1 (510) 666-2956 *   www.icir.org
>



