[Bro] Use of GPUs for signature matching?

Mon Oct 25 21:47:44 PDT 2010

On Fri, Oct 22, 2010 at 12:46 -0500, you wrote:

> Robin, can you elaborate on this a bit?  I'm very surprised that
> pattern matching would not be the first bottleneck.

The answer is quite simple actually: Bro just doesn't do that much
pattern matching. While it has a pattern engine similar to what
Snort/Suricata are relying on, a typical Bro setup doesn't use it
very much at all: typically there are just a few signatures
configured, often just for doing dynamic protocol detection. 

Bro is doing a lot of other things instead, in particular deep
stateful protocol analysis and execution of its analysis scripts.
The latter especially is getting more and more expensive compared to
Bro's other components: scripts are becoming larger and more
complex, they track more state, and they have to deal with more
traffic to analyze. The script interpreter is a piece we haven't
spent much time optimizing yet (it's indeed still an
*interpreter* ...), and it actually accounts for a large share of
Bro's CPU (and also memory) footprint these days.

Note that executing scripts written in Bro's language is very
different from doing pattern matching; improving regexp performance
is not going to help much at all with the scripts. That's obviously
quite different from Snort/Suricata, which don't do much other
than pattern matching.

> Marty's point was that multithreading leads to CPU cache
> inefficiency which incurs a penalty greater than the boost to the
> pattern matching in parallel and therefore suggests flow-pinned
> load-balancing for scaling.  Do you have an opinion on the matter?

It's hard to answer that in a few sentences, but generally I agree
that a flow-based load-balancing scheme is a reasonable approach for
the lowest layer of the system. Many NIDS (including Snort and Bro)
do much of their work on a per-flow basis, so parallelizing at that
granularity certainly makes a lot of sense and avoids communication
overhead (and hence also cache issues). Generally, such a flow-based
scheme can then be implemented either at the system/process level
(i.e., running more than one instance of the NIDS, with a suitable
frontend load-balancer splitting up the work, either externally or
internally); or at the thread-level (multiple threads fed by a
master thread). Conceptually, that doesn't make much of a
difference, and the former is what we're doing with the Bro Cluster.
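
To make the flow-based scheme a bit more concrete, here is a minimal
sketch of how a frontend might pin flows to workers. It is only an
illustration, not how the Bro Cluster frontend is actually
implemented: the names FiveTuple and worker_for are made up, and
CRC32 is just a stand-in for whatever hash a real load-balancer uses.

```python
import zlib
from typing import NamedTuple


class FiveTuple(NamedTuple):
    """Hypothetical flow key; a real frontend would parse it from packet headers."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    proto: str


def worker_for(flow: FiveTuple, num_workers: int) -> int:
    """Map a flow to a worker index so that every packet of the flow,
    in either direction, lands on the same worker."""
    if num_workers < 1:
        raise ValueError("num_workers must be at least 1")
    # Sort the two endpoints so that A->B and B->A yield the same key.
    first, second = sorted(
        [(flow.src_ip, flow.src_port), (flow.dst_ip, flow.dst_port)]
    )
    key = f"{first[0]}:{first[1]}|{second[0]}:{second[1]}|{flow.proto}"
    # CRC32 is an assumption here; any stable hash would do for the sketch.
    return zlib.crc32(key.encode("ascii")) % num_workers
```

Because the key is direction-independent, all per-flow state stays
with a single worker, which is what avoids the communication overhead
mentioned above.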

Now, Snort has the "advantage" that such a simple flow-based scheme
is pretty much all it needs to do for parallelizing. Because there's
not much happening after the pattern matching step, there's also no
need for further coordination between the instances/threads. For
Bro, however, this is where things actually start to get
interesting: since many of its CPU cycles are spent on the scripts,
Amdahl's Law tells us that we need to parallelize the interpreter if
we want to scale.  Unfortunately, parallelizing the execution of a
free-form Turing-complete language isn't exactly trivial ... 
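
As a rough illustration of the Amdahl's Law argument, the following
sketch computes the overall speedup when only part of the work can be
spread across workers. The function name and any fractions you plug
in are hypothetical; they are not measurements of Bro.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Overall speedup per Amdahl's Law: 1 / ((1 - p) + p / n),
    where p is the parallelizable share of the work and n the worker count."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)
```

The serial share caps the result: if, say, half of the CPU time went
into script execution that could not be parallelized, the speedup
would stay below 2 no matter how many workers are added.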

Robin

-- 
Robin Sommer * Phone +1 (510) 722-6541 * robin at icir.org 
ICSI/LBNL    * Fax   +1 (510) 666-2956 *   www.icir.org