Robin,<br><br>Thanks for the quick reply. The "off by default" comment comes from section 7.6.1 of the user manual, which states "Signature matching is off by default." I understand that Bro's emphasis (and therefore its distinction from the competition) is that it relies as little as possible on signature matching. My concern as a newcomer to Bro is that signature matching is de-emphasized to the point that its performance could suffer.<br>
<br>For stream reassembly, I worded my question poorly. The blog post you mentioned (which is what I had in mind when I wrote the questions) states that reassembly is only done on the first 1K of streams. So I (perhaps unreasonably) do not consider that reassembly, because I am regularly interested in the 1K-2K range of a stream.<br>
<br>I read the CCS paper (though it's rather old!) and I think I now have a much better idea of what the internal sig matching engine uses, namely a DFA (or at least that's what it used to use). I'm wondering how this compares with the Aho-Corasick NFA implementation of simple (non-regexp) string matching a la Snort, both in performance and in memory consumption. I'd also be interested in comparisons of CPU cache efficiency.<br>
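<br>To be concrete about what I mean by Aho-Corasick: below is a minimal Python sketch of the textbook goto/failure-link construction for literal strings. It is only my own illustration, not Snort's or Bro's actual code, and the names build_automaton and search are made up for the example.<br>
<pre>
```python
from collections import deque


def build_automaton(patterns):
    """Build goto, fail, and output tables for a set of literal patterns."""
    goto = [{}]    # goto[state][char] gives the next state
    fail = [0]     # failure link for each state
    out = [set()]  # patterns that end at each state

    # Phase 1: insert every pattern into a trie.
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)

    # Phase 2: breadth-first pass to compute failure links.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f != 0 and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit matches that end at the failure target.
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out


def search(text, automaton):
    """Return (start_offset, pattern) for every match in text."""
    goto, fail, out = automaton
    state = 0
    hits = []
    for i, ch in enumerate(text):
        # Follow failure links until a transition on ch exists.
        while state != 0 and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return hits
```
</pre>
For example, sorted(search("ushers", build_automaton(["he", "she", "his", "hers"]))) gives [(1, 'she'), (2, 'he'), (2, 'hers')]. The failure links are what a full DFA would expand into explicit per-character transitions, which is the memory-versus-speed trade-off I'm asking about.<br>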
<br>Thanks,<br><br>Martin<br><br><div class="gmail_quote">On Thu, Apr 16, 2009 at 2:36 PM, Robin Sommer <span dir="ltr">&lt;<a href="mailto:robin@icir.org">robin@icir.org</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
<div class="im"><br>
On Thu, Apr 16, 2009 at 13:44 -0500, Martin Holste wrote:<br>
<br>
> This raises a question that I've been wondering since poring over the 1.4<br>
> manual regarding how well Bro greps packets. Specifically, the manual says<br>
> that signatures are off by default and that the grepping is per-packet with<br>
> no stream reassembly capabilities.<br>
<br>
</div>Uh, does the manual really say that? Can you point me to where you<br>
found these statements?<br>
<br>
The signature engine is not really "off by default". Rather (like most<br>
functionality in Bro), it's only activated on demand when your<br>
configuration actually defines any signatures. It's true that we<br>
don't ship with many pre-built signatures[1]. But DPD for example<br>
uses those in policy/sigs/dpd.bro, and they are activated once you<br>
turn on DPD by loading dpd.bro.<br>
<br>
Likewise, pattern matching *is* usually done stream-wise, not on<br>
packets. More precisely, whenever Bro has reassembly enabled for a<br>
particular connection, the pattern matching is performed after<br>
reassembly. Only if Bro does not reassemble a connection does<br>
pattern matching proceed on packets. Generally, you can tell Bro<br>
pretty precisely which connections you want it to reassemble; by<br>
default, it reassembles the *beginning* of all TCP connections, and<br>
it then keeps the reassembler enabled for those for which it has<br>
found a suitable application-layer protocol analyzer.<br>
<br>
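As a toy illustration of why the distinction matters, here is a short Python sketch (not Bro code). The pattern, the function names, and the assumption that segments arrive in order without overlap are all made up for the example; a real reassembler also has to handle reordering and retransmissions.<br>
<pre>
```python
import re

# Hypothetical signature payload pattern, chosen only for illustration.
PATTERN = re.compile(rb"GET /etc/passwd")


def match_per_packet(segments):
    """Match each TCP segment payload on its own."""
    return any(PATTERN.search(seg) for seg in segments)


def match_stream(segments):
    """Match the reassembled byte stream.

    Assumes the segments are already in order and do not overlap.
    """
    return PATTERN.search(b"".join(segments)) is not None


# The pattern is split across two segments of the same connection.
segments = [b"GET /etc/pa", b"sswd HTTP/1.0\r\n"]

print(match_per_packet(segments))  # False
print(match_stream(segments))      # True
```
</pre>
Per-packet matching misses the pattern because neither segment contains it in full, while matching after reassembly finds it.<br>
<br>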
For more details (including options to control matching), please see<br>
this blog posting:<br>
<br>
<a href="http://blog.icir.org/2008/06/bro-signature-engine.html" target="_blank">http://blog.icir.org/2008/06/bro-signature-engine.html</a><br>
<div class="im"><br>
> It also appears that there's no particularly fancy pattern matching<br>
> engine under the hood, indicating that matching on full snaplengths<br>
> for many signatures produces high load.<br>
<br>
</div>Likewise, I'm wondering where you got the impression that there's no<br>
"fancy engine" (or what you'd consider a fancy one to look like :-).<br>
There's a paper describing the internals of Bro's approach in more<br>
detail if you are curious:<br>
<br>
<a href="http://www.icir.org/robin/papers/ccs03.ps" target="_blank">http://www.icir.org/robin/papers/ccs03.ps</a><br>
<br>
The paper also discusses various trade-offs in signature matching as<br>
well as the difficulty of fairly comparing multiple engines against<br>
each other.<br>
<div class="im"><br>
> I haven't measured this myself, so I'm wondering if this is the<br>
> case. Does anyone have any statistical (or anecdotal) evidence as<br>
> to how many sigs can run under a subnet with mostly web client<br>
> traffic?<br>
<br>
</div>The only systematic measurements I'm aware of are actually those in<br>
the older CCS paper mentioned above. Most people seem to use Bro's<br>
engine with only a small number of signatures, as it's usually<br>
deployed as *support* for script-level analysis rather than as the<br>
primary detection tool by itself. I remember one specific case in<br>
which someone used a large number of signatures and had some<br>
performance trouble initially; that however was solvable by tuning<br>
the engine's options a bit.<br>
<br>
Hope this helps,<br>
<br>
Robin<br>
<br>
[1] Ignoring the ancient ones converted from Snort which aren't<br>
really useful anymore.<br>
<div><div></div><div class="h5"><br>
--<br>
Robin Sommer * Phone +1 (510) 666-2886 * <a href="mailto:robin@icir.org">robin@icir.org</a><br>
ICSI/LBNL * Fax +1 (510) 666-2956 * <a href="http://www.icir.org" target="_blank">www.icir.org</a><br>
</div></div></blockquote></div><br>