[Bro-Dev] Performance Enhancements

Azoff, Justin S jazoff at illinois.edu
Fri Oct 6 07:26:30 PDT 2017


> On Oct 6, 2017, at 12:10 AM, Clark, Gilbert <gc355804 at ohio.edu> wrote:
> 
> I'll note that one of the challenges with profiling is that there are the bro scripts, and then there is the bro engine.  The scripting layer has a completely different set of optimizations that might make sense than the engine does: turning off / turning on / tweaking different scripts can have a huge impact on Bro's relative performance depending on the frequency with which those script fragments are executed.  Thus, one way to look at speeding things up might be to take a look at the scripts that are run most often and seeing about ways to accelerate core pieces of them ... possibly by moving pieces of those scripts to builtins (as C methods).
> 

Re: scripts, I have some code I put together to do arbitrary benchmarks of templated bro scripts.  I need to clean it up and publish it, but I found some interesting things.  Function calls are relatively slow, so things like

    ip in Site::local_nets

is faster than calling

    Site::is_local_addr(ip);

Inlining short functions could speed things up a bit.

I also found that things like

    port == 22/tcp || port == 3389/tcp

is faster than checking if port in {22/tcp,3389/tcp}, up to about 10 ports.  Having the hash class fall back to a linear search when the hash contains only a few items could speed things up there.  Things like 'likely_server_ports' have 1 or 2 ports in most cases.
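A minimal C++ sketch of that fallback idea, not Bro's actual hash or table code. The class name SmallSet, the port_key() encoding, and the threshold of 10 are all assumptions made for illustration; the threshold is only loosely based on the "about 10 ports" figure above and would need to be measured.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical container (not Bro's table/set implementation).
// Up to kLinearLimit elements it scans a small vector; once it grows
// past that, it migrates to a hash set and stays there.
class SmallSet {
public:
    // Assumed threshold; tune by benchmarking.
    static constexpr std::size_t kLinearLimit = 10;

    void insert(std::uint32_t key) {
        if (contains(key))
            return;
        if (!hashed_.empty()) {  // already in hash mode
            hashed_.insert(key);
            return;
        }
        if (small_.size() < kLinearLimit) {
            small_.push_back(key);
            return;
        }
        // Crossing the threshold: move everything into the hash set.
        hashed_.insert(small_.begin(), small_.end());
        hashed_.insert(key);
        small_.clear();
    }

    bool contains(std::uint32_t key) const {
        if (!hashed_.empty())
            return hashed_.count(key) != 0;
        for (std::uint32_t k : small_)  // linear scan, no hashing
            if (k == key)
                return true;
        return false;
    }

    std::size_t size() const {
        return hashed_.empty() ? small_.size() : hashed_.size();
    }

    bool is_linear() const { return hashed_.empty(); }

private:
    std::vector<std::uint32_t> small_;
    std::unordered_set<std::uint32_t> hashed_;
};

// Hypothetical key encoding: port number in the low 16 bits,
// protocol number above it.
inline std::uint32_t port_key(std::uint16_t port, std::uint8_t proto) {
    return (static_cast<std::uint32_t>(proto) << 16) | port;
}
```

With one or two entries, as in the 'likely_server_ports' case, a lookup here is a couple of integer comparisons and never computes a hash.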


> If I had to guess at one engine-related thing that would've sped things up when I was profiling this stuff back in the day, it'd probably be rebuilding the memory allocation strategy / management.  From what I remember, Bro does do some malloc / free in the data path, which hurts quite a bit when one is trying to make things go fast.  It also means that the selection of a memory allocator and NUMA / per-node memory management is going to be important.  That's probably not going to qualify as something *small*, though ...

Ah!  This reminds me of something I was thinking about a few weeks ago.  I'm not sure to what extent bro uses memory allocation pools/interning for common immutable data structures, like port objects or small strings.  There's no reason bro should be mallocing/freeing memory to create port objects when there are only 65536 times 2 (or 3?) possible port objects, but bro does things like

        tcp_hdr->Assign(0, new PortVal(ntohs(tp->th_sport), TRANSPORT_TCP));
        tcp_hdr->Assign(1, new PortVal(ntohs(tp->th_dport), TRANSPORT_TCP));

for every packet, as well as allocating a ton of TYPE_COUNT vals for things like packet sizes and header lengths, which will almost always be between 0 and 64k.
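A minimal C++ sketch of what interning port values could look like. Port, Transport, and PortTable are hypothetical stand-ins, not Bro's PortVal or its transport enum, and the sketch ignores the reference counting a real Val would need. It also assumes three transport protocols, which is exactly the "2 (or 3?)" that would have to be checked against the real code.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical transport enum; the values and the count are assumptions.
enum Transport : std::uint8_t { TCP = 0, UDP = 1, ICMP = 2, kNumTransports = 3 };

// Stand-in for an immutable port value (not Bro's PortVal).
struct Port {
    std::uint16_t number;
    Transport proto;
};

// One slot per (protocol, port number) pair. Each Port is created the
// first time it is requested and shared afterwards, so the per-packet
// path does no allocation once a port has been seen.
// Not thread-safe.
class PortTable {
public:
    PortTable() : slots_(65536u * kNumTransports) {}

    const Port* get(std::uint16_t number, Transport proto) {
        std::unique_ptr<Port>& slot = slots_[index(number, proto)];
        if (!slot)
            slot.reset(new Port{number, proto});
        return slot.get();
    }

private:
    static std::size_t index(std::uint16_t number, Transport proto) {
        return static_cast<std::size_t>(proto) * 65536u + number;
    }

    std::vector<std::unique_ptr<Port>> slots_;
};
```

The table costs one pointer per possible port (about 1.5 MB with 8-byte pointers and three protocols), and two lookups for the same port and protocol return the same object. The same approach could apply to small count values, again assuming the values are treated as immutable.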

For things that can't be interned, like IPv6 addresses, having an allocation pool could speed things up.  Instead of freeing things like IPAddr objects, they could just be returned to a pool, and then when a new IPAddr object is needed, an already initialized object could be grabbed from the pool and 'refreshed' with the new value.

https://golang.org/pkg/sync/#Pool

Talks about that sort of thing.
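A minimal C++ sketch of such a pool, in the spirit of Go's sync.Pool but single-threaded. Addr and AddrPool are hypothetical stand-ins, not Bro's IPAddr, and the idle cap of 1024 is an arbitrary assumption.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Stand-in for an address object (not Bro's IPAddr): 16 raw bytes.
struct Addr {
    std::array<std::uint8_t, 16> bytes{};
};

// Released objects are kept on a free list and handed out again
// instead of going back to the allocator. Not thread-safe.
class AddrPool {
public:
    // Reuse an idle object if there is one, otherwise allocate, then
    // "refresh" it with the new value.
    std::unique_ptr<Addr> acquire(const std::array<std::uint8_t, 16>& value) {
        std::unique_ptr<Addr> obj;
        if (free_.empty()) {
            obj.reset(new Addr());
        } else {
            obj = std::move(free_.back());
            free_.pop_back();
        }
        obj->bytes = value;
        return obj;
    }

    // Return an object to the pool. Beyond max_idle_ idle objects it is
    // simply destroyed, so the pool cannot grow without bound.
    void release(std::unique_ptr<Addr> obj) {
        if (obj && free_.size() < max_idle_)
            free_.push_back(std::move(obj));
    }

    std::size_t idle() const { return free_.size(); }

private:
    std::size_t max_idle_ = 1024;  // arbitrary cap, an assumption
    std::vector<std::unique_ptr<Addr>> free_;
};
```

Whether this beats a good general-purpose allocator would have to be benchmarked; a tuned malloc already keeps per-size free lists, so the gain may be smaller than expected.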

> On a related note, a fun experiment is always to try running bro with a different allocator and seeing what happens ...

I recently noticed our boxes were using jemalloc instead of tcmalloc.  Switching that caused malloc to drop a few places down in 'perf top' output.


— 
Justin Azoff




