[Bro-Dev] Performance Enhancements

Azoff, Justin S jazoff at illinois.edu
Thu Oct 12 15:08:04 PDT 2017

> On Oct 6, 2017, at 5:59 PM, Jim Mellander <jmellander at lbl.gov> wrote:
> I particularly like the idea of an allocation pool in which per-packet information can be stored and reused by the next packet.
> There also are probably some optimizations of frequent operations now that we're in a 64-bit world that could prove useful - the one's complement checksum calculation in net_util.cc is one that comes to mind, especially since it works effectively a byte at a time (and works with even byte counts only).  Seeing as this is done per-packet on all tcp payload, optimizing this seems reasonable.  Here's a discussion of doing the checksum calc in 64-bit arithmetic: https://locklessinc.com/articles/tcp_checksum/

So I still haven't gotten this to work, but I did some more tests that I think show it is worthwhile to look into replacing this function.

I generated a large pcap of a 3-minute iperf run:

    $ du -hs iperf.pcap
    9.6G	iperf.pcap
    $ tcpdump  -n -r iperf.pcap |wc -l
    reading from file iperf.pcap, link-type EN10MB (Ethernet)

Then I ran either `bro -Cbr` (checksum validation disabled) or `bro -br` (checksums validated) on it 5 times, tracking the runtime in seconds as well as the CPU instruction count reported by `perf`:

    $ python2 bench.py 5 bro -Cbr iperf.pcap
    15.19 49947664388
    15.66 49947827678
    15.74 49947853306
    15.66 49949603644
    15.42 49951191958
    Min 15.18678689
    Max 15.7425909042
    Avg 15.5343231678
    Min 49947664388
    Max 49951191958
    Avg 49948828194
    $ python2 bench.py 5 bro -br iperf.pcap
    20.82 95502327077
    21.31 95489729078
    20.52 95483242217
    21.45 95499193001
    21.32 95498830971
    Min 20.5184400082
    Max 21.4452238083
    Avg 21.083449173
    Min 95483242217
    Max 95502327077
    Avg 95494664468

So this shows that for every ~7,500,000 packets bro processes, about 5.5 seconds (21.08s vs. 15.53s on average) and roughly 45.5 billion instructions are spent computing checksums.

According to https://locklessinc.com/articles/tcp_checksum/, they run their benchmark 2^24 (16,777,216) times, which is about 2.2 times as many packets.

Their runtime starts out at about 11s, which puts it in line with the current implementation in bro.  The other implementations they show are between 7 and 10x faster depending on packet size.  A 90% drop in time spent computing checksums would be a noticeable improvement.
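
The comparison above can be written out as a short calculation. This is only a sketch of the arithmetic: the averages are copied from the two benchmark runs, the packet count is the rough ~7,500,000 estimate, and it assumes the whole difference between the two runs is checksum work. All constant and function names are made up for illustration.

```cpp
#include <cstdint>

// Averages copied from the benchmark output above.
constexpr double kAvgSecondsWithChecksums    = 21.083449173;   // bro -br
constexpr double kAvgSecondsWithoutChecksums = 15.5343231678;  // bro -Cbr
constexpr std::uint64_t kAvgInstrWithChecksums    = 95494664468ULL;
constexpr std::uint64_t kAvgInstrWithoutChecksums = 49948828194ULL;

// Assumption: roughly 7.5 million packets in the pcap (an estimate).
constexpr double kPackets = 7.5e6;
// Iteration count used by the article's benchmark (2^24).
constexpr double kArticleIterations = 16777216.0;

// Extra wall-clock time attributed to checksum validation.
constexpr double checksum_seconds()
{
    return kAvgSecondsWithChecksums - kAvgSecondsWithoutChecksums;
}

// Extra instructions attributed to checksum validation.
constexpr std::uint64_t checksum_instructions()
{
    return kAvgInstrWithChecksums - kAvgInstrWithoutChecksums;
}

// Checksum time scaled linearly to the article's 2^24 iterations.
constexpr double scaled_checksum_seconds()
{
    return checksum_seconds() * kArticleIterations / kPackets;
}

// Average checksum cost per packet, in nanoseconds.
constexpr double checksum_ns_per_packet()
{
    return checksum_seconds() * 1e9 / kPackets;
}
```

Under these assumptions the difference is about 5.5 seconds and about 45.5 billion instructions, and scaling it to 2^24 packets gives roughly 12.4 seconds, which is in the same range as the article's ~11s starting point.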

Unfortunately I couldn't get their implementation to work inside of bro and get the right result, and even if I could, it's not clear what the license for the code is.
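
For reference, the general technique (accumulating the one's complement sum 64 bits at a time with an end-around carry, as described in RFC 1071) can be sketched without using the article's code. The following is a minimal sketch only: it is neither the article's implementation nor the function in net_util.cc, it has not been tried inside bro, and the names `sum16_ref` and `sum64` are hypothetical. Both functions return the folded, non-inverted sum over 16-bit words in memory order, and odd lengths are handled by zero-padding the last byte.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reference version: add one 16-bit word at a time, fold carries at the end.
std::uint16_t sum16_ref(const unsigned char* p, std::size_t len)
{
    std::uint64_t sum = 0;  // wide accumulator so carries cannot be lost

    while ( len > 0 )
    {
        std::uint16_t w = 0;
        std::size_t n = len >= 2 ? 2 : 1;  // a trailing odd byte is zero-padded
        std::memcpy(&w, p, n);             // memcpy avoids unaligned reads
        sum += w;
        p += n;
        len -= n;
    }

    while ( sum >> 16 )  // fold carries back into the low 16 bits
        sum = (sum & 0xffff) + (sum >> 16);

    return static_cast<std::uint16_t>(sum);
}

// 64-bit version: add eight bytes per step with an end-around carry.
std::uint16_t sum64(const unsigned char* p, std::size_t len)
{
    std::uint64_t sum = 0;

    while ( len > 0 )
    {
        std::uint64_t w = 0;
        std::size_t n = len >= 8 ? 8 : len;  // a short tail is zero-padded
        std::memcpy(&w, p, n);
        sum += w;
        if ( sum < w )  // unsigned overflow: wrap the carry around
            ++sum;
        p += n;
        len -= n;
    }

    while ( sum >> 16 )  // fold 64 bits down to 16
        sum = (sum & 0xffff) + (sum >> 16);

    return static_cast<std::uint16_t>(sum);
}
```

Comparing `sum64` against `sum16_ref` over buffers of every length, including odd ones, is a cheap way to check that a faster variant still produces the right result before wiring it into the checksum validation path. The value stored in a header checksum field would be the bitwise complement of this sum.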

Justin Azoff
