[Bro-Dev] Performance Enhancements

Jim Mellander jmellander at lbl.gov
Sat Oct 14 12:41:55 PDT 2017


Yeah, the lockless implementation has a bug:

if (size)

should be

if (size & 1)
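
To illustrate why the mask matters, here is a minimal sketch. It assumes the tail handler tests bits of the leftover length (0 to 7 bytes after an 8-byte main loop) without decrementing it; that is an assumption about the article's code, not a quote of it. `tail_bytes_consumed` is a hypothetical helper that only counts bytes and computes no checksum.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not from the article or from Bro): count how many
 * bytes a bit-tested tail handler consumes for a leftover of `size` bytes
 * (0..7) after an 8-byte main loop. `masked` selects the final test:
 * nonzero means `if (size & 1)`, zero means plain `if (size)`.
 */
static size_t tail_bytes_consumed(size_t size, int masked)
{
    size_t consumed = 0;

    if (size & 4)
        consumed += 4;   /* one 32-bit word */
    if (size & 2)
        consumed += 2;   /* one 16-bit word */

    /* size was never decremented above, so it is still nonzero here
     * whenever the tail held any bytes at all. */
    if (masked ? (size & 1) != 0 : size != 0)
        consumed += 1;   /* one trailing byte */

    return consumed;
}
```

With the mask, a 6-byte tail consumes 6 bytes. With plain `if (size)` it consumes 7, so one byte past the end of the data would be added into the sum.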

I ended up writing a checksum routine that sums 32 bits at a time into a
64-bit register, which avoids the need to check for overflow. It seems to
be faster than the full 64-bit implementation; I will test it with Bro and
report results.
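
A minimal sketch of the approach described above, not the routine that was actually written. The names `checksum_sum32` and `checksum_ref16` are hypothetical. Words are loaded in native byte order with `memcpy`, so the result is meant to be stored back in native order. No carry check is needed in the loop because each addend is below 2^32, so a 64-bit accumulator cannot overflow for any packet-sized buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load one 16-bit word; a lone trailing byte is padded with a zero byte. */
static uint16_t load16(const uint8_t *p, size_t avail)
{
    uint8_t tmp[2] = { p[0], (uint8_t)(avail > 1 ? p[1] : 0) };
    uint16_t w;
    memcpy(&w, tmp, sizeof w);
    return w;
}

/* Reference: 16 bits at a time (assumes buffers well under 128 KiB). */
static uint16_t checksum_ref16(const uint8_t *buf, size_t size)
{
    uint32_t sum = 0;

    while (size >= 2) {
        sum += load16(buf, 2);
        buf += 2;
        size -= 2;
    }
    if (size & 1)
        sum += load16(buf, 1);

    while (sum >> 16)                       /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Sketch: 32 bits at a time into a 64-bit accumulator, no carry checks. */
static uint16_t checksum_sum32(const uint8_t *buf, size_t size)
{
    uint64_t sum = 0;

    while (size >= 4) {
        uint32_t w;
        memcpy(&w, buf, sizeof w);          /* safe for unaligned input */
        sum += w;
        buf += 4;
        size -= 4;
    }

    /* 0..3 bytes remain; test individual bits of the remainder. */
    if (size & 2) {
        sum += load16(buf, 2);
        buf += 2;
    }
    if (size & 1)
        sum += load16(buf, 1);

    while (sum >> 16)                       /* fold 64 bits down to 16 */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

Both functions should return the same value for any buffer, odd lengths included, which makes the 16-bit version a convenient cross-check when trying a wider routine inside Bro.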

On Thu, Oct 12, 2017 at 3:08 PM, Azoff, Justin S <jazoff at illinois.edu>
wrote:

>
> > On Oct 6, 2017, at 5:59 PM, Jim Mellander <jmellander at lbl.gov> wrote:
> >
> > I particularly like the idea of an allocation pool in which per-packet
> information can be stored and reused by the next packet.
> >
> > There also are probably some optimizations of frequent operations now
> that we're in a 64-bit world that could prove useful - the one's complement
> checksum calculation in net_util.cc is one that comes to mind, especially
> since it works effectively a byte at a time (and works with even byte
> counts only).  Seeing as this is done per-packet on all TCP payload,
> optimizing this seems reasonable.  Here's a discussion of doing the checksum
> calc in 64-bit arithmetic: https://locklessinc.com/articles/tcp_checksum/
>
> So I still haven't gotten this to work, but I did some more tests that I
> think show it is worthwhile to look into replacing this function.
>
> I generated a large pcap of a 3 minute iperf run:
>
>     $ du -hs iperf.pcap
>     9.6G        iperf.pcap
>     $ tcpdump  -n -r iperf.pcap |wc -l
>     reading from file iperf.pcap, link-type EN10MB (Ethernet)
>     7497698
>
> Then I ran either `bro -Cbr` (checksum validation disabled) or `bro -br`
> on it 5 times and tracked the runtime as well as the CPU instructions
> reported by `perf`:
>
>     $ python2 bench.py 5 bro -Cbr iperf.pcap
>     15.19 49947664388
>     15.66 49947827678
>     15.74 49947853306
>     15.66 49949603644
>     15.42 49951191958
>     elapsed
>     Min 15.18678689
>     Max 15.7425909042
>     Avg 15.5343231678
>
>     instructions
>     Min 49947664388
>     Max 49951191958
>     Avg 49948828194
>
>     $ python2 bench.py 5 bro -br iperf.pcap
>     20.82 95502327077
>     21.31 95489729078
>     20.52 95483242217
>     21.45 95499193001
>     21.32 95498830971
>     elapsed
>     Min 20.5184400082
>     Max 21.4452238083
>     Avg 21.083449173
>
>     instructions
>     Min 95483242217
>     Max 95502327077
>     Avg 95494664468
>
>
> So this shows that for every ~7,500,000 packets bro processes, about 5.5
> seconds (21.08 s vs. 15.53 s on average) are spent computing checksums.
>
> According to https://locklessinc.com/articles/tcp_checksum/, they run
> their benchmark 2^24 times (16,777,216) which is about 2.2 times as many
> packets.
>
> Their runtime starts out at about 11s, which puts it in line with the
> current implementation in bro.  The other implementations they show are
> between 7 and 10x faster depending on packet size.  A 90% drop in time
> spent computing checksums would be a noticeable improvement.
>
>
> Unfortunately I couldn't get their implementation to work inside of bro
> and get the right result, and even if I could, it's not clear what the
> license for the code is.
>
> Justin Azoff
>