[Bro-Dev] Performance Enhancements
Jim Mellander
jmellander at lbl.gov
Sat Oct 14 12:41:55 PDT 2017
Yeah, the lockless implementation has a bug:
if (size)
should be
if (size & 1)
I ended up writing a checksum routine that sums 32 bits at a time into a
64-bit register, which avoids the need to check for overflow. It seems to
be faster than the full 64-bit implementation. I will test with Bro and
report results.
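For illustration, here is a minimal sketch of that approach. It is not the
actual routine: the function name ones_complement_sum32, the use of memcpy
for unaligned loads, and the choice to return the folded sum as a
network-order value are all assumptions made for this example. Each addend
is below 2^32, so the 64-bit accumulator cannot overflow for any realistic
packet size and no carry check is needed inside the loop.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name; a sketch, not Bro's or locklessinc's code.
 * Returns the folded one's complement sum of `len` bytes, expressed
 * as a network-order 16-bit value. The caller still has to invert it
 * to obtain the checksum field. */
uint16_t ones_complement_sum32(const void *data, size_t len)
{
    const unsigned char *p = (const unsigned char *)data;
    uint64_t sum = 0;

    /* Main loop: add 32 bits at a time into a 64-bit accumulator. */
    while (len >= 4) {
        uint32_t w;
        memcpy(&w, p, sizeof w);      /* safe for unaligned input */
        sum += w;
        p += 4;
        len -= 4;
    }

    /* Tail: len is now 0..3, so test individual bits. */
    if (len & 2) {
        uint16_t h;
        memcpy(&h, p, sizeof h);
        sum += h;
        p += 2;
    }
    if (len & 1) {                    /* a plain `if (len)` would also fire for len == 2 */
        uint16_t last = 0;            /* odd byte, padded with a zero byte */
        memcpy(&last, p, 1);
        sum += last;
    }

    /* Fold 64 -> 32 -> 16 bits with end-around carry. */
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffffffffu) + (sum >> 32);
    sum = (sum & 0xffffu) + (sum >> 16);
    sum = (sum & 0xffffu) + (sum >> 16);

    /* Convert the native-order result to a network-order value so the
     * return value is the same on little- and big-endian hosts. */
    uint16_t folded = (uint16_t)sum;
    unsigned char out[2];
    memcpy(out, &folded, sizeof out);
    return (uint16_t)((out[0] << 8) | out[1]);
}
```

On the sample bytes from RFC 1071 (00 01 f2 03 f4 f5 f6 f7) this returns
0xddf2. The tail uses the corrected `& 1` test, so a 2-byte remainder does
not read one byte past the end of the buffer.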
On Thu, Oct 12, 2017 at 3:08 PM, Azoff, Justin S <jazoff at illinois.edu>
wrote:
>
> > On Oct 6, 2017, at 5:59 PM, Jim Mellander <jmellander at lbl.gov> wrote:
> >
> > I particularly like the idea of an allocation pool in which per-packet
> > information can be stored and reused by the next packet.
> >
> > There are also probably some optimizations of frequent operations, now
> > that we're in a 64-bit world, that could prove useful. The one's complement
> > checksum calculation in net_util.cc is one that comes to mind, especially
> > since it effectively works a byte at a time (and works with even byte
> > counts only). Seeing as this is done per packet on all TCP payload,
> > optimizing it seems reasonable. Here's a discussion of doing the checksum
> > calculation in 64-bit arithmetic: https://locklessinc.com/articles/tcp_checksum/
>
> So I still haven't gotten this to work, but I did some more tests that I
> think show it is worthwhile to look into replacing this function.
>
> I generated a large pcap of a 3 minute iperf run:
>
> $ du -hs iperf.pcap
> 9.6G iperf.pcap
> $ tcpdump -n -r iperf.pcap |wc -l
> reading from file iperf.pcap, link-type EN10MB (Ethernet)
> 7497698
>
> Then I ran either `bro -Cbr` or `bro -br` on it 5 times and tracked the
> runtime as well as the CPU instruction count reported by `perf`:
>
> $ python2 bench.py 5 bro -Cbr iperf.pcap
> 15.19 49947664388
> 15.66 49947827678
> 15.74 49947853306
> 15.66 49949603644
> 15.42 49951191958
> elapsed
> Min 15.18678689
> Max 15.7425909042
> Avg 15.5343231678
>
> instructions
> Min 49947664388
> Max 49951191958
> Avg 49948828194
>
> $ python2 bench.py 5 bro -br iperf.pcap
> 20.82 95502327077
> 21.31 95489729078
> 20.52 95483242217
> 21.45 95499193001
> 21.32 95498830971
> elapsed
> Min 20.5184400082
> Max 21.4452238083
> Avg 21.083449173
>
> instructions
> Min 95483242217
> Max 95502327077
> Avg 95494664468
>
>
> So this shows that for every ~7,500,000 packets bro processes, about 5.5
> seconds (21.08 s vs. 15.53 s on average) is spent computing checksums.
>
> According to https://locklessinc.com/articles/tcp_checksum/, they run
> their benchmark 2^24 times (16,777,216) which is about 2.2 times as many
> packets.
>
> Their runtime starts out at about 11s, which puts it in line with the
> current implementation in bro. The other implementations they show are
> between 7 and 10x faster depending on packet size. A 90% drop in time
> spent computing checksums would be a noticeable improvement.
>
>
> Unfortunately I couldn't get their implementation to work inside of bro
> and get the right result, and even if I could, it's not clear what the
> license for the code is.
>
>
>
>
>
> --
> Justin Azoff
>
>