[Bro-Dev] Performance Enhancements
Azoff, Justin S
jazoff at illinois.edu
Thu Oct 12 15:08:04 PDT 2017
> On Oct 6, 2017, at 5:59 PM, Jim Mellander <jmellander at lbl.gov> wrote:
>
> I particularly like the idea of an allocation pool that per-packet information can be stored, and reused by the next packet.
>
> There also are probably some optimizations of frequent operations now that we're in a 64-bit world that could prove useful - the one's complement checksum calculation in net_util.cc is one that comes to mind, especially since it works effectively a byte at a time (and works with even byte counts only). Seeing as this is done per-packet on all tcp payload, optimizing this seems reasonable. Here's a discussion of doing the checksum calc in 64-bit arithmetic: https://locklessinc.com/articles/tcp_checksum/
So I still haven't gotten this to work, but I did some more tests that I think show it is worthwhile to look into replacing this function.
I generated a large pcap of a 3 minute iperf run:
$ du -hs iperf.pcap
9.6G iperf.pcap
$ tcpdump -n -r iperf.pcap |wc -l
reading from file iperf.pcap, link-type EN10MB (Ethernet)
7497698
Then I ran either `bro -Cbr` (checksums ignored) or `bro -br` (checksums verified) on it 5 times and tracked runtime as well as cpu instructions reported by `perf`:
$ python2 bench.py 5 bro -Cbr iperf.pcap
15.19 49947664388
15.66 49947827678
15.74 49947853306
15.66 49949603644
15.42 49951191958
elapsed
Min 15.18678689
Max 15.7425909042
Avg 15.5343231678
instructions
Min 49947664388
Max 49951191958
Avg 49948828194
$ python2 bench.py 5 bro -br iperf.pcap
20.82 95502327077
21.31 95489729078
20.52 95483242217
21.45 95499193001
21.32 95498830971
elapsed
Min 20.5184400082
Max 21.4452238083
Avg 21.083449173
instructions
Min 95483242217
Max 95502327077
Avg 95494664468
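bench.py itself is not included in this message. A minimal sketch of a harness that would produce this kind of output might look like the following. The function names are hypothetical, and the parser assumes that `perf stat -x,` writes CSV lines with the counter value in the first field and the event name in the third; that layout is an assumption and may differ between perf versions.

```python
import subprocess
import time


def parse_instructions(perf_csv):
    """Pull the instruction count out of `perf stat -x,` output text."""
    for line in perf_csv.splitlines():
        fields = line.split(",")
        # Assumed layout: value,unit,event,... (event may be "instructions:u").
        if len(fields) >= 3 and fields[2].startswith("instructions"):
            return int(fields[0])
    raise ValueError("no instructions counter found in perf output")


def run_once(cmd, perf_out="bench.perf.csv"):
    """Run cmd once under perf stat; return (elapsed seconds, instructions)."""
    perf = ["perf", "stat", "-x,", "-e", "instructions", "-o", perf_out, "--"]
    start = time.time()
    subprocess.check_call(perf + list(cmd))
    elapsed = time.time() - start  # wall clock, includes perf's own overhead
    with open(perf_out) as f:
        return elapsed, parse_instructions(f.read())


def summarize(values):
    """Return (min, max, average) for a list of measurements."""
    return min(values), max(values), sum(values) / float(len(values))


def bench(runs, cmd):
    """Run cmd `runs` times and print per-run figures plus a summary."""
    results = [run_once(cmd) for _ in range(runs)]
    for elapsed, instructions in results:
        print("%.2f %d" % (elapsed, instructions))
    for name, column in (("elapsed", 0), ("instructions", 1)):
        lo, hi, avg = summarize([r[column] for r in results])
        print(name)
        print("Min %s" % lo)
        print("Max %s" % hi)
        print("Avg %s" % avg)
```

With this sketch, `bench(5, ["bro", "-Cbr", "iperf.pcap"])` would correspond to the first invocation above.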
So this shows that for every ~7,500,000 packets bro processes, about 5.5 seconds (21.08s vs. 15.53s on average) is spent computing checksums.
According to https://locklessinc.com/articles/tcp_checksum/, they run their benchmark 2^24 times (16,777,216) which is about 2.2 times as many packets.
Their runtime starts out at about 11s, which puts it in line with the current implementation in bro. The other implementations they show are between 7 and 10x faster depending on packet size. A 90% drop in time spent computing checksums would be a noticeable improvement.
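The comparison can be worked through from the averages reported above. This is only back-of-the-envelope arithmetic on the numbers in this message; attributing the whole difference between the two runs to checksumming is an assumption.

```python
# Figures copied from the benchmark output above.
packets = 7497698
avg_elapsed_no_checksum = 15.5343231678   # bro -Cbr
avg_elapsed_checksum = 21.083449173       # bro -br
avg_instr_no_checksum = 49948828194
avg_instr_checksum = 95494664468

# Extra cost when checksums are verified (assumed to be all checksum work).
extra_seconds = avg_elapsed_checksum - avg_elapsed_no_checksum   # about 5.5 s
extra_instr = avg_instr_checksum - avg_instr_no_checksum

# Per-packet cost.
ns_per_packet = extra_seconds / packets * 1e9          # roughly 740 ns
instr_per_packet = extra_instr / float(packets)        # roughly 6,000 instructions

# Scale to the 2^24 iterations used in the locklessinc article.
scaled_seconds = extra_seconds * (2 ** 24) / packets   # about 12.4 s

# What would remain after a 7x or 10x faster checksum routine.
remaining_7x = extra_seconds / 7
remaining_10x = extra_seconds / 10

print("extra seconds: %.2f" % extra_seconds)
print("ns per packet: %.0f" % ns_per_packet)
print("instructions per packet: %.0f" % instr_per_packet)
print("scaled to 2^24 packets: %.1f s" % scaled_seconds)
print("remaining at 7x/10x: %.2f s / %.2f s" % (remaining_7x, remaining_10x))
```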
Unfortunately I couldn't get their implementation to work inside of bro and get the right result, and even if I could, it's not clear what the license for the code is.
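One way to check a replacement for correctness is to compare it against a plain 16-bit reference on many inputs. The sketch below is not the locklessinc code and not the net_util.cc code; it is an illustration of the general technique of accumulating in wider words and folding down to 16 bits at the end, with hypothetical function names. Python integers do not overflow, so a C or C++ version would also have to handle the carry out of the 64-bit accumulator, which this sketch does not show.

```python
import struct


def fold16(total):
    """Fold a wide sum down to 16 bits using end-around carry."""
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return total


def checksum16(data):
    """Reference: sum big-endian 16-bit words, fold, then complement."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd lengths with a zero byte
    words = struct.unpack("!%dH" % (len(data) // 2), data)
    return ~fold16(sum(words)) & 0xFFFF


def checksum64(data):
    """Same checksum, accumulated 64 bits (8 bytes) at a time."""
    data += b"\x00" * (-len(data) % 8)       # pad to a multiple of 8 bytes
    words = struct.unpack("!%dQ" % (len(data) // 8), data)
    # Because 2**16 is congruent to 1 modulo 0xFFFF, folding the sum of the
    # 64-bit words gives the same result as folding the sum of 16-bit words.
    return ~fold16(sum(words)) & 0xFFFF


def agree(max_len=256):
    """Check both versions against each other on inputs of every length."""
    for n in range(max_len):
        payload = bytes(bytearray((i * 37 + 11) & 0xFF for i in range(n)))
        if checksum16(payload) != checksum64(payload):
            return False
    return True
```

Running the same kind of comparison against the existing function, over odd and even lengths, should make it easier to see where a faster implementation starts to disagree.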
—
Justin Azoff