<div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Yeh, the lockless implementation has a bug:</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">if (size)</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">s/b</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">if (size & 1)</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">I ended up writing an checksum routine that sums 32 bits at a time into a 64 bit register, which avoids the need to check for overflow - it seems to be faster than the full 64 bit implementation - will test with Bro and report results.<br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Oct 12, 2017 at 3:08 PM, Azoff, Justin S <span dir="ltr"><<a href="mailto:jazoff@illinois.edu" target="_blank">jazoff@illinois.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br>
> On Oct 6, 2017, at 5:59 PM, Jim Mellander <<a href="mailto:jmellander@lbl.gov">jmellander@lbl.gov</a>> wrote:<br>
><br>
> I particularly like the idea of an allocation pool that per-packet information can be stored, and reused by the next packet.<br>
><br>
> There also are probably some optimizations of frequent operations now that we're in a 64-bit world that could prove useful - the one's complement checksum calculation in net_util.cc is one that comes to mind, especially since it works effectively a byte at a time (and works with even byte counts only). Seeing as this is done per-packet on all tcp payload, optimizing this seems reasonable. Here's a discussion of do the checksum calc in 64-bit arithmetic: <a href="https://locklessinc.com/articles/tcp_checksum/" rel="noreferrer" target="_blank">https://locklessinc.com/<wbr>articles/tcp_checksum/</a> -<br>
<br>
</span>So I still haven't gotten this to work, but I did some more tests that I think show it is worthwhile to look into replacing this function.<br>
<br>
I generated a large pcap of a 3 minute iperf run:<br>
<br>
$ du -hs iperf.pcap<br>
9.6G iperf.pcap<br>
$ tcpdump -n -r iperf.pcap |wc -l<br>
reading from file iperf.pcap, link-type EN10MB (Ethernet)<br>
7497698<br>
<br>
Then ran either `bro -Cbr` or `bro -br` on it 5 times and track runtime as well as cpu instructions reported by `perf`:<br>
<br>
$ python2 bench.py 5 bro -Cbr iperf.pcap<br>
15.19 49947664388<br>
15.66 49947827678<br>
15.74 49947853306<br>
15.66 49949603644<br>
15.42 49951191958<br>
elapsed<br>
Min 15.18678689<br>
Max 15.7425909042<br>
Avg 15.5343231678<br>
<br>
instructions<br>
Min 49947664388<br>
Max 49951191958<br>
Avg 49948828194<br>
<br>
$ python2 bench.py 5 bro -br iperf.pcap<br>
20.82 95502327077<br>
21.31 95489729078<br>
20.52 95483242217<br>
21.45 95499193001<br>
21.32 95498830971<br>
elapsed<br>
Min 20.5184400082<br>
Max 21.4452238083<br>
Avg 21.083449173<br>
<br>
instructions<br>
Min 95483242217<br>
Max 95502327077<br>
Avg 95494664468<br>
<br>
<br>
So this shows that for every ~7,500,000 packets bro processes, almost 5 seconds is spent computing checksums.<br>
<br>
According to <a href="https://locklessinc.com/articles/tcp_checksum/" rel="noreferrer" target="_blank">https://locklessinc.com/<wbr>articles/tcp_checksum/</a>, they run their benchmark 2^24 times (16,777,216) which is about 2.2 times as many packets.<br>
<br>
Their runtime starts out at about 11s, which puts it in line with the current implementation in bro. The other implementations they show are<br>
between 7 and 10x faster depending on packet size. A 90% drop in time spent computing checksums would be a noticeable improvement.<br>
<br>
<br>
Unfortunately I couldn't get their implementation to work inside of bro and get the right result, and even if I could, it's not clear what the license for the code is.<br>
<br>
<br>
<br>
<br>
<br>
—<br>
<span class="HOEnZb"><font color="#888888">Justin Azoff<br>
<br>
</font></span></blockquote></div><br></div>