<div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Interesting info. The more-than-an-order-of-magnitude difference in time between BaseList::remove and BaseList::remove_nth suggests that the for loop in BaseList::remove may be falling off the end in many cases (i.e. attempting to remove an item that doesn't exist). Maybe that's what's broken.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Oct 6, 2017 at 3:49 PM, Azoff, Justin S <span dir="ltr"><<a href="mailto:jazoff@illinois.edu" target="_blank">jazoff@illinois.edu</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class=""><br>
> On Oct 6, 2017, at 5:59 PM, Jim Mellander <<a href="mailto:jmellander@lbl.gov">jmellander@lbl.gov</a>> wrote:<br>
><br>
> I particularly like the idea of an allocation pool in which per-packet information can be stored and then reused by the next packet.<br>
><br>
> There are also probably some optimizations of frequent operations that could prove useful now that we're in a 64-bit world - the one's complement checksum calculation in net_util.cc is one that comes to mind, especially since it effectively works a byte at a time (and works with even byte counts only). Seeing as this is done per-packet on all TCP payload, optimizing it seems reasonable. Here's a discussion of doing the checksum calculation in 64-bit arithmetic: <a href="https://locklessinc.com/articles/tcp_checksum/" rel="noreferrer" target="_blank">https://locklessinc.com/<wbr>articles/tcp_checksum/</a> - this website also has an x64 allocator that is claimed to be faster than tcmalloc, see: <a href="https://locklessinc.com/benchmarks_allocator.shtml" rel="noreferrer" target="_blank">https://locklessinc.com/<wbr>benchmarks_allocator.shtml</a> (note: I haven't tried anything from this source, but find it interesting).<br>
><br>
> I'm guessing there are a number of such "small" optimizations that could provide significant performance gains.<br>
><br>
> Take care,<br>
><br>
> Jim<br>
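<br>
As a rough illustration of the wider-arithmetic idea above, here is a minimal C++ sketch. It is not the code in net_util.cc and not the routine from the linked article: the names fold16, sum16_simple and sum16_wide are hypothetical, and it uses the simpler variant of adding 32-bit loads into a 64-bit accumulator, which assumes buffers far smaller than 16 GiB so the accumulator cannot overflow. Both functions work in native byte order and return the folded sum, not its complement; odd lengths are handled by padding with a zero byte.<br>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Fold a wide accumulator down to 16 bits with end-around carry.
static std::uint16_t fold16(std::uint64_t sum) {
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return static_cast<std::uint16_t>(sum);
}

// Reference version: add one 16-bit word (native byte order) at a time.
std::uint16_t sum16_simple(const std::uint8_t* p, std::size_t len) {
    std::uint64_t sum = 0;
    while (len >= 2) {
        std::uint16_t w;
        std::memcpy(&w, p, 2);  // memcpy avoids unaligned-access problems
        sum += w;
        p += 2;
        len -= 2;
    }
    if (len) {  // odd trailing byte, padded with a zero byte
        std::uint16_t w = 0;
        std::memcpy(&w, p, 1);
        sum += w;
    }
    return fold16(sum);
}

// Wider version: add 32-bit loads into a 64-bit accumulator, fold once.
// Valid because 2^16 is congruent to 1 modulo 0xffff, so the two halves
// of each 32-bit word end up added together when the sum is folded.
std::uint16_t sum16_wide(const std::uint8_t* p, std::size_t len) {
    std::uint64_t sum = 0;
    while (len >= 4) {
        std::uint32_t w;
        std::memcpy(&w, p, 4);
        sum += w;
        p += 4;
        len -= 4;
    }
    sum += sum16_simple(p, len);  // remaining 0..3 bytes
    return fold16(sum);
}
```

Whether this helps in practice would need to be measured; the perf output later in this thread is the place to check.<br>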
<br>
</span>I've been messing around with 'perf top', and the one's complement function often shows up fairly high, along with PriorityQueue::BubbleDown and BaseList::remove.<br>
<br>
Something (on our configuration?) is doing a lot of PQ_TimerMgr::~PQ_TimerMgr. I don't think I've come across that class before in Bro. I think a script may be triggering something that is hurting performance, but I can't think of what it would be.<br>
<br>
Running perf top on a random worker right now with -F 19999 shows:<br>
<br>
Samples: 485K of event 'cycles', Event count (approx.): 26046568975<br>
Overhead Shared Object Symbol<br>
34.64% bro [.] BaseList::remove<br>
3.32% libtcmalloc.so.4.2.6 [.] operator delete<br>
3.25% bro [.] PriorityQueue::BubbleDown<br>
2.31% bro [.] BaseList::remove_nth<br>
2.05% libtcmalloc.so.4.2.6 [.] operator new<br>
1.90% bro [.] Attributes::FindAttr<br>
1.41% bro [.] Dictionary::NextEntry<br>
1.27% libc-2.17.so [.] __memcpy_ssse3_back<br>
0.97% bro [.] StmtList::Exec<br>
0.87% bro [.] Dictionary::Lookup<br>
0.85% bro [.] NameExpr::Eval<br>
0.84% bro [.] BroFunc::Call<br>
0.80% libtcmalloc.so.4.2.6 [.] tc_free<br>
0.77% libtcmalloc.so.4.2.6 [.] operator delete[]<br>
0.70% bro [.] ones_complement_checksum<br>
0.60% libtcmalloc.so.4.2.6 [.] tcmalloc::ThreadCache::<wbr>ReleaseToCentralCache<br>
0.60% bro [.] RecordVal::RecordVal<br>
0.53% bro [.] UnaryExpr::Eval<br>
0.51% bro [.] ExprStmt::Exec<br>
0.51% bro [.] iosource::Manager::FindSoonest<br>
0.50% libtcmalloc.so.4.2.6 [.] operator new[]<br>
<br>
<br>
These entries sum to 59.2%.<br>
<br>
BaseList::remove/BaseList::<wbr>remove_nth seem particularly easy to optimize. Can't the loop that shifts the remaining entries be replaced by a memmove?<br>
I think something may be broken if it's being called that much though.<br>
<br>
<br>
<br>
—<br>
<span class="HOEnZb"><font color="#888888">Justin Azoff<br>
<br>
</font></span></blockquote></div><br></div>
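<div class="gmail_default" style="font-family:arial,helvetica,sans-serif">To make both ideas in this thread concrete, here is a minimal sketch of a pointer list. PtrList is a hypothetical stand-in, not Bro's actual BaseList, and the compares counter exists only for illustration. It shows remove_nth closing the gap with a single memmove, and remove paying for a scan of the entire list when the item isn't there.</div>

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for a list of pointers; NOT Bro's BaseList.
struct PtrList {
    std::vector<void*> entries;
    std::size_t compares = 0;  // comparisons made by remove(), for illustration

    // Remove the element at index n, closing the gap with one memmove
    // instead of a loop that copies entries one at a time.
    void* remove_nth(std::size_t n) {
        assert(n < entries.size());
        void* old = entries[n];
        std::size_t tail = entries.size() - n - 1;  // entries after index n
        if (tail > 0)
            std::memmove(&entries[n], &entries[n + 1], tail * sizeof(void*));
        entries.pop_back();
        return old;
    }

    // Remove the first occurrence of item. If the item is absent, the loop
    // walks the entire list before giving up and returning nullptr.
    void* remove(void* item) {
        for (std::size_t i = 0; i < entries.size(); ++i) {
            ++compares;
            if (entries[i] == item)
                return remove_nth(i);
        }
        return nullptr;
    }
};
```

<div class="gmail_default" style="font-family:arial,helvetica,sans-serif">If callers frequently ask to remove items that are not in the list, every such call costs a scan of all entries, which would be consistent with remove dominating the profile while remove_nth stays small. That is only a hypothesis until the call sites are checked.</div>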