[Bro-Dev] Performance Enhancements
jmellander at lbl.gov
Fri Oct 6 17:00:09 PDT 2017
Interesting info. The greater than order-of-magnitude difference in time
between BaseList::remove & BaseList::remove_nth suggests the possibility that
the for loop in BaseList::remove is falling off the end in many cases (i.e.
attempting to remove an item that doesn't exist). Maybe that's what's broken.
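
To illustrate the suspicion, here is a minimal sketch of a pointer list whose remove-by-value does a linear search before shifting. `PtrList` and its `comparisons` counter are hypothetical stand-ins made up for this example, not Bro's actual BaseList code; the point is only that a miss has to compare against every entry before giving up.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a pointer list; NOT Bro's actual BaseList.
struct PtrList {
    std::vector<void*> entries;
    std::size_t comparisons = 0;  // instrumentation for this sketch only

    // Remove by position: no search, only the tail shift done by erase().
    void* remove_nth(std::size_t n) {
        if (n >= entries.size())
            return nullptr;
        void* old = entries[n];
        entries.erase(entries.begin() + static_cast<std::ptrdiff_t>(n));
        return old;
    }

    // Remove by value: linear search first. If the item is absent, the
    // loop walks the whole list and falls off the end.
    void* remove(void* item) {
        std::size_t i = 0;
        for (; i < entries.size(); ++i) {
            ++comparisons;
            if (entries[i] == item)
                break;
        }
        return i < entries.size() ? remove_nth(i) : nullptr;
    }
};
```

If something like this is what is happening, counting hits versus misses in the real remove() should show it quickly.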
On Fri, Oct 6, 2017 at 3:49 PM, Azoff, Justin S <jazoff at illinois.edu> wrote:
> > On Oct 6, 2017, at 5:59 PM, Jim Mellander <jmellander at lbl.gov> wrote:
> > I particularly like the idea of an allocation pool in which per-packet
> > information can be stored and reused by the next packet.
> > There also are probably some optimizations of frequent operations now
> > that we're in a 64-bit world that could prove useful - the one's complement
> > checksum calculation in net_util.cc is one that comes to mind, especially
> > since it works effectively a byte at a time (and works with even byte
> > counts only). Seeing as this is done per-packet on all TCP payload,
> > optimizing this seems reasonable. Here's a discussion of doing the checksum
> > calc in 64-bit arithmetic: https://locklessinc.com/articles/tcp_checksum/
> > - this website also has an x64 allocator that is claimed to be faster than
> > tcmalloc, see: https://locklessinc.com/benchmarks_allocator.shtml (note:
> > I haven't tried anything from this source, but find it interesting).
> > I'm guessing there are a number of such "small" optimizations that could
> > provide significant performance gains.
> > Take care,
> > Jim
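
For reference, a rough sketch of what summing in a 64-bit accumulator could look like. This is not the code in net_util.cc and not the code from the linked article; the function names `checksum16_ref` and `checksum64_sketch` are made up for illustration. It assumes the buffer is far smaller than 2^32 words, so the accumulator cannot overflow, and it returns the folded one's complement sum in native byte order (not complemented).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Fold a wide one's complement accumulator down to 16 bits by adding
// the carries back in until nothing is left above bit 15.
static std::uint16_t fold16(std::uint64_t sum) {
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return static_cast<std::uint16_t>(sum);
}

// Reference version: one 16-bit word per step.
std::uint16_t checksum16_ref(const std::uint8_t* p, std::size_t len) {
    std::uint64_t sum = 0;
    while (len >= 2) {
        std::uint16_t w;
        std::memcpy(&w, p, 2);  // memcpy avoids unaligned-access issues
        sum += w;
        p += 2;
        len -= 2;
    }
    if (len) {  // odd trailing byte, padded with a zero byte
        std::uint16_t w = 0;
        std::memcpy(&w, p, 1);
        sum += w;
    }
    return fold16(sum);
}

// Wide version: four bytes per step into a 64-bit accumulator.
// Each 32-bit word holds two 16-bit words; folding separates them again.
std::uint16_t checksum64_sketch(const std::uint8_t* p, std::size_t len) {
    std::uint64_t sum = 0;
    while (len >= 4) {
        std::uint32_t w;
        std::memcpy(&w, p, 4);
        sum += w;
        p += 4;
        len -= 4;
    }
    if (len >= 2) {
        std::uint16_t w;
        std::memcpy(&w, p, 2);
        sum += w;
        p += 2;
        len -= 2;
    }
    if (len) {  // odd trailing byte, padded with a zero byte
        std::uint16_t w = 0;
        std::memcpy(&w, p, 1);
        sum += w;
    }
    return fold16(sum);
}
```

Both functions should agree for any length, which makes the narrow one a handy check when experimenting with wider loads. Whether this actually beats the existing code would need measuring.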
> I've been messing around with 'perf top', the one's complement function
> often shows up fairly high up.. that, PriorityQueue::BubbleDown, and
> Something (on our configuration?) is doing a lot of
> PQ_TimerMgr::~PQ_TimerMgr... I don't think I've come across that class
> before in bro.. I think a script may be triggering something that is
> hurting performance. I can't think of what it would be though.
> Running perf top on a random worker right now with -F 19999 shows:
> Samples: 485K of event 'cycles', Event count (approx.): 26046568975
> Overhead Shared Object Symbol
> 34.64% bro [.] BaseList::remove
> 3.32% libtcmalloc.so.4.2.6 [.] operator delete
> 3.25% bro [.] PriorityQueue::BubbleDown
> 2.31% bro [.] BaseList::remove_nth
> 2.05% libtcmalloc.so.4.2.6 [.] operator new
> 1.90% bro [.] Attributes::FindAttr
> 1.41% bro [.] Dictionary::NextEntry
> 1.27% libc-2.17.so [.] __memcpy_ssse3_back
> 0.97% bro [.] StmtList::Exec
> 0.87% bro [.] Dictionary::Lookup
> 0.85% bro [.] NameExpr::Eval
> 0.84% bro [.] BroFunc::Call
> 0.80% libtcmalloc.so.4.2.6 [.] tc_free
> 0.77% libtcmalloc.so.4.2.6 [.] operator delete
> 0.70% bro [.] ones_complement_checksum
> 0.60% libtcmalloc.so.4.2.6 [.] tcmalloc::ThreadCache::
> 0.60% bro [.] RecordVal::RecordVal
> 0.53% bro [.] UnaryExpr::Eval
> 0.51% bro [.] ExprStmt::Exec
> 0.51% bro [.] iosource::Manager::FindSoonest
> 0.50% libtcmalloc.so.4.2.6 [.] operator new
> Which sums up to 59.2%
> BaseList::remove/BaseList::remove_nth seems particularly easy to
> optimize. Can't that loop be replaced by a memmove?
> I think something may be broken if it's being called that much though.
> Justin Azoff
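
A minimal sketch of the memmove idea raised above, assuming the list stores its entries as a contiguous array of pointers plus a count. The `PtrArray` struct and both function names are hypothetical and written only for this example; they are not taken from Bro's source.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical layout: contiguous array of pointers plus a count.
struct PtrArray {
    void** entry;
    std::size_t num_entries;
};

// Loop version: shift each trailing entry down by one slot.
void* remove_nth_loop(PtrArray& a, std::size_t n) {
    if (n >= a.num_entries)
        return nullptr;
    void* old = a.entry[n];
    --a.num_entries;
    for (std::size_t i = n; i < a.num_entries; ++i)
        a.entry[i] = a.entry[i + 1];
    a.entry[a.num_entries] = nullptr;  // clear the vacated last slot
    return old;
}

// memmove version: same result, with the shift done in a single call.
// memmove (not memcpy) is required because source and destination overlap.
void* remove_nth_memmove(PtrArray& a, std::size_t n) {
    if (n >= a.num_entries)
        return nullptr;
    void* old = a.entry[n];
    --a.num_entries;
    std::memmove(&a.entry[n], &a.entry[n + 1],
                 (a.num_entries - n) * sizeof(void*));
    a.entry[a.num_entries] = nullptr;  // clear the vacated last slot
    return old;
}
```

Note that this only speeds up the shift after the element has been located. If most of the time in BaseList::remove goes to searching for items that are not in the list, memmove would not help with that part.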