[Bro-Dev] Performance Enhancements

Azoff, Justin S jazoff at illinois.edu
Fri Oct 6 15:49:30 PDT 2017


> On Oct 6, 2017, at 5:59 PM, Jim Mellander <jmellander at lbl.gov> wrote:
> 
> I particularly like the idea of an allocation pool in which per-packet information can be stored and then reused by the next packet.
> 
> There are also probably some optimizations of frequent operations, now that we're in a 64-bit world, that could prove useful. The one's complement checksum calculation in net_util.cc is one that comes to mind, especially since it effectively works a byte at a time (and works with even byte counts only).  Since this is done per packet on all TCP payload, optimizing it seems reasonable.  Here's a discussion of doing the checksum calculation in 64-bit arithmetic: https://locklessinc.com/articles/tcp_checksum/ - this website also has an x64 allocator that is claimed to be faster than tcmalloc, see: https://locklessinc.com/benchmarks_allocator.shtml  (note: I haven't tried anything from this source, but find it interesting).
> 
> I'm guessing there are a number of such "small" optimizations that could provide significant performance gains.
> 
> Take care,
> 
> Jim

I've been messing around with 'perf top', and the one's complement function often shows up fairly high, along with PriorityQueue::BubbleDown and BaseList::remove.
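
For reference, here is a minimal sketch of what a wider checksum loop could look like. It is not the code in net_util.cc and not the routine from the linked article; it is a simpler variant that reads 32-bit words into a 64-bit accumulator and folds the carries once at the end. The names fold_to_16 and ones_complement_sum64 are made up for illustration, and the function returns the folded (not inverted) sum of big-endian 16-bit words as a host-order value, zero-padding an odd-length tail.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not from Bro): fold a wide sum down to 16 bits by
// repeatedly adding the carries back in (end-around carry).
static inline uint16_t fold_to_16(uint64_t sum) {
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return static_cast<uint16_t>(sum);
}

// Hypothetical function (not Bro's ones_complement_checksum): one's
// complement sum of `len` bytes, interpreted as big-endian 16-bit words.
uint16_t ones_complement_sum64(const void* data, size_t len) {
    const unsigned char* p = static_cast<const unsigned char*>(data);
    uint64_t sum = 0;

    // Main loop: four bytes per iteration in native byte order. The 64-bit
    // accumulator has room for 2^32 such additions before it can overflow.
    while (len >= 4) {
        uint32_t w;
        std::memcpy(&w, p, sizeof(w));  // safe for unaligned input
        sum += w;
        p += 4;
        len -= 4;
    }

    // Tail: copy the remaining 0-3 bytes into a zero-padded word, which
    // also covers odd byte counts.
    if (len > 0) {
        unsigned char tail[4] = {0, 0, 0, 0};
        std::memcpy(tail, p, len);
        uint32_t w;
        std::memcpy(&w, tail, sizeof(w));
        sum += w;
    }

    // The folded native-order sum has the same in-memory byte layout as the
    // network-order result, so read its two bytes back as big-endian.
    uint16_t folded = fold_to_16(sum);
    unsigned char b[2];
    std::memcpy(b, &folded, sizeof(b));
    return static_cast<uint16_t>((b[0] << 8) | b[1]);
}
```

Whether something like this actually beats the current loop would need to be measured; the compiler may already be doing a decent job with the existing code.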

Something (in our configuration?) is calling PQ_TimerMgr::~PQ_TimerMgr a lot. I don't think I've come across that class before in Bro. I think a script may be triggering something that is hurting performance, though I can't think of what it would be.

Running perf top on a random worker right now with -F 19999 shows:

Samples: 485K of event 'cycles', Event count (approx.): 26046568975
Overhead  Shared Object                 Symbol
  34.64%  bro                           [.] BaseList::remove
   3.32%  libtcmalloc.so.4.2.6          [.] operator delete
   3.25%  bro                           [.] PriorityQueue::BubbleDown
   2.31%  bro                           [.] BaseList::remove_nth
   2.05%  libtcmalloc.so.4.2.6          [.] operator new
   1.90%  bro                           [.] Attributes::FindAttr
   1.41%  bro                           [.] Dictionary::NextEntry
   1.27%  libc-2.17.so                  [.] __memcpy_ssse3_back
   0.97%  bro                           [.] StmtList::Exec
   0.87%  bro                           [.] Dictionary::Lookup
   0.85%  bro                           [.] NameExpr::Eval
   0.84%  bro                           [.] BroFunc::Call
   0.80%  libtcmalloc.so.4.2.6          [.] tc_free
   0.77%  libtcmalloc.so.4.2.6          [.] operator delete[]
   0.70%  bro                           [.] ones_complement_checksum
   0.60%  libtcmalloc.so.4.2.6          [.] tcmalloc::ThreadCache::ReleaseToCentralCache
   0.60%  bro                           [.] RecordVal::RecordVal
   0.53%  bro                           [.] UnaryExpr::Eval
   0.51%  bro                           [.] ExprStmt::Exec
   0.51%  bro                           [.] iosource::Manager::FindSoonest
   0.50%  libtcmalloc.so.4.2.6          [.] operator new[]


The entries above sum to 59.2%.

BaseList::remove and BaseList::remove_nth seem particularly easy to optimize. Can't that loop be replaced by a memmove?
I think something may be broken if it's being called that much, though.
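
To make the memmove idea concrete, here is a rough sketch. It assumes the list keeps its element pointers in one contiguous array; PtrList, remove_nth and remove_value below are hypothetical stand-ins written for this example, not Bro's actual BaseList code.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for a pointer list backed by a contiguous array.
// This is NOT Bro's BaseList, just enough structure to show the idea.
struct PtrList {
    void** entries;   // storage owned by the caller in this sketch
    int num_entries;  // number of slots currently in use
};

// Remove and return the element at index n, or nullptr if n is out of
// range. The tail is shifted down by one slot with a single memmove
// instead of an element-by-element copy loop.
void* remove_nth(PtrList& l, int n) {
    if (n < 0 || n >= l.num_entries)
        return nullptr;

    void* old = l.entries[n];
    --l.num_entries;

    // Source and destination overlap, so memmove (not memcpy) is required.
    // When n was the last element the byte count is zero and nothing moves.
    std::memmove(l.entries + n, l.entries + n + 1,
                 static_cast<size_t>(l.num_entries - n) * sizeof(void*));
    return old;
}

// Remove the first occurrence of a. The search is still a linear scan;
// only the shifting of the tail gets cheaper.
bool remove_value(PtrList& l, void* a) {
    for (int i = 0; i < l.num_entries; ++i) {
        if (l.entries[i] == a) {
            remove_nth(l, i);
            return true;
        }
    }
    return false;
}
```

Note that this only speeds up the shift. The scan to find the element is still linear, so if remove is being called on long lists this often, the caller is probably the real problem.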



-- 
Justin Azoff



