[Bro-Dev] Sumstats bugs & fixes

Jim Mellander jmellander at lbl.gov
Wed Jan 17 16:30:10 PST 2018

In a previous email, I asked the question: Does anyone use sumstats outside
of a cluster context?

In my case, the answer is yes, as I perform development on a laptop, and
run bro standalone to test new bro policies.

I found several different bugs in
specifically SumStats::process_epoch_results

1. The policy returns sumstats results 50 at a time, and then reschedules
itself for the next 50 after 0.01 seconds.. Unfortunately, the reschedule
schedule 0.01 secs { process_epoch_result(ss, now, data1) };
instead of:
schedule 0.01 secs { *SumStats::*process_epoch_result(ss, now, data1) };
so it silently fails after the first 50 results.  Would be nice to have a
warning if a script schedules an event that doesn't exist.

2.The serious issue with the policy, though, is that the 'for' loop over
the result table is the main loop, with up to 50 items processed and
deleted within the loop, the expectation being that the iteration will not
thus be disturbed.

The attached program (hash_test.bro) demonstrates that this is not the case
(should output 1000, 0, but the 2nd value comes back non-zero), in line
with the documented caveat: "Currently, modifying a container’s membership
while iterating over it may result in undefined behavior, so do not add or
remove elements inside the loop."  I didn't examine bro source code to
appreciate the reason, but surmise that table resizing and rehashing would
account for the issue.

The consequences of this issue are that, under certain circumstances:
* Not all data will be returned by SumStats at the epoch
* SumStats::finish_epoch may not be run.

To address the issue can be done via a rearrangement of the code, along the
lines of the following pseudocode (boundary conditions, etc. ignored)

original (modifies table inside 'for' loop):

for (foo in bar)
    process bar[foo];
    delete bar[foo];
    if (i == 0) break;

to (modifies table outside 'for' loop):

while (i >0)
    for (foo in bar)
    process bar[foo];
    delete bar[foo]

...  there are a few other subtleties in the code (keeping a closure on the
result table so that sumstats can clear the original table & proceed to the
next epoch, and not running SumStats::finish_epoch if the result table was
empty to begin with).

A bit of rearrangement fixes the bugs while preserving the original
behavior, with the help of a wrapper event that checks for an empty result
table, and if not makes an explicit copy for further processing by the
actual event doing the work.

An additional 'for' loop around the result table could be used to keep it
all in one event, but looks too much like black magic (and still, albeit
probably in a safe way, depending on undefined behavior) - I prefer clear,
understandable code (ha!), rather than "dark corner" features.  Six months
later, when I look at the code, I won't be able to remember the clever
trick I was using :-)

Attached please find hash_test.bro & (patched) non-cluster.bro

Jim Mellander
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20180117/75a9fd40/attachment.html 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hash_test.bro
Type: application/octet-stream
Size: 182 bytes
Desc: not available
Url : http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20180117/75a9fd40/attachment.obj 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: non-cluster.bro
Type: application/octet-stream
Size: 2204 bytes
Desc: not available
Url : http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20180117/75a9fd40/attachment-0001.obj 

More information about the bro-dev mailing list