<div dir="ltr"><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">In a previous email, I asked the question: Does anyone use sumstats outside of a cluster context?</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">In my case, the answer is yes, as I perform development on a laptop, and run bro standalone to test new bro policies.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">I found several different bugs in share/bro/base/frameworks/sums<wbr>tats/non-cluster.bro, specifically SumStats::process_epoch_result<wbr>s<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">1. The policy returns sumstats results 50 at a time, and then reschedules itself for the next 50 after 0.01 seconds.. Unfortunately, the reschedule is:<br><span style="font-family:monospace,monospace">schedule 0.01 secs { process_epoch_result(ss, now, data1) };<br></span>instead of:</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><span style="font-family:monospace,monospace">schedule 0.01 secs { <b>SumStats::</b>process_epoch_</span><span style="font-family:monospace,monospace">result<wbr>(ss, now, data1) };<br></span>so it silently fails after the first 50 results.  Would be nice to have a warning if a script schedules an event that doesn&#39;t exist.<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">2.The serious issue with the policy, though, is that the &#39;for&#39; loop over the result table is the main loop, with up to 50 items processed and deleted within the loop, the expectation being that the iteration will not thus be disturbed.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">The attached program (hash_test.bro) demonstrates that this is not the case (should output 1000, 0, but the 2nd value comes back non-zero), in line with the documented caveat: &quot;Currently, modifying a container’s membership while iterating over it may
result in undefined behavior, so do not add or remove elements
inside the loop.&quot;  I didn&#39;t examine bro source code to appreciate the reason, but surmise that table resizing and rehashing would account for the issue.<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">The consequences of this issue are that, under certain circumstances:</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">* Not all data will be returned by SumStats at the epoch</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">* SumStats::finish_epoch may not be run.</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">To address the issue can be done via a rearrangement of the code, along the lines of the following pseudocode (boundary conditions, etc. ignored)<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">original (modifies table inside &#39;for&#39; loop):</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">i=50;</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">for (foo in bar)</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">    {</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">    process bar[foo];</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">    delete bar[foo];</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">    --i;</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">    if (i == 0) break;</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">    }</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">to (modifies table outside &#39;for&#39; loop):<br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">i=50;</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">while (i &gt;0)</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">    {</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">    for (foo in bar)</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">        {</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">        break;</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">        }</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">    process bar[foo];</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">    delete bar[foo]</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">    --i;</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">    }</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">...  there are a few other subtleties in the code (keeping a closure on the result table so that sumstats can clear the original table &amp; proceed to the next epoch, and not running SumStats::finish_epoch if the result table was empty to begin with).</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">A bit of rearrangement fixes the bugs while preserving the original behavior, with the help of a wrapper event that checks for an empty result table, and if not makes an explicit copy for further processing by the actual event doing the work.  <br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">An additional &#39;for&#39; loop around the result table could be used to keep it all in one event, but looks too much like black magic (and still, albeit probably in a safe way, depending on undefined behavior) - I prefer clear, understandable code (ha!), rather than &quot;dark corner&quot; features.  Six months later, when I look at the code, I won&#39;t be able to remember the clever trick I was using :-)</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Attached please find hash_test.bro &amp; (patched) non-cluster.bro</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">Jim Mellander</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif">ESNet</div><div class="gmail_default" style="font-family:arial,helvetica,sans-serif"><br></div></div>