[Zeek-Dev] Sumstats undocumented feature - changing the epoch

Jim Mellander jmellander at lbl.gov
Mon Apr 29 11:43:53 PDT 2019

I've found it convenient to use an undocumented feature of SumStats:
changing the epoch.  This comes in particularly handy when creating
statistics for human consumption, since it is often useful to synchronize
to a logging interval.  For example, if hourly stats are desired, the first
sumstat epoch can be made shorter so that it ends on the hour, and
subsequent epochs then trigger on the hour.

Looking into this, I realized that the epoch can be changed if the
argument to *SumStats::create* is a variable, rather than the usual
anonymous record.  Then, in *epoch_result* or *epoch_finished*, the
timeout for the next epoch can be recomputed on the fly using
*calc_next_rotate()*.

However, this does not work as expected, because the next epoch is
scheduled before *epoch_result* and *epoch_finished* are executed.  What
does work is the following hack:

   1. Create the initial sumstat with an epoch that will synchronize to the
   logging interval
   2. Immediately change the epoch to the desired interval


# Declare the event so it can be scheduled before its handler is defined.
global setup_sumstat: event();

event bro_init()
    {
    # So network_time() will be initialized...
    schedule 0 usec { setup_sumstat() };
    }

event setup_sumstat()
    {
    ... blah ...
    local mysumstat: SumStats::SumStat;
    mysumstat = [
        $name="mysumstat",
        # calc_next_rotate() returns the interval remaining until the next
        # 10 min boundary.
        $epoch=calc_next_rotate(10 min),
        etc...
    ];
    SumStats::create(mysumstat);

    # Now that the SumStat has been created and the initial epoch
    # scheduled, change the epoch to the regular interval for the future.
    mysumstat$epoch = 10 min;
    }
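
To make the timing concrete, here is a small sketch of the arithmetic behind the hack. It is written in Python only so it can be checked without a Zeek install. It assumes epochs should end on multiples of the interval counted from the Unix epoch (no rotation base time), the start time is just an example, and the helper names are made up for illustration. It is not how SumStats or calc_next_rotate() is implemented.

```python
from datetime import datetime, timezone

INTERVAL = 600  # 10 min, in seconds


def first_epoch_length(now: float, interval: int = INTERVAL) -> float:
    """Seconds from `now` to the next multiple of `interval` (hypothetical helper)."""
    # Exactly on a boundary this yields one full interval, not a zero-length epoch.
    return interval - (now % interval)


def epoch_ends(now: float, count: int, interval: int = INTERVAL) -> list:
    """End times of the first `count` epochs: one short epoch, then regular ones."""
    first_end = now + first_epoch_length(now, interval)
    return [first_end + i * interval for i in range(count)]


# Example start time: 2019-04-29 18:43:53 UTC.
start = datetime(2019, 4, 29, 18, 43, 53, tzinfo=timezone.utc).timestamp()
for end in epoch_ends(start, 3):
    print(datetime.fromtimestamp(end, tz=timezone.utc).strftime("%H:%M:%S"))
# prints 18:50:00, 19:00:00, 19:10:00 (one per line)
```

With this start time the first epoch lasts 6 min 7 s, and every later epoch lasts the full 10 min and ends on a 10 min boundary.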

It would be convenient if the epoch could be changed in *epoch_result* or
*epoch_finished*, but some internals would need to change: the reschedule
would have to take place after the results are processed, which could throw
the timing off a bit.  On the other hand, unless one is interested in exact
statistics over a known time period (as I am), the small amount of jitter
probably wouldn't be noticeable or significant.

The above is horribly hackish.  A different approach for accomplishing the
goal would be to allow user scripts to schedule the end of the epoch:

   1. Mark *epoch* as *&optional*.
   2. Expose and document *SumStats::finish_epoch* as part of the public API.
   3. Make the minor changes to not schedule *SumStats::finish_epoch* if
   *epoch* is undefined.

By not defining *epoch*, a script would indicate that it will manage epoch
timing itself.  The script would schedule the first epoch based on the
logging interval, and in the *epoch_finished* function schedule each
successive epoch to stay in sync with the logging interval.
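
As a rough illustration of why a self-managed script would recompute the delay to the next boundary each time instead of reusing a fixed interval, the following Python simulation compares the two. It is only a model of the scheduling arithmetic, not Zeek code: `processing_delay`, `simulate`, and `next_boundary_delay` are hypothetical names, and the 2 second delay is an arbitrary stand-in for the time spent handling results before the next epoch is scheduled.

```python
INTERVAL = 600  # 10 min, in seconds


def next_boundary_delay(now: float, interval: int = INTERVAL) -> float:
    """Seconds from `now` until the next multiple of `interval`."""
    return interval - (now % interval)


def simulate(start: float, epochs: int, processing_delay: float, resync: bool) -> list:
    """Return the times at which each epoch ends.

    The first epoch is always aligned to a boundary. After each epoch ends,
    results take `processing_delay` seconds to handle before the next epoch
    is scheduled, either a fixed INTERVAL later or resynced to the boundary.
    """
    ends = []
    now = start
    for i in range(epochs):
        if i == 0 or resync:
            delay = next_boundary_delay(now)
        else:
            delay = INTERVAL
        now += delay              # the epoch ends here
        ends.append(now)
        now += processing_delay   # results are handled, then we reschedule
    return ends


drifting = simulate(start=233.0, epochs=4, processing_delay=2.0, resync=False)
aligned = simulate(start=233.0, epochs=4, processing_delay=2.0, resync=True)
print([t % INTERVAL for t in drifting])  # [0.0, 2.0, 4.0, 6.0]
print([t % INTERVAL for t in aligned])   # [0.0, 0.0, 0.0, 0.0]
```

In this model the fixed-interval reschedule slips by the processing delay on every epoch, while recomputing the delay keeps each epoch end on the boundary.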

Any comments, suggestions, etc. ????
