[Bro-Dev] Writing SumStats plugin

Jim Mellander jmellander at lbl.gov
Tue Aug 7 15:15:27 PDT 2018


It seems that there's some inconsistency in SumStats plugin usage and
implementation.  There appear to be 2 classes of plugins with differing
calling mechanisms and action:

   1. Item to be measured is in the Key, and the measurement is in the
      Observation
      1. These include Average, Last X Observations, Max, Min, Sample,
         Standard Deviation, Sum, Unique, Variance
         1. These are exact measurements.
         2. Some of these have dependencies: StdDev depends on Variance,
            which depends on Average
   2. Item to be measured is in the Observation, the measurement is
      implicitly 1, and the Key is generally null
      1. These include HyperLogLog (number of Unique), TopK (top count)
         1. These are probabilistic data structures.

The Key is not passed to the plugin, but is used to allocate a table that
includes, among other things, the processed observations.  Both classes
call the epoch_result function once per key at the end of the epoch.  Since
class 2 plugins often use a null key, there is only one call to
epoch_result, and a special function is used to extract the results from
the probabilistic data structure (
https://www.bro.org/current/exercises/sumstats/sumstats-5.bro).  Once all
keys have been returned, the epoch_finished function is called to finish
up.  This is unneeded with this sort of class 2 plugin, since all
the work can be done in the sole call to epoch_result.  Multiple keys could
be used with class 2 plugins, which allows for groupings (
https://www.bro.org/current/exercises/sumstats/sumstats-4.bro).
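
As a rough illustration of the two classes, here is a minimal sketch that
feeds the same connections to a class 1 reducer (SUM, keyed by originator)
and a class 2 reducer (TOPK, empty key).  The stream names, summary names,
epoch and sizes are made up for the example; topk_get_top and topk_count
stand in for the kind of special extraction function mentioned above.

```bro
@load base/frameworks/sumstats

event connection_established(c: connection)
    {
    # Class 1: the originator is the Key, the measurement is the Observation.
    SumStats::observe("example.conns", SumStats::Key($host=c$id$orig_h),
                      SumStats::Observation($num=1));

    # Class 2: empty Key, the item being counted is in the Observation.
    SumStats::observe("example.talkers", SumStats::Key(),
                      SumStats::Observation($str=cat(c$id$orig_h)));
    }

event bro_init()
    {
    local r1 = SumStats::Reducer($stream="example.conns",
                                 $apply=set(SumStats::SUM));
    SumStats::create([$name="example-class1",
                      $epoch=5min,
                      $reducers=set(r1),
                      # Called once per key (here: once per originator).
                      $epoch_result(ts: time, key: SumStats::Key, result: SumStats::Result) =
                          {
                          print fmt("%s: %.0f connections", key$host,
                                    result["example.conns"]$sum);
                          }]);

    local r2 = SumStats::Reducer($stream="example.talkers",
                                 $apply=set(SumStats::TOPK),
                                 $topk_size=20);
    SumStats::create([$name="example-class2",
                      $epoch=5min,
                      $reducers=set(r2),
                      # Called once, for the single empty key; the per-item
                      # results are pulled out of the TopK structure here.
                      $epoch_result(ts: time, key: SumStats::Key, result: SumStats::Result) =
                          {
                          local r = result["example.talkers"];
                          local top: vector of SumStats::Observation;
                          top = topk_get_top(r$topk, 10);
                          for ( i in top )
                              print fmt("%s: about %d connections", top[i]$str,
                                        topk_count(r$topk, top[i]));
                          }]);
    }
```

In the class 2 summary no epoch_finished is supplied, since the single
epoch_result call already sees everything.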

I have a use case where I want to pass both a key and measurement to a
plugin maintaining a probabilistic data store [1].  I don't want to
allocate a table for each key, since many/most will not be reflected in the
final results.  Since the Observation is a record containing both a string
& a number, a hack would be to coerce the key to a string, and pass both in
the Observation to a class 2 plugin, with a null key - which is what I am
doing currently.
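
A minimal sketch of that workaround, assuming a hypothetical stream name
("example.keyed") and assuming a reducer applying the custom class 2 plugin
has been created for that stream elsewhere; the choice of originator address
and byte count is only for illustration:

```bro
@load base/frameworks/sumstats

event connection_state_remove(c: connection)
    {
    # The would-be key (here the originator address) is coerced to a
    # string and travels in $str; the measurement travels in $num.
    # The Key itself is left empty.
    SumStats::observe("example.keyed", SumStats::Key(),
                      SumStats::Observation($str=cat(c$id$orig_h),
                                            $num=c$orig$size));
    }
```

The plugin's observe callback is handed the Observation record, so it can
read both $str and $num from it.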

It would be nice to have a conversation on how to unify these two classes
of plugins.  A few thoughts on this:

   - Pass Key to the plugins - maybe Key could be added to the Observation
   structure.
   - Provide a mechanism to *not* allocate the table structure with every
   new Key (this and the previous can possibly be done with some hackiness
   with the normalize_key function in the reducer record)
   - Some sort of epoch_result factory function that by default just
   performs the class 1 plugin behavior.  For class 2 plugins, the function
   would feed the results one by one into epoch_result.

Incidentally, I think there's a bug in the observe() function:

These two lines are run in the loop through the reducers:
               if ( r?$normalize_key )
                        key = r$normalize_key(copy(key));
which has the effect of modifying the key for subsequent loops, rather than
just for the one reducer it applies to.  The fix is easy and obvious....
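
One way such a fix could look is sketched below as a standalone script rather
than a patch to the framework: the Key and Reducer records, lower_key and the
reducer names are simplified stand-ins invented for the example, not the
SumStats types.  The point is only that the normalized value goes into a
per-iteration variable (r_key) and the original key is never overwritten.

```bro
# Simplified stand-ins for the SumStats types (assumptions for this sketch).
type Key: record {
    str: string &optional;
};

type Reducer: record {
    name: string;
    normalize_key: function(key: Key): Key &optional;
};

# Example normalizer: lower-cases the key string.
function lower_key(key: Key): Key
    {
    key$str = to_lower(key$str);
    return key;
    }

event bro_init()
    {
    local reducers = vector(Reducer($name="a", $normalize_key=lower_key),
                            Reducer($name="b"));
    local key = Key($str="ABC");

    for ( i in reducers )
        {
        local r = reducers[i];

        # Start each iteration from the caller's key ...
        local r_key = key;

        # ... and normalize only a copy, for this reducer alone.
        if ( r?$normalize_key )
            r_key = r$normalize_key(copy(key));

        # Reducer "b" has no normalizer and still sees the original key.
        print r$name, r_key$str;
        }
    }
```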

Jim


[1] Implementation of algorithms 4&5 (with enhancements) of
https://arxiv.org/pdf/1705.07001.pdf



On Thu, Aug 2, 2018 at 4:44 PM, Jim Mellander <jmellander at lbl.gov> wrote:

> Hi all:
>
> I'm thinking of writing a SumStats plugin, probably with the initial
> implementation in bro scriptland, with a re-implementation as BIFs if
> initial tests are successful.
>
> From examining several plugins, it appears that I need to:
>
>    - Add NAME of my plugin as an enum to Calculation
>    - Add optional tunables to Reducer
>    - Add my data structure to ResultVal
>    - In register_observe_plugins, register the function to take an
>    observation.
>    - In init_result_val_hook, add code to initialize data structure.
>    - In compose_resultvals_hook, add code to merge multiple data
>    structures
>    - Create function to extract from data structure either at
>    epoch_result, or epoch_finished
>
> Anything else I should be aware of?
>
> Thanks in advance,
>
> Jim