[Bro-Dev] 'for loop' variable modification

Mon Jan 8 22:27:29 PST 2018

I got the following idea while perusing SumStats::process_epoch_result in
non_cluster.bro:

local i = 1;
while ( i <= 1000 && |bar| > 0 )
    {
    # Grab one arbitrary key: start iterating, then break out immediately.
    for ( foo in bar )
        {
        break;
        }

    # ... process bar[foo] ...

    # Optional, if we need to preserve the original data:
    # baz[foo] = bar[foo];

    delete bar[foo];
    ++i;
    }

This allows iterating through the table as I originally wanted, at most
1000 entries per pass, although it destroys the original table.

SumStats::process_epoch_result deletes the current item inside the for
loop, so it relies on undefined behavior, per the documentation:
"Currently, modifying a container’s membership while iterating over it may
result in undefined behavior, so do not add or remove elements inside the
loop."  The above example avoids that, since the delete runs only after the
for loop has exited.  Does anyone use SumStats outside of a cluster context?
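
For comparison, here is a minimal sketch of the same drain-by-deletion
pattern in Python rather than Bro script. Python dicts carry a similar
restriction (changing a dict's size while iterating over it raises
RuntimeError), so the same trick applies. The names drain_batch, handle and
keep, and the default limit, are illustrative assumptions and not anything
taken from SumStats.

```python
def drain_batch(table, handle, limit=1000, keep=None):
    """Process and remove up to `limit` entries from `table`.

    `handle(key, value)` is a hypothetical per-entry callback.
    If `keep` is a dict, each processed entry is copied into it first,
    which preserves the data that the drain would otherwise destroy.
    Returns the number of entries processed in this pass.
    """
    processed = 0
    while processed < limit and table:
        # Take one arbitrary key. The iterator is thrown away before the
        # delete below, so the dict is never modified mid-iteration.
        key = next(iter(table))
        value = table[key]
        handle(key, value)
        if keep is not None:
            keep[key] = value
        del table[key]
        processed += 1
    return processed
```

Calling drain_batch repeatedly (for example once per scheduled event)
empties the table in bounded passes; a pass that returns fewer than `limit`
entries means the table is now empty.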

On Fri, Jan 5, 2018 at 6:04 PM, Jim Mellander <jmellander at lbl.gov> wrote:

> Thanks, Jon:
>
> I've decided to split the data (a table of IP addresses with statistics
> captured over a time period) based on a modulo calculation against the IP
> address (the important characteristic being that it can be done on the fly
> without an additional pass through the table), which with an average
> distribution of traffic gives roughly equal-sized buckets, each of which
> can be processed during a single event, as I described.
>
> I like the idea of co-routines - it would help to address issues like
> these in a more natural manner.
>
> Jim
>
> On Fri, Jan 5, 2018 at 5:28 PM, Jon Siwek <jsiwek at corelight.com> wrote:
>
>> On Fri, Jan 5, 2018 at 2:19 PM, Jim Mellander <jmellander at lbl.gov> wrote:
>>
>> > I haven't checked whether my desired behavior works, but since it's not
>> > documented, I wouldn't want to rely on it in any event.
>>
>> Yeah, I doubt the example you gave currently works -- it would just
>> change the local value in the frame without modifying the internal
>> iterator.
>>
>> > I would be interested in hearing comments or suggestions on this issue.
>>
>> What you want, the ability to split the processing of large data
>> tables/sets over time, makes sense.  I've probably also run into at
>> least a couple cases where I've been concerned about how long it would
>> take to iterate over a set/table and process all keys in one go.  The
>> approach that comes to mind for doing that would be adding coroutines.
>> Robin has some ongoing work with adding better support for async
>> function calls, and I wonder if the way that's done would make it
>> pretty simple to add general coroutine support as well.  E.g. stuff
>> could look like:
>>
>> event process_stuff()
>>     {
>>     local num_processed = 0;
>>
>>     for ( item in foo )
>>         {
>>         process_item(item);
>>
>>         # Resume next time events get drained (e.g. next packet).
>>         if ( ++num_processed % 1000 == 0 )
>>             yield;
>>         }
>>     }
>>
>> There could also be other types of yield instructions, like "yield 1
>> second" or "yield wait_for_my_signal()" which would, respectively,
>> resume after an arbitrary amount of time or when a custom function says
>> it should.
>>
>> - Jon
>>
>
>
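
The coroutine idea in the quoted message can be tried out in a language
that already has generators. Below is a rough Python sketch of the same
"process N items, then yield" shape. The names process_in_chunks and
run_to_completion, the chunk size, and the simple driver loop standing in
for "next time events get drained" are all assumptions made for
illustration; the sketch says nothing about how Bro itself would implement
yield.

```python
def process_in_chunks(items, process_item, chunk=1000):
    """Generator that handles every item, pausing after each `chunk` items.

    Each `yield` hands control back to the caller, which decides when to
    resume (the analogue of the next event-queue drain).
    """
    num_processed = 0
    # Iterate over a snapshot so the caller may modify the original
    # container between resumes without disturbing this loop.
    for item in list(items):
        process_item(item)
        num_processed += 1
        if num_processed % chunk == 0:
            yield num_processed  # pause here until resumed


def run_to_completion(task):
    """Stand-in for an event loop: resume `task` until it finishes.

    Returns how many times the task paused along the way.
    """
    pauses = 0
    for _ in task:
        pauses += 1
    return pauses
```

A real scheduler would interleave other work between resumes instead of
looping straight through as run_to_completion does.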