[Bro-Dev] scheduling events vs using &expire_func ?

Mon Apr 16 08:32:45 PDT 2018

On 4/13/18 6:14 PM, Aashish Sharma wrote:
> I have an aggregation policy where I am trying to keep counts of the number of
> connections an IP made in a cluster setup.
> 
> For now, I am using table on workers and manager and using expire_func to
> trigger worker2manager and manager2worker events.
> 
> All works great until tables grow to > 1 million after which expire_functions
> start clogging on manager and slowing down.
> 
> Example of Timer from prof.log on manager:
> 
> 1523636760.591416 Timers: current=57509 max=68053 mem=4942K lag=0.44s
> 1523636943.983521 Timers: current=54653 max=68053 mem=4696K lag=168.39s
> 1523638289.808519 Timers: current=49623 max=68053 mem=4264K lag=1330.82s
> 1523638364.873338 Timers: current=48441 max=68053 mem=4162K lag=60.06s
> 1523638380.344700 Timers: current=50841 max=68053 mem=4369K lag=0.47s
> 
> So instead of using &expire_func, I can probably try schedule {}; but I am not
> sure how scheduling events is any different internally than scheduling
> expire_funcs?

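There's a single timer per table that continuously triggers incremental
iteration over fixed-size chunks of the table, looking for entries to
expire.  The relevant options that you can tune here are:

* `table_expire_interval` (how often a table is checked for expired entries)
* `table_incremental_step` (maximum number of entries examined per chunk)
* `table_expire_delay` (how long to wait before examining the next chunk)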
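
To get a feel for how the step and delay options trade off, here is a
rough back-of-the-envelope sketch in Python.  It assumes a simplified
model (one chunk per timer firing, a fixed delay between chunks, and no
time spent in &expire_func), and the values passed in are placeholders
rather than the shipped defaults, so treat the result as a lower bound
only.  `sweep_estimate` is a made-up helper, not part of Bro.

```python
import math

def sweep_estimate(entries, step, delay_s):
    """Return (chunks, seconds) for one full expiration pass over a table.

    Assumed model: each timer firing examines up to `step` entries and
    waits `delay_s` before the next chunk.  Time spent evaluating
    &expire_func is ignored, so the seconds value is a lower bound.
    """
    chunks = math.ceil(entries / step)
    return chunks, max(chunks - 1, 0) * delay_s

# Placeholder values: a 1M-entry table with two different step sizes.
for step in (5000, 1000):
    chunks, secs = sweep_estimate(1_000_000, step, 0.01)
    print(f"step={step}: {chunks} chunks, at least {secs:.2f}s per pass")
```

A smaller step means less work per timer firing (so other timers are
blocked for less time), at the cost of a longer full pass over the
table.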

> I'd like to think/guess that scheduling events is probably less taxing. but
> wanted to check with the greater group on thoughts - esp insights into their
> internal processing queues.

I'm not clear on exactly how your code would be restructured around
scheduled events, but if you simply scheduled one event per entry that
needs to be expired, I'd guess it's not going to be better.  You would
then have one timer per table entry (up from a single timer per table),
or possibly more depending on the expiration scheme (e.g. if entries
expire on something other than create times, you're going to need a way
to invalidate previously scheduled events).
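
As an illustration of that invalidation problem, here is a minimal
Python sketch of per-entry timers with lazy invalidation.  It is not how
Bro schedules events internally; `PerEntryTimers` and its methods are
hypothetical names, and the heap just stands in for a timer queue.

```python
import heapq
import itertools

class PerEntryTimers:
    """Hypothetical per-entry expiration with lazy invalidation."""

    def __init__(self):
        self.heap = []                  # (expire_at, key, generation)
        self.current = {}               # key -> generation of its live timer
        self._gen = itertools.count(1)  # globally unique generations

    def touch(self, key, now, ttl):
        # Every refresh schedules a new timer; older ones become stale
        # but stay queued until they fire.
        gen = next(self._gen)
        self.current[key] = gen
        heapq.heappush(self.heap, (now + ttl, key, gen))

    def expire(self, now):
        expired = []
        while self.heap and self.heap[0][0] <= now:
            _, key, gen = heapq.heappop(self.heap)
            if self.current.get(key) == gen:  # skip stale timers
                del self.current[key]
                expired.append(key)
        return expired

timers = PerEntryTimers()
timers.touch("10.0.0.1", now=0, ttl=60)
timers.touch("10.0.0.1", now=30, ttl=60)  # refresh: first timer is now stale
timers.touch("10.0.0.2", now=0, ttl=60)
print(len(timers.heap))       # pending timers for two entries
print(timers.expire(now=60))  # entries whose live timer has fired
print(timers.expire(now=90))
```

Note that the queue holds more timers than there are entries as soon as
anything gets refreshed, which is the "or possibly more" case above.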

Ultimately, you'd likely still make the same number of function calls
(whatever work you're doing in &expire_func would still need to
happen).  With the way table expiration is implemented, my guess is that
the actual work required to call and evaluate the &expire_func code
becomes too great at some point, so maybe first try decreasing
`table_incremental_step` or reducing the work that you need to do in the
&expire_func.

With new features in the upcoming Broker-enabled cluster framework (soon
to be merged into git/master), I'd suggest a different way to think
about structuring the problem: you could use Rendezvous Hashing to
spread the IP addresses across proxies, with each proxy managing
expiration in just its own table.  That way, the storage and computation
can be distributed roughly uniformly, and you should be able to simply
adjust the number of proxies to fit the required scale.
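
For anyone unfamiliar with Rendezvous (highest-random-weight) hashing,
the idea can be sketched in a few lines of Python.  This is only an
illustration of the technique, not the cluster framework's code: the
`hrw_node` helper, the proxy names, and the choice of SHA-256 are all
assumptions made for the example.

```python
import hashlib

def hrw_node(key, nodes):
    """Pick the node with the highest hash score for this key."""
    def score(node):
        digest = hashlib.sha256(f"{node}|{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=score)

# Hypothetical proxy names and a batch of sample IPs.
proxies = ["proxy-1", "proxy-2", "proxy-3"]
ips = [f"10.0.{i // 256}.{i % 256}" for i in range(1000)]
before = {ip: hrw_node(ip, proxies) for ip in ips}

# Removing a proxy should only remap the IPs that proxy owned;
# every other IP keeps its assignment.
after = {ip: hrw_node(ip, proxies[:2]) for ip in ips}
moved = [ip for ip in ips if before[ip] != after[ip]]
print(all(before[ip] == "proxy-3" for ip in moved))
```

Every node that hashes the same IP against the same proxy list arrives
at the same owner without any coordination, which is what lets each
proxy keep and expire only its own share of the table.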

- Jon