[Bro-Dev] Broker data store use case and questions

Fri May 11 08:15:53 PDT 2018

Let me clarify point 4: my goal is just to keep the known-hosts data
persistent across restarts (or any data set, in the general case).  So if
HRW is the best way to keep data in memory, I need a way to write it out to
disk on Bro exit so I can read it back in later.

On Fri, May 11, 2018 at 9:13 AM Jon Siwek <jsiwek at corelight.com> wrote:

>
>
> On 5/10/18 3:53 PM, Michael Dopheide wrote:
>
> > 1) My initial gut feeling was that all of the when() calls for insertion
> > could get really expensive on a brand new cluster before the store is
> > populated.
>
> I've not tried to explicitly measure differences yet, though my hunch is
> that the overhead of needing to use when() to drive data store
> communication patterns could be slightly more expensive than just using
> remote events (or &synchronized as the previous implementation used).
> I'm thinking of overhead more in terms of memory here, as it needs to
> save the state of the current frame making the call so it can resume
> later.
>
> Another difference is that the data store implementation of known-hosts
> always requires remote communication just to check whether a given key
> is in the data store yet, which may be a bottleneck for some use-cases.
>
> You can also compare/contrast with another implementation of
> known-hosts.bro by setting `Known::use_host_store = F`.  There, instead
> of using a data store, it sends remote events via
> Cluster::publish_hrw to uniformly partition the data set across proxies
> in a scalable manner but without persistence.
>
> Yet another idea for an implementation, if you need persistence +
> scalability, would be combining the HRW stuff with data stores.  e.g.
> partitioning the total data set across proxies while using a data store
> on each one for local storage instead of a table/set.
>
> I don't know if there's a general answer to which way is best.  Likely
> varies per use-case / network.
>
> > 2) Correct me if I'm wrong, but it seems like the check for a host
> > already being in known_hosts (now host_store) no longer exists.  As a
> > result, we try to re-insert the host, calling when(), every time we see
> > an established connection with a local host.
>
> Sounds right.
>
> Specifically, it's Broker::put_unique() that hides the following:
>
> (1) tell master data store "insert this key if it does not exist"
> (2) wait for master data store to tell us if the key was inserted, and
> thus did not exist before
>
> There's no check against the local cache to first see if the key
> exists, as going down that path leads to race conditions.
>
> > 3) How do I retrieve values from the store to test for existence?
>
> Broker::exists() to just check existence of a key or Broker::get() to
> retrieve value at a key.  You can also infer existence from the result
> of Broker::get().
>
> Either requires calling inside 'when()'.  Generally, any function in the
> API that returns a Broker::QueryResult needs to be called inside 'when()'.
>
> > 4) Assuming that requires another Broker call inside a when(), does it
> > make sense to pull the data store into memory at bro_init() and do
> > a Cluster::publish_hrw?
>
> Not sure I follow since, in the current implementation of known-hosts,
> the data store and Cluster::publish_hrw code paths don't interact
> (they're alternate implementations of the same thing as mentioned
> before).  If the question is just whether it makes sense to go the
> Cluster::publish_hrw route instead of using a data store: yes, just
> depends on what you prefer.  IMO, the data store approach has downsides
> that make it less preferable to me.
>
> - Jon
>
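
For reference, the "HRW plus a per-proxy store" idea quoted above can be
sketched outside of Bro.  This is only a rough Python illustration, not
how Cluster::publish_hrw or Broker are actually implemented: the proxy
names, the SHA-256 weighting, and the ProxyStore class are all assumptions
made up for the example, and the in-memory dict stands in for whatever
disk-backed store each proxy would really use.

```python
import hashlib


def hrw_owner(key, nodes):
    """Pick the node with the highest hash weight for this key (rendezvous hashing)."""
    def weight(node):
        digest = hashlib.sha256(f"{node}|{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=weight)


class ProxyStore:
    """Hypothetical per-proxy store; a dict here, a persistent backend in practice."""

    def __init__(self):
        self._data = {}

    def put_unique(self, key, value):
        """Insert only if the key is absent; return True when the insert happened."""
        if key in self._data:
            return False
        self._data[key] = value
        return True


# Hypothetical proxy names, purely for illustration.
proxies = {name: ProxyStore() for name in ("proxy-1", "proxy-2", "proxy-3")}


def host_found(host):
    """Route the host to its owning proxy and report whether it was new there."""
    owner = hrw_owner(host, sorted(proxies))
    return owner, proxies[owner].put_unique(host, True)


first = host_found("10.0.0.1")   # first sighting: inserted on the owning proxy
second = host_found("10.0.0.1")  # repeat sighting: same proxy, not inserted again
```

Because each key always maps to the same proxy, the "insert if absent"
check only has to be answered by that one proxy's store, and dropping a
proxy that does not own a key leaves that key's owner unchanged.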