[Bro-Dev] [Bro] effects of &synchronized and &mergeable

Robin Sommer robin at icir.org
Thu Jan 17 08:54:04 PST 2013



On Wed, Jan 16, 2013 at 17:09 -0800, you wrote:

> I take it your view is that we now have enough experiences with clusters
> to conclude that we aren't making full use of the generality, so we should
> consider the maintenance/complexity gains we could achieve by removing it.

While it's a general mechanism, it comes with its own limitations; in
particular, there's no control over whom to synchronize with: it's
everybody or nobody. That could be solved in principle, but only at the
expense of further complexity.

But the real answer is: we already aren't making much use of
&synchronized:

    > grep -R '\&synchronized' scripts/
    scripts/policy/protocols/conn/known-hosts.bro: global known_hosts: set[addr] &create_expire=1day &synchronized &redef;
    scripts/policy/protocols/conn/known-services.bro: global known_services: set[addr, port] &create_expire=1day &synchronized;
    scripts/policy/protocols/ssl/known-certs.bro: global certs: set[addr, string] &create_expire=1day &synchronized &redef;
    scripts/policy/protocols/ssl/validate-certs.bro: &read_expire=5mins &synchronized &redef;
    scripts/policy/protocols/ssh/detect-bruteforcing.bro: &read_expire=guessing_timeout+1hr &synchronized &redef;
    scripts/base/frameworks/software/main.bro: &synchronized

(Note that all but one are in the optional "policy" set.)

In other words, we are already implementing cluster synchronization
with events, not &synchronized.

There's a conceptual change with 2.0 that makes &synchronized less
useful. Originally the attribute was meant for the user: by simply
attaching &synchronized to a table, things get taken care of. The new
2.0 frameworks, however, work at a higher level, with their own APIs
that already hide clusterization internally. With that, the focus is
shifting from what helps the user to what helps the frameworks.

That, along with the merely "best effort" semantics of &synchronized
and its internal complexity, leaves me wondering whether the better
long-term strategy is something else.

> What about for non-cluster distributed deployments?  As I understand it,
> LBL's "Deep Bro" vision is to coordinate Bros that are analyzing different
> traffic streams

That's exactly where the current &synchronized becomes hard to use,
because you can't select what state to exchange between which parts of
the deep-bro setup; the one-set-of-state-for-all model doesn't really
apply there anymore.

> One thing I'm wondering is whether that use-case might still benefit
> from more general semantics.

I'm thinking of taking out some of the generality that &synchronized
provides, but in return adding some new flexibility/capabilities that
we currently don't have (better semantics, sharing of subsets of state,
persistence that's closely tied in).

Here are some further thoughts (mine; don't know if this aligns with
what Seth wants ...)

I like the idea of having a transparent key-value store that's both
distributed and persistent. Scripts get an API to insert/delete values
indexed by strings, and Bro guarantees that they will show up
everywhere (we might even be able to do some strict form of global
consistency here; not sure). The master node keeps a persistent copy on
disk that survives restarts. Other frameworks can then use this new API
to distribute/store state.
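
To make that a bit more concrete, here is a minimal sketch of the idea.
It is written in Python rather than Bro script, and every name in it
(Store, Node, insert, delete, the JSON file on disk) is made up for
illustration; nothing like this exists in Bro today. It only models the
intended behavior: values indexed by strings, changes showing up on
every node, and a master copy on disk that survives a restart.

```python
import json
import os
import tempfile


class Node:
    """A hypothetical Bro node holding a local replica of the store."""

    def __init__(self, name):
        self.name = name
        self.data = {}


class Store:
    """Hypothetical string-keyed store replicated to every node.

    The master's copy is written to disk on each change so that it
    survives a restart.
    """

    def __init__(self, path, nodes):
        self.path = path
        self.nodes = nodes
        self.master = {}
        if os.path.exists(path):
            with open(path) as f:
                self.master = json.load(f)
        # Bring every replica up to date with the persisted state.
        for node in self.nodes:
            node.data = dict(self.master)

    def _persist(self):
        with open(self.path, "w") as f:
            json.dump(self.master, f)

    def insert(self, key, value):
        self.master[key] = value
        self._persist()
        for node in self.nodes:
            node.data[key] = value

    def delete(self, key):
        self.master.pop(key, None)
        self._persist()
        for node in self.nodes:
            node.data.pop(key, None)


def demo():
    path = os.path.join(tempfile.mkdtemp(), "store.json")
    workers = [Node("worker-1"), Node("worker-2")]
    store = Store(path, workers)
    store.insert("known_hosts/10.0.0.1", "seen")
    store.insert("known_hosts/10.0.0.2", "seen")
    store.delete("known_hosts/10.0.0.2")

    # Simulate a restart: a fresh Store reloads the master's copy from disk.
    restarted = [Node("worker-1"), Node("worker-2")]
    Store(path, restarted)
    return [dict(n.data) for n in workers], [dict(n.data) for n in restarted]
```

The sketch pushes updates synchronously and ignores failures; a real
implementation would have to decide what consistency it can actually
promise.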

Actually, it wouldn't be a single key-value store; scripts should be
able to create new, separate ones on demand. And they can specify which
nodes to sync each one with; or maybe other nodes could subscribe to
individual stores by their name. Maybe let's call the stores "views".
For example, in a tiered deep-cluster, a set of nodes monitoring a
subnet could use their own view that's not propagated to those for
other subnets (and we could extend that mechanism to events to share
them more selectively as well).
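
As a rough illustration of the subscription variant, here is a small
sketch, again in Python rather than Bro script and with purely
hypothetical names (View, Node, subscribe). It assumes nodes subscribe
to a named store and only subscribers receive its updates, so state for
one subnet never reaches the nodes watching another.

```python
class View:
    """Hypothetical named key-value store synced only to its subscribers."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.subscribers = []

    def insert(self, key, value):
        self.data[key] = value
        for node in self.subscribers:
            node.replicas[self.name][key] = value

    def delete(self, key):
        self.data.pop(key, None)
        for node in self.subscribers:
            node.replicas[self.name].pop(key, None)


class Node:
    """Hypothetical node keeping one local copy per subscribed view."""

    def __init__(self, name):
        self.name = name
        self.replicas = {}  # view name -> local copy

    def subscribe(self, view):
        # A new subscriber starts from the view's current contents.
        self.replicas[view.name] = dict(view.data)
        view.subscribers.append(self)


def demo():
    subnet_a = View("subnet-a")
    subnet_b = View("subnet-b")
    a1, a2, b1 = Node("a1"), Node("a2"), Node("b1")
    a1.subscribe(subnet_a)
    a2.subscribe(subnet_a)
    b1.subscribe(subnet_b)
    # Only the nodes subscribed to "subnet-a" see this entry.
    subnet_a.insert("scanner/192.0.2.7", "flagged")
    return a1.replicas, a2.replicas, b1.replicas
```

Here a1 and a2 both end up with the "subnet-a" entry, while b1 only
holds its (empty) copy of "subnet-b".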

> Here do you mean essentially do explicit synchronization rather than
> implicit?

Yes, in terms of mechanism. However, for most users it would still be
transparent as long as they use the standard frameworks. And if they
don't, they'd at least get a very intuitive/familiar key-value data
model.

Just brainstorming,

Robin

-- 
Robin Sommer * Phone +1 (510) 722-6541 * robin at icir.org
ICSI/LBNL    * Fax   +1 (510) 666-2956 *   www.icir.org

