[Bro-Dev] design summary: porting Bro scripts to use Broker

Fri Oct 6 09:53:26 PDT 2017

I want to check whether there’s any feedback on the approach I’m planning to take when porting Bro’s scripts over to use Broker.  There are two major areas to consider: (1) how users specify network topology, e.g. for a traditional cluster configuration or for manually connecting Bro instances, and (2) replacing &synchronized with Broker’s distributed data storage features.

Broker-Based Topology
=====================

It’s again useful to decompose topology specification into two main use-cases:

Creating Clusters, e.g. w/ BroControl
-------------------------------------

This use-case should look familiar once ported to use Broker: the existing “cluster” framework will be used for specifying the topology of the cluster and for automatically setting up the connections between nodes.  The one thing that will differ is the event subscription mechanism, which needs to change since Broker itself handles that differently, but I think the general idea can remain similar.

The current mechanism for handling event subscription/publication:

	const Cluster::manager2worker_events = /Drop::.*/ &redef;
	# similar patterns follow for all node combinations...

And a script author that is making their script usable on clusters writes:

	redef Cluster::manager2worker_events += /^Intel::(cluster_new_item|purge_item)$/;

The new mechanism:

	# contains topic prefixes
	const Cluster::manager_subscriptions: set[string] &redef;

	# contains (topic string, event name) pairs
	const Cluster::manager_publications: set[string, string] &redef;

	# similar sets follow for all node types…
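
Because Broker subscriptions work on topic prefixes, a node subscribed to "bro/event/framework/intel" would receive an event published on "bro/event/framework/intel/cluster_new_item". The Python snippet below is only a model of that matching. It assumes a plain string-prefix test. `matches()` is a hypothetical helper that is not part of Bro or Broker, and "bro/event/framework/software/register" is a made-up topic:

```python
from __future__ import annotations


def matches(subscriptions: set[str], topic: str) -> bool:
    """Return True if any subscribed topic prefix is a prefix of `topic`.

    Assumption: matching is modeled as a plain string-prefix test.
    """
    return any(topic.startswith(prefix) for prefix in subscriptions)


# The two prefixes used in the script-author example in this section.
manager_subscriptions = {
    "bro/event/framework/control",
    "bro/event/framework/intel",
}

# Delivered: the topic starts with a subscribed prefix.
print(matches(manager_subscriptions,
              "bro/event/framework/intel/cluster_new_item"))   # True
# Not delivered: a hypothetical topic outside both prefixes.
print(matches(manager_subscriptions,
              "bro/event/framework/software/register"))        # False
```

Under this model, a prefix also matches topics that merely share its leading characters, such as a hypothetical "bro/event/framework/intelligence". That is one more reason to keep the naming convention's path components distinct.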

And a script author writes:

	# topic naming convention relates to file hierarchy/organization of scripts
	redef Cluster::manager_subscriptions += {
		"bro/event/framework/control",
		"bro/event/framework/intel",
	};

	# Not sure how to get around referencing events via strings: we can't use
	# 'any' to stick event values directly into the set.  Maybe that’s OK, since
	# we can at least detect lookup failures at bro_init time and emit errors/abort.
	redef Cluster::manager_publications += {
		["bro/event/framework/control/configuration_update_request",
		 "Control::configuration_update_request"],
		["bro/event/framework/intel/cluster_new_item",
		 "Intel::cluster_new_item"],
	};

Then subscriptions and auto-publications still get automatically set up by the cluster framework in bro_init().
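
Roughly, the setup step amounts to two things. First, subscribe to every prefix in the node's subscription set. Second, register each (topic, event name) pair for auto-publication, failing fast on event names that don't resolve. Here is a Python model of that step. It is neither Bro script nor the framework's actual code. `setup_node()`, `UnknownEventError` and the `known_events` set are all hypothetical; the set stands in for Bro's registry of defined events:

```python
from __future__ import annotations


class UnknownEventError(Exception):
    """Raised when a publication entry names an event that doesn't exist."""


def setup_node(subscriptions: set[str],
               publications: set[tuple[str, str]],
               known_events: set[str]) -> tuple[list[str], dict[str, list[str]]]:
    """Model of the bro_init() setup for one node type.

    Returns the topic prefixes to subscribe to and a map from event name to
    the topics that event should be auto-published on.  Any publication that
    names an unknown event aborts setup.
    """
    auto_publish: dict[str, list[str]] = {}
    for topic, event_name in sorted(publications):
        if event_name not in known_events:
            raise UnknownEventError(
                f"unknown event {event_name!r} for topic {topic!r}")
        auto_publish.setdefault(event_name, []).append(topic)
    return sorted(subscriptions), auto_publish


# Hypothetical stand-in for Bro's table of defined events.
known = {"Control::configuration_update_request", "Intel::cluster_new_item"}

subs, pubs = setup_node(
    {"bro/event/framework/control", "bro/event/framework/intel"},
    {("bro/event/framework/intel/cluster_new_item", "Intel::cluster_new_item")},
    known,
)
print(subs)  # ['bro/event/framework/control', 'bro/event/framework/intel']
print(pubs)  # {'Intel::cluster_new_item': ['bro/event/framework/intel/cluster_new_item']}

# A misspelled event name is caught at startup instead of being silently ignored.
try:
    setup_node(set(), {("bro/event/framework/intel/cluster_new_item",
                        "Intel::cluster_new_itme")}, known)
except UnknownEventError as err:
    print("abort:", err)
```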

Other Manual/Custom Topologies
------------------------------

I don’t see anything to do here, as the Broker API already provides enough to set up peerings and subscriptions in arbitrary ways.  The old “communication” framework scripts can just go away, as most of their functions have direct counterparts in the new “broker” framework.

The one thing that is missing is the “Communication::nodes” table, which acts as both a state-tracking structure and an API that users may use to have the comm. framework automatically set up connections between the nodes in the table.  I find this redundant: there are two APIs to accomplish the same thing, with the table being an additional layer of indirection to the actual connect/listen functions a user can just as easily use themselves.  I also don’t think it’s useful for state-tracking, as a user operating at the level of this use-case can easily track nodes themselves, or has some other notion of the state structures they need to track that is more intuitive for the particular problem they're solving.  Unless there are arguments for it or I find it’s actually needed, I don’t plan to port this to Broker.

Broker-Based Data Distribution
==============================

Replacing &synchronized requires completely new APIs that script authors can easily use and that work for both cluster and non-cluster use-cases, independently of a user’s choice of persistent storage backend.

Broker Framework API
--------------------

	const Broker::default_master_node = "manager" &redef;

	const Broker::default_backend = Broker::MEMORY &redef;

	# Setting a default dir will, for persistent backends that have not
	# been given an explicit file path, automatically create a path within this
	# dir that is based on the name of the data store.
	const Broker::default_store_dir = "" &redef;

	type Broker::StoreInfo: record {
	  name: string &optional;
	  store: opaque of Broker::Store &optional;
	  master_node: string &default=Broker::default_master_node;
	  master: bool &default=F;
	  backend: Broker::BackendType &default=Broker::default_backend;
	  options: Broker::BackendOptions &default=Broker::BackendOptions();
	};

	# Primarily used by users to set up master store location and backend
	# configuration, but also possible to look up an existing/open store by name.
	global Broker::stores: table[string] of Broker::StoreInfo &default=Broker::StoreInfo() &redef;

	# Set up data stores to function properly regardless of whether the user is
	# operating a cluster.  This also automatically sets up the store to
	# be a clone or a master as is appropriate for the local node type.
	# It does this by inspecting the state of the “Broker::stores” table,
	# which a user configures in advance via redef.
	# (I have pseudo-code written; let me know if you want to see it all.)
	global Broker::InitStore: function(name: string): opaque of Broker::Store;
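
To make the intended InitStore() decisions concrete, the Python sketch below models them under a few stated assumptions:

* Outside a cluster, the local process is always the master.
* In a cluster, only the node named by master_node is the master.
* Only the master needs a file path for a persistent backend.
* A derived file name is the store name with "/" replaced by "_", plus a backend-based extension.

`resolve_store()` and `ResolvedStore` are hypothetical names, and this is not the pseudo-code mentioned above:

```python
from __future__ import annotations

import posixpath
from dataclasses import dataclass


@dataclass
class ResolvedStore:
    """What a hypothetical Broker::InitStore() would decide for one store."""
    name: str
    role: str           # "master" or "clone"
    master_node: str
    backend: str        # e.g. "MEMORY" or "SQLITE"
    path: str | None    # database file for a persistent master, else None


def resolve_store(name: str,
                  stores: dict[str, dict[str, str]],
                  local_node: str,
                  cluster_enabled: bool,
                  default_master_node: str = "manager",
                  default_backend: str = "MEMORY",
                  default_store_dir: str = "") -> ResolvedStore:
    """Model of the master/clone and backend decisions InitStore() makes.

    `stores` holds explicit per-store overrides (standing in for the redef'd
    Broker::stores table); missing keys fall back to the defaults, like the
    &default attributes on Broker::StoreInfo.
    """
    cfg = stores.get(name, {})
    master_node = cfg.get("master_node", default_master_node)
    backend = cfg.get("backend", default_backend)

    # Standalone: this process owns every store.  Cluster: only the
    # configured master node does; every other node gets a clone.
    is_master = not cluster_enabled or local_node == master_node

    path = None
    # Assumption: only the master touches the persistent backend.
    if is_master and backend != "MEMORY":
        path = cfg.get("path")
        if path is None:
            if not default_store_dir:
                raise ValueError(f"{name}: persistent backend needs a path "
                                 "or a default store dir")
            # Hypothetical naming scheme: flatten the store name into a file name.
            path = posixpath.join(default_store_dir,
                                  name.replace("/", "_") + "." + backend.lower())

    return ResolvedStore(name, "master" if is_master else "clone",
                         master_node, backend, path)


name = "bro/framework/software/tracked"
stores = {name: {"master_node": "some_node",
                 "backend": "SQLITE",
                 "path": "/home/jon/tracked_software.sqlite"}}

print(resolve_store(name, stores, "some_node", True).role)   # master
print(resolve_store(name, stores, "worker-1", True).role)    # clone
print(resolve_store(name, {}, "manager", True,
                    default_backend="SQLITE",
                    default_store_dir="/home/jon/stores").path)
# /home/jon/stores/bro_framework_software_tracked.sqlite
```

"worker-1" is a made-up node name. The same call with `cluster_enabled=False` always yields a master, which is what lets script authors ignore the deployment mode.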

Script-Author Example Usage
---------------------------

	# A script author who wants to use data stores doesn't have to be aware of
	# whether the user is running a cluster or wants to use persistent storage
	# backends.

	const Software::tracked_store_name = "bro/framework/software/tracked" &redef;

	global Software::tracked_store: opaque of Broker::Store;

	event bro_init() &priority = +10
	  {
	  Software::tracked_store = Broker::InitStore(Software::tracked_store_name);
	  }

Bro-User Example Usage
----------------------

	# Users need to be able to choose data store backends and which cluster node
	# the master store lives on.  They can either do this manually, or BroControl
	# will autogenerate the following in cluster-layout.bro:

	# Explicitly configure an individual store.
	redef Broker::stores += {
	  ["bro/framework/software/tracked"] = [$master_node="some_node",
	                                        $backend=Broker::SQLITE,
	                                        $options=Broker::BackendOptions(
	                                          $sqlite=Broker::SQLiteOptions(
	                                            $path="/home/jon/tracked_software.sqlite"))];
	};

	# Or set new default configurations for stores.
	redef Broker::default_master_node = "manager";
	redef Broker::default_backend = Broker::MEMORY;
	redef Broker::default_store_dir = "/home/jon/stores";

	# Then Broker::InitStore() will end up creating the right type of store.

BroControl Example Usage
------------------------

BroControl users will have a new “datastore.cfg” file they may customize:

	# The default file will contain just a basic [default] section
	# and will set up all data stores on the manager node, using the default
	# backend (in-memory).  If a user wants to globally change to persistent
	# storage and also designate a canonical storage node, they can do that here.

	[default]
	master = manager
	backend = MEMORY
	# When using a persistent backend as the default, a directory to store
	# databases in must be specified.  Files will be auto-named based on the
	# store's name.
	dir = /home/jon/stores

	# If a user has special needs regarding persistence/residence, they can
	# further customize individual stores:
	[bro/framework/software/tracked]
	master = some_node
	backend = SQLITE
	path = /home/jon/tracked_software.sqlite
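
BroControl is written in Python, so one plausible way for it to turn datastore.cfg into the cluster-layout.bro redefs shown earlier is sketched below with the standard `configparser` module. Several parts are assumptions made for illustration and are not BroControl's actual code: the function name `cfg_to_redefs()`, the key-to-field mapping, and the restriction of `path` to the SQLite backend.

```python
from __future__ import annotations

import configparser


def cfg_to_redefs(text: str) -> list[str]:
    """Translate a datastore.cfg into Bro redef statements (hypothetical)."""
    cp = configparser.ConfigParser(interpolation=None)
    cp.read_string(text)
    redefs = []

    # [default] section -> the Broker::default_* constants.
    if cp.has_section("default"):
        d = cp["default"]
        if "master" in d:
            redefs.append(f'redef Broker::default_master_node = "{d["master"]}";')
        if "backend" in d:
            redefs.append(f"redef Broker::default_backend = Broker::{d['backend']};")
        if "dir" in d:
            redefs.append(f'redef Broker::default_store_dir = "{d["dir"]}";')

    # Every other section is a store name -> one entry in Broker::stores.
    for name in cp.sections():
        if name == "default":
            continue
        s = cp[name]
        fields = []
        if "master" in s:
            fields.append(f'$master_node="{s["master"]}"')
        if "backend" in s:
            fields.append(f"$backend=Broker::{s['backend']}")
        if "path" in s:
            # Only the SQLite options record is modeled here.
            if s.get("backend") != "SQLITE":
                raise ValueError(f"[{name}]: path is only modeled for SQLITE")
            fields.append("$options=Broker::BackendOptions("
                          f'$sqlite=Broker::SQLiteOptions($path="{s["path"]}"))')
        redefs.append(f'redef Broker::stores += {{ ["{name}"] = [{", ".join(fields)}] }};')
    return redefs


sample = """\
[default]
master = manager
backend = MEMORY
dir = /home/jon/stores

[bro/framework/software/tracked]
master = some_node
backend = SQLITE
path = /home/jon/tracked_software.sqlite
"""

for line in cfg_to_redefs(sample):
    print(line)
```

In this design, the `[default]` keys map one-to-one onto the `Broker::default_*` constants, and each remaining section becomes one `Broker::stores` entry.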