[Bro-Dev] [Bro-Commits] [git/bro] topic/actor-system: First-pass broker-enabled Cluster scripting API + misc. (07ad06b)

Wed Nov 1 16:11:11 PDT 2017

> On Nov 1, 2017, at 5:23 PM, Robin Sommer <robin at icir.org> wrote:
> 
> Justin, correct me if I'm wrong, but I don't think this has ever been
> fully fleshed out. If anybody wants to propose something specific, we
> can discuss, otherwise I would suggest we stay with the minimum for
> now that replicates the old system as much as possible and then expand
> on that going forward.

My design for a new cluster layout uses multiple data nodes and multiple logger nodes, built on the new round-robin (RR) and highest-random-weight (HRW) pools Jon added.

It's not too different from what we have now. Instead of statically configuring worker-1, worker-3, worker-5, and worker-7 to connect
to proxy-1 and worker-2, worker-4, worker-6, and worker-8 to connect to proxy-2, workers would connect to all data nodes and loggers and
use round robin or hashing to distribute messages.
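The two pool strategies can be modeled in a few lines. This is a minimal Python sketch, not Bro's actual pool code: the node names are hypothetical, and hashing SHA-256 over "node|key" is an assumed scoring function. Round robin spreads messages evenly; HRW (rendezvous hashing) lets every worker independently agree on which node owns a given key.

```python
import hashlib
from itertools import count


class RoundRobinPool:
    """Hands out pool members in turn, e.g. to spread log writes evenly."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self._counter = count()

    def next_node(self):
        return self.nodes[next(self._counter) % len(self.nodes)]


def hrw_node(nodes, key):
    """Highest-random-weight (rendezvous) hashing: score every node against
    the key and pick the highest, so all senders agree on one node per key
    without coordinating."""
    def score(node):
        # Assumed scoring function: first 8 bytes of SHA-256("node|key").
        digest = hashlib.sha256(f"{node}|{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    return max(nodes, key=score)


loggers = RoundRobinPool(["manager-1-logger", "manager-2-logger"])
print([loggers.next_node() for _ in range(4)])
# ['manager-1-logger', 'manager-2-logger', 'manager-1-logger', 'manager-2-logger']

data_nodes = ["manager-1-data", "manager-2-data"]
owner = hrw_node(data_nodes, "10.0.0.1")  # same answer on every worker
```

A useful HRW property: removing a node only moves the keys that node owned, because every other node's score for a key is unchanged.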

We have preliminary support for multiple loggers in broctl now, but it uses the static configuration method, so if you are running two
loggers and one process dies, half the workers have no functioning logger.
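The failure mode of static assignment can be shown with a small Python model. The helper names are hypothetical, and the modulo mapping is an assumption that mirrors the odd/even worker split described above; it is not how broctl actually assigns loggers.

```python
def static_logger(worker_index, loggers):
    # Static assignment: worker N always uses the same logger, whether or
    # not that logger is still running.
    return loggers[(worker_index - 1) % len(loggers)]


def pooled_logger(worker_index, loggers, alive):
    # Pool-based assignment: every worker knows all loggers and only
    # chooses among the ones that are still alive.
    live = [name for name in loggers if name in alive]
    if not live:
        return None
    return live[(worker_index - 1) % len(live)]


loggers = ["logger-1", "logger-2"]
alive = {"logger-1"}  # logger-2 has died
workers = range(1, 9)

stranded_static = [w for w in workers if static_logger(w, loggers) not in alive]
stranded_pooled = [w for w in workers if pooled_logger(w, loggers, alive) is None]
print(stranded_static, stranded_pooled)  # [2, 4, 6, 8] []
```

With the static mapping, half the workers are left without a logger; with a pool, they all fall back to the surviving one.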

The node.cfg files would look something like this:

## Multiple node cluster with redundant data/logger nodes
# manager - 1
[manager-1-logger]
host = manager1
type = logger

[manager-1-data]
host = manager1
type = data
lb_procs = 2

# manager - 2
[manager-2-logger]
host = manager2
type = logger

[manager-2-data]
host = manager2
type = data
lb_procs = 2

# worker 1
[worker-1]
host = worker1
type = worker
lb_procs = 16

...

# worker 4
[worker-4]
host = worker4
type = worker
lb_procs = 16

## 2 (or more) node cluster with no SPOF:
# node - 1
[node-1-logger]
host = node1
type = logger

[node-1-data]
host = node1
type = data
lb_procs = 2

[node-1-workers]
host = node1
type = worker
lb_procs = 16

# node - 2
[node-2-logger]
host = node2
type = logger

[node-2-data]
host = node2
type = data
lb_procs = 2

[node-2-workers]
host = node2
type = worker
lb_procs = 16
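To illustrate how workers could discover every logger and data process from a layout like the two-node one above, here is a Python sketch that reads the logger and data sections with the standard configparser module and groups them into pools by type. The worker sections are left out, and expanding lb_procs into names like "node-1-data-1" is a hypothetical convention, not broctl's actual process naming.

```python
import configparser

# Logger and data sections of the two-node layout; worker sections are
# omitted because workers only consume these pools.
NODE_CFG = """
[node-1-logger]
host = node1
type = logger

[node-1-data]
host = node1
type = data
lb_procs = 2

[node-2-logger]
host = node2
type = logger

[node-2-data]
host = node2
type = data
lb_procs = 2
"""


def build_pools(text):
    parser = configparser.ConfigParser()
    parser.read_string(text)
    pools = {}
    for name in parser.sections():
        section = parser[name]
        procs = section.getint("lb_procs", fallback=1)
        # Hypothetical naming: one pool member per lb_procs instance.
        if procs == 1:
            members = [name]
        else:
            members = [f"{name}-{i}" for i in range(1, procs + 1)]
        pools.setdefault(section["type"], []).extend(members)
    return pools


pools = build_pools(NODE_CFG)
print(pools["logger"])  # ['node-1-logger', 'node-2-logger']
print(pools["data"])    # ['node-1-data-1', 'node-1-data-2', 'node-2-data-1', 'node-2-data-2']
```

Every worker would connect to all members of both pools, so losing one host shrinks the pools instead of stranding a fixed subset of workers.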

Replicating the old system initially sounds good to me, just as long as that doesn't make it harder to expand things later.

The logger side should be easier to change later, since scripts don't deal with logger nodes directly and the
distribution would be handled in one place inside the logging framework. Multiple data nodes are a little harder to add
later, since they require script language support and script changes for routing events across nodes.

I think support for multiple data nodes mostly comes down to two required functions:

- a bif/function for sending an event to a data node based on the hash of a key.
  - This looks doable now with the HRW code; it's just not wrapped in a single function.

- a bif/function for efficiently broadcasting an event to all other workers (or data nodes):
  - if the current node is a data node, just send it to all workers;
  - otherwise, round-robin the event to a data node and have it send the event to all workers except the current node.
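The two functions above can be modeled in-process to check the routing logic. This Python sketch is purely illustrative: ClusterModel, publish_by_key, and broadcast_to_workers are hypothetical names, not Bro BIFs, and the relay's fan-out to workers is recorded inline rather than sent over the network.

```python
import hashlib


def hrw_pick(nodes, key):
    # Rendezvous (HRW) hashing: the node with the highest hash(node, key)
    # wins, so every worker independently routes a key to the same node.
    return max(nodes, key=lambda n: hashlib.sha256(f"{n}|{key}".encode()).digest())


class ClusterModel:
    """Toy in-process model of the two proposed functions (hypothetical names)."""

    def __init__(self, workers, data_nodes):
        self.workers = list(workers)
        self.data_nodes = list(data_nodes)
        self._rr = 0         # round-robin cursor over data nodes
        self.delivered = []  # (destination, event) pairs, for inspection

    def _send(self, dest, event):
        self.delivered.append((dest, event))

    def publish_by_key(self, key, event):
        # Function 1: send the event to the data node that owns `key`.
        self._send(hrw_pick(self.data_nodes, key), event)

    def broadcast_to_workers(self, sender, event):
        # Function 2: deliver the event to every worker except the sender.
        if sender in self.data_nodes:
            # A data node already reaches all workers directly.
            targets = self.workers
        else:
            # A worker hands the event to one data node (round robin); that
            # relay then fans it out to all workers minus the original sender.
            relay = self.data_nodes[self._rr % len(self.data_nodes)]
            self._rr += 1
            self._send(relay, event)
            targets = [w for w in self.workers if w != sender]
        for w in targets:
            self._send(w, event)


c = ClusterModel(["worker-1", "worker-2", "worker-3"], ["data-1", "data-2"])
c.broadcast_to_workers("worker-1", "Foo(42)")
print(c.delivered)
# [('data-1', 'Foo(42)'), ('worker-2', 'Foo(42)'), ('worker-3', 'Foo(42)')]
```

Routing a worker's broadcast through one data node means the worker sends a single message instead of one per peer, and rotating the relay spreads that fan-out work across the data nodes.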

If &synchronized is going away, script writers should be able to broadcast an event to all workers by doing something like

    Cluster::Broadcast(Cluster::WORKERS, event Foo(42));

This would replace a ton of code that currently relies on combinations of worker2manager_events, manager2worker_events, and @if ( Cluster::local_node_type() == Cluster::MANAGER ).

--
Justin Azoff