[Bro-Dev] Broker::publish API

Tue Aug 7 03:05:53 PDT 2018

To be honest, I have somewhat lost track of the discussion. As far as I
can recall, it's about simplifying the API in light of multi-hop
routing, which is not fully functional yet.

Regarding multi-hop routing, I am not even sure what goal we are
currently aiming at. From a conceptual perspective, however, I think
"routing" needs either routing algorithms or strict conventions for how
the network that messages are routed through is structured. So, what
would a "deep cluster" look like, and what kind of message flows do we
expect in there?

Some comments on the observations:

On 06/08/18 21:50, Robin Sommer wrote:
>      - The main topics are bro/cluster/<node-type> and
>        bro/cluster/node/<name>. For these we wouldn't have a problem
>        with loops if we enabled automatic, topic-driven forwarding as
>        far as I can see.

How does forwarding work if I add another node type? Do we assume a 
certain cluster structure here? If yes: Is that a valid assumption?

>      - bro/cluster/broadcast seems to be the main case with a looping
>        problem, because everybody subscribes to it. It's hardly used
>        though. (bro/config/change is used similarly though).

The topic concept is a multicast scheme, isn't it? Having broadcast
functionality on top of that feels odd, although it is limited to the
cluster topic. This leads me to the question: which domains do we
operate on? When I think of messages, I start to think about a cluster,
but that might be only one domain of application. I think it would be
good to define the layers of abstraction more precisely here.
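
To make the looping concern with the broadcast topic concrete, here is a
small Python sketch. It is not Broker code: the three-node full mesh, the
naive "forward to every subscribed peer except the sender" rule, the
prefix matching, and the hop limit are all assumptions I made up for
illustration.

```python
from collections import deque

# Hypothetical fully meshed mini-cluster (every node peers with every other).
PEERS = {
    "manager": ["proxy", "worker"],
    "proxy": ["manager", "worker"],
    "worker": ["manager", "proxy"],
}

# Each node subscribes to the broadcast topic and to its own node topic.
SUBS = {
    name: ["bro/cluster/broadcast", "bro/cluster/node/" + name]
    for name in PEERS
}


def matches(prefixes, topic):
    """Assumed subscription rule: a subscription is a topic prefix."""
    return any(topic.startswith(p) for p in prefixes)


def deliveries(peers, subs, origin, topic, ttl):
    """Count how often each node receives `topic` under naive forwarding.

    `ttl` is an artificial hop limit; without it, the broadcast case below
    would never terminate.
    """
    counts = {name: 0 for name in peers}
    queue = deque(
        (origin, nxt, ttl) for nxt in peers[origin] if matches(subs[nxt], topic)
    )
    while queue:
        sender, node, hops = queue.popleft()
        counts[node] += 1
        if hops == 0:
            continue
        # Forward to every subscribed peer except the one we got it from.
        for nxt in peers[node]:
            if nxt != sender and matches(subs[nxt], topic):
                queue.append((node, nxt, hops - 1))
    return counts


broadcast = deliveries(PEERS, SUBS, "manager", "bro/cluster/broadcast", ttl=3)
single = deliveries(PEERS, SUBS, "manager", "bro/cluster/node/worker", ttl=3)
print(broadcast)  # every node sees the message more than once
print(single)     # only the worker sees it, exactly once
```

In this toy setup the node-specific topic is delivered once, while the
broadcast topic keeps circulating until the hop limit cuts it off. Whether
that carries over depends entirely on the real peering structure, which is
why I am asking about it.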

>      - There are a couple of script-specific topics where I'm wondering
>        if these could switch to using bro/cluster/<node-type> instead
>        (bro/intel/*, bro/irc/dcc_transfer_update). In other words: when
>        clusterizing scripts, prefer not to introduce new topics.

From my understanding this would mean going back to the old 
communication patterns. What's the point of having topics if we don't 
use them?

>      - There's a lot of checks in publishing code of the type "if I am
>        (not) of node type X".

That's something I would have expected. I don't think this is
necessarily an indicator of bad design. Having these kinds of checks
means that roles are more or less fixed and responsibilities are
explicitly codified.
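
As a rough illustration of what I mean by codified responsibilities, here
is a Python sketch (not Bro script). The function name and the choice of
who publishes where are hypothetical; only the bro/cluster/<node-type>
topic scheme is taken from the observations above.

```python
WORKER, PROXY, MANAGER = "worker", "proxy", "manager"


def publish_targets(node_type):
    """Return the topics a node of `node_type` publishes a new item to.

    The role checks spell out each node type's responsibility explicitly.
    """
    if node_type == WORKER:
        # Assumed role: workers only report upwards, to the manager.
        return ["bro/cluster/" + MANAGER]
    if node_type == MANAGER:
        # Assumed role: the manager distributes to all workers.
        return ["bro/cluster/" + WORKER]
    # Any other role (a proxy, or a node type added later) does not publish.
    return []


print(publish_targets(WORKER))
print(publish_targets("logger"))
```

The checks are verbose, but a reader can see at a glance who sends what
to whom, and a node type that is added later silently does nothing until
someone decides what its role should be.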

>      - Pools are used for two different things: 1. the known-* scripts
>        pick a proxy to process and log the information; whereas 2. the
>        Intel scripts pick a proxy just as a relay to broadcast stuff
>        out, reducing load. That 1st application is a good one, but the
>        2nd feels like it should be handled differently.

I think we should be careful about introducing too many abstractions.
Communication patterns tend to be complex, and the more of that
complexity is hidden, the easier it becomes to create
misunderstandings. For example, in the case of the intel framework,
proxy nodes might at some point implement more logic than just
relaying. Having a relay abstraction would then mean dealing with two
different levels of abstraction for intel on proxy nodes.

> Overall I have to say I found it pretty hard to follow this all
> because we don't have much consistency right now in how scripts
> structure their communication. That's not surprising, given that we're
> just starting to use all this, but it suggests that we have room for
> improvement in our abstractions. :)

I totally agree here! I think it could help to come up with some more 
use cases to identify the best abstractions.

Jan