[Bro-Dev] Improving Bro's main loop

Wed Feb 8 22:26:37 PST 2017

Just starting a discussion to take inventory of the current problems with Bro’s main loop and ideas for how to improve it.  Let’s begin with a list of issues (please comment if you have additions):

1) It uses select(), which is the weakest of the available polling mechanisms.  It has an upper limit on the number of fds that can be polled (FD_SETSIZE, which is fixed at 1024 on some OSs), and it also scales poorly as the number of fds grows.  This is problematic for Bro clusters that have many nodes/peers.

2) Integrating new I/O sources isn’t always straightforward from a technical standpoint (e.g. see [1]).  I also found that it’s difficult to understand the ramifications of any change to the run loop without also digging into esoteric details you may not initially think are related (e.g. I usually had to double-check the internals of the I/O or threading systems when making any change to the main loop, which may mean there are basic problems with those abstractions).

3) Bro’s time/timers are coupled with I/O.  Time does not move forward unless there is an active I/O source.  This isn’t usually a functional problem for users, but devs occasionally have to hack around it (e.g. unit tests).
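
To make the fd limit in (1) concrete, here is a minimal sketch that uses plain POSIX poll() in place of select().  The helper names readable_fds() and fits_in_fd_set() are made up for illustration and are not existing Bro code.  Note that poll() only removes the FD_SETSIZE cap; it still scans every fd on each call, so it does not address the scaling half of the problem the way epoll/kqueue would.

```cpp
#include <poll.h>
#include <sys/select.h>
#include <unistd.h>

#include <cassert>
#include <vector>

// Hypothetical helper (not Bro code): returns the subset of `fds` that is
// readable within `timeout_ms` milliseconds.  Unlike an fd_set, the pollfd
// array has no compile-time limit on how large an fd value may be.
std::vector<int> readable_fds(const std::vector<int>& fds, int timeout_ms) {
    assert(timeout_ms >= -1);  // -1 means "block indefinitely" for poll()

    std::vector<pollfd> pfds;
    pfds.reserve(fds.size());
    for (int fd : fds) {
        pollfd p{};
        p.fd = fd;
        p.events = POLLIN;
        pfds.push_back(p);
    }

    std::vector<int> ready;
    int n = poll(pfds.data(), static_cast<nfds_t>(pfds.size()), timeout_ms);
    if (n <= 0)
        return ready;  // timeout (0) or error (-1): nothing to report

    for (const pollfd& p : pfds)
        if (p.revents & (POLLIN | POLLHUP))
            ready.push_back(p.fd);
    return ready;
}

// select() can only track an fd whose numeric value is below FD_SETSIZE;
// passing a larger fd to FD_SET() is undefined behavior.
bool fits_in_fd_set(int fd) {
    return fd >= 0 && fd < FD_SETSIZE;
}
```

A caller would pass the fds of its I/O sources, e.g. readable_fds({sock_a, sock_b}, 100), and service whatever comes back.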

I think CAF [2] and/or libuv [3] can address these issues:

1) libuv: abstracts whatever polling mechanism is best for the OS you’re on.  CAF: could allow a more direct actor messaging interface to Broker.  Since remote communication accounts for the bulk of the fds being polled, it would then be subject to CAF’s own multiplexer, and the remaining fds (e.g. packet sources, etc.) could be fine to poll in whatever fashion.

2) Both libuv and CAF use abstractions/models that are shown to work well.  I think the actor model, by design, does a better job of encouraging systems that are decoupled and therefore scalable.

3) Both libuv and CAF have mechanisms that could integrate timers into the run loop such that they’d work independently of other I/O.
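
As a rough illustration of what (3) means, independent of either library: the loop below derives its poll() timeout from the nearest timer deadline, so timers fire even when no I/O source is active.  This is only a sketch under my own assumptions; TimerLoop and its methods are hypothetical names, not libuv, CAF, or Bro APIs, and dispatching of ready fds is left out.

```cpp
#include <poll.h>

#include <cassert>
#include <chrono>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical timer record, ordered by deadline.
struct Timer {
    Clock::time_point deadline;
    std::function<void()> callback;
    bool operator>(const Timer& other) const { return deadline > other.deadline; }
};

// Hypothetical run loop: time advances from the clock, not from I/O activity.
class TimerLoop {
public:
    void add_timer(std::chrono::milliseconds delay, std::function<void()> cb) {
        assert(delay.count() >= 0);
        timers_.push(Timer{Clock::now() + delay, std::move(cb)});
    }

    // Runs until no timers remain and returns how many fired.  `fds` may be
    // empty, in which case poll() simply sleeps until the next deadline.
    int run(std::vector<pollfd> fds = {}) {
        int fired = 0;
        while (!timers_.empty()) {
            // Sleep no longer than the time left until the nearest deadline.
            auto remaining = timers_.top().deadline - Clock::now();
            int timeout_ms = 0;
            if (remaining > Clock::duration::zero()) {
                auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(remaining);
                timeout_ms = static_cast<int>(ms.count()) + 1;  // round up
            }
            poll(fds.data(), static_cast<nfds_t>(fds.size()), timeout_ms);
            // A real loop would dispatch any ready fds here.

            // Fire every timer whose deadline has passed, earliest first.
            auto now = Clock::now();
            while (!timers_.empty() && timers_.top().deadline <= now) {
                std::function<void()> cb = timers_.top().callback;
                timers_.pop();
                cb();
                ++fired;
            }
        }
        return fired;
    }

private:
    std::priority_queue<Timer, std::vector<Timer>, std::greater<Timer>> timers_;
};
```

With no fds registered at all, loop.run() still fires each timer in deadline order, which is the property unit tests currently have to hack around.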

libuv may be a quicker, more straightforward path to fixing (1), which is the most critical issue, but it’s also the easiest to fix without the aid of a library.  libuv can also replace other misc. code in Bro like async DNS and signal handling, but, while those may be crufty, they aren’t frequent sources of pain.

Since CAF is already a requirement of Broker and has the most potential to improve/replace parts of Bro’s threading system and the way in which Broker is integrated, it may be best in the long term to explore moving things out of Bro’s current run loop by making them into actors that use message-passing interfaces, and then relying on CAF’s own loop.

Any thoughts?

- Jon

[1] http://mailman.icsi.berkeley.edu/pipermail/bro-dev/2015-May/010069.html
[2] https://actor-framework.org/
[3] http://docs.libuv.org/en/v1.x/