[Bro-Dev] 0MQ security considerations

Matthias Vallentin vallentin at icir.org
Thu Jul 28 08:44:47 PDT 2011


> Okay, since this came up and I'm working on it, I guess I'll try to 
> address the architecture issue :)

Thanks for chiming in.

> Originally, we were discussing using 0mq, which uses a message-based 
> architecture.  This struck me as a very clean way to segment a program 
> into threads, and would logically extend rather well to cover other 
> things (e.g. IPC).  As such, I borrowed that model.

I like the message-passing model as well. How do you use the term
"thread"? Do you mean a hardware thread (managed by the OS) or a
virtual/logical thread (a user-space abstraction)? I am asking because,
in general, there should be a (close to) 1:1 ratio between the number
of available cores and the number of user-level threads, mainly to
avoid thrashing and to improve cache performance. With I/O-bound
applications this is of course less of an issue, but it is nonetheless
a prudent software engineering practice in the manycore era.

> Because there's a large degree of complexity involved with ensuring
> any individual event can be processed on any thread, especially given
> that log / flush / rotate messages have possibly complex ordering
> dependencies to deal with, and further given that a log writer (or, in
> bro's case, most of the logwriter-related events) should spend the
> majority of its time blocking for IO, I don't necessarily agree that
> logging stuff would be a good candidate for task-oriented execution.

You bring up a good point, namely blocking I/O, which I hadn't thought
of. Just out of curiosity, could all blocking operations be replaced
with their asynchronous counterparts? I am asking because I use Boost
Asio in a completely asynchronous fashion. This lends itself well to a
task-based architecture with asynchronous "components," each of which
has a task queue that accumulates non-interfering function calls.
Imagine that each component has some dynamic number of threads,
depending on the number of available cores.

Let's assume there exists an asynchronous component for each log
backend. (Sorry if I am misusing terminology; I'm not completely
up-to-date on the logging architecture.) If the log/flush/rotate
messages are encapsulated as a single task, then the ordering issues
would go away, but you would still get a "natural" scale-up at event
granularity (assuming events can arrive out of order) by assigning more
threads to a component. Does that make sense? Maybe you use this sort of
architecture already?! My point is essentially that tasks and events are
distinct things with different notions of concurrency.

> Re: task-oriented execution for bro in general: this seems like it is
> already accomplished to a large degree by e.g. hardware that splits
> packets by the connection they belong to and routes them to the
> appropriate processing node in the bro cluster.  

Yeah, one can think about it this way. The only thing that gives me
pause is that the term "task" has a very specific, local meaning in
parallel computation lingo.

> If we wanted to see aggregate performance gains, I guess we could
> write ubro: micro scripting language, rules are entirely resident in a
> piece of specialized hardware (CUDA?), processes only certain types of
> packet streams (thus freeing other general-purpose bro instances to
> handle other stuff).

Nice thought, that's something for HILTI, where available hardware is
transparently used by the execution environment. For example, if a
hardware regex matcher is available, the execution environment offloads
the relevant instructions to the specialized card but otherwise uses
its own implementation.

> We could even call it BroCluMP (Bro Cluster Multi-Processing) or
> something.

While we're creating new names, what about Bruda for the GPU-based
version of Bro that offloads regex matching to CUDA? ;-)

    Matthias
