[Bro-Dev] 0MQ security considerations

Gilbert Clark gc355804 at ohio.edu
Thu Jul 28 10:50:42 PDT 2011



>> Originally, we were discussing using 0mq, which uses a message-based
>> architecture.  This struck me as a very clean way to segment a program
>> into threads, and would logically extend rather well to cover other
>> things (e.g. IPC).  As such, I borrowed that model.
> I like the message-passing model as well. How do you use the term
> "thread?" Do you mean a hardware thread (managed by the OS) or a
> virtual/logical thread (a user-space abstraction)? I am asking because
> in general there should be a (close to) 1:1 ratio between
> available cores and number of user-level threads, mainly to avoid
> thrashing and increase cache performance. With I/O-bound applications
> this is of course less of an issue, but nonetheless a prudent software
> engineering practice in the manycore era.

Whichever model pthread happens to use :)  I think the implementation 
might be slightly platform-dependent.

That said, keep in mind that some libraries (e.g. DataSeries) actually 
spawn additional worker threads.  This makes it very difficult to place 
a hard limit on the number of threads that exist within a bro process.  
If this does turn out to be an issue, it may be time to look at 
fork()'ing the loggers and / or using some kind of remote logging scheme.

>> Because there's a large degree of complexity involved with ensuring
>> any individual event can be processed on any thread, especially given
>> that log / flush / rotate messages have possibly complex ordering
>> dependencies to deal with, and further given that a log writer (or, in
>> bro's case, most of the logwriter-related events) should spend the
>> majority of its time blocking for IO, I don't necessarily agree that
>> logging stuff would be a good candidate for task-oriented execution.
> You bring up a good point, namely blocking I/O, which I haven't thought
> of. Just out of curiosity, could all blocking operations be replaced with
> their asynchronous counterparts?

For the ASCII and the DataSeries loggers, yes; the latter seems to 
already do this (one of the worker threads is an output thread).

> I am just asking because I am using
> Boost Asio in a completely asynchronous fashion. This lends itself well
> to a task-based architecture with asynchronous "components," each of
> which has a task queue that accumulates non-interfering function calls.
> Imagine that each component has some dynamic number of threads,
> depending on the available number of cores.


> Let's assume there exists an asynchronous component for each log
> backend. (Sorry if I am misusing terminology, I'm not completely
> up-to-date regarding the logging architecture.)

I think I follow you.

>   If the log/flush/rotate
> messages are encapsulated as a single task, then the ordering issues
> would go away, but you still get a "natural" scale-up at event
> granularity (assuming events can arrive out-of-order) by assigning more
> threads to a component.

I don't understand this bit.  Say we have 10 things to log: {A, B, C, 
... J}, and we queue 8 messages before we flush + rotate (for the sake 
of argument):

Log+Flush+Rotate task  = {A, B, C, D}  ---> Thread 1
Log+Flush task         = {E, F, G, H}  ---> Thread 2
Log+Flush+Finish task  = {I, J}        ---> Thread 3

In this case, it seems possible that thread 3 could complete before 
threads 1 and 2, unless we forced tasks to lock the log file upon 
execution... but if we did that, I think the locking order would 
become less predictable as the system's load increased, leading to 
oddly placed chunks of log data and/or dropped messages if the file 
were closed before a previous rotate message made it to the logger.

> Does that make sense? Maybe you use this sort of
> architecture already?!

Kind of.  Let me throw together a diagram and send it out; it's 
something that should probably end up in the documentation anyway :)

> My point is essentially that there is a
> difference between tasks and events that have different notions of
> concurrency.

I need to do some more reading before I'd be prepared to offer a 
coherent argument here.

>> Re: task-oriented execution for bro in general: this seems like it is
>> already accomplished to a large degree by e.g. hardware that splits
>> packets by the connection they belong to and routes them to the
>> appropriate processing node in the bro cluster.
> Yeah, one can think about it this way. The only thing that gives me
> pause is that the term "task" has a very specific, local meaning in
> parallel computation lingo.

I don't claim to be a parallel computation expert, so this does not 
surprise me :)

I was more attempting to illustrate the parallel between bro's existing 
cluster architecture and a more traditional task-oriented model (as I 
understand it) than I was trying to argue that packets should 
necessarily be classifiable as "tasks" in the strictest sense of the word.

>> If we wanted to see aggregate performance gains, I guess we could
>> write ubro: micro scripting language, rules are entirely resident in a
>> piece of specialized hardware (CUDA?), processes only certain types of
>> packet streams (thus freeing other general-purpose bro instances to
>> handle other stuff).
> Nice thought, that's something for HILTI where available hardware is
> transparently used by the execution environment. For example, if a
> hardware regex matcher is available, the execution environment offloads
> the relevant instructions to the special card but would otherwise use
> its own implementation.

Ah, right then.

