[Bro] help regarding using bro on application-level byte stream

Jayanth Kannan kjk at eecs.berkeley.edu
Wed Apr 23 13:56:38 PDT 2008

Hi Robin,

Thanks for the quick reply!

Just for more context: What I have is an application-level byte-stream in
both directions which is already re-assembled and sequenced. I would like to
use the trace anonymization (by Ruoming Pang et al.), which strips out
user-sensitive information from a given trace according to a user-provided
script. I also need to do this in an online fashion.

> > 1. Cook up fake link-layer, TCP,IP headers, and feed Bro via a FIFO.
> That seems to be the easiest option for an implementation as you
> wouldn't need to dive into Bro but could write the conversion
> completely externally. Also, with tools like tcpdump etc. you could
> quickly see if things look like they're supposed to. However, I'm
> not sure I fully understand in which format your input is in
> exactly, so not sure how easy it would be to turn it into fake
> packets (e.g., is it already reassembled or still packetized?).

Well, actually cooking up the fake headers should be simple, since my data
stream is already reassembled, and only needs to be wrapped up in the
appropriate TCP and IP headers, along with some fake SYNs, SYNACKs, and
FINs. I didn't really like the idea of cooking up fake stuff, since I don't
really want Bro to do analysis on these fake headers. But, as you say, this
is probably the simplest option for me.
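
A minimal sketch of what option (1) might look like, in Python and using only
the standard library. It wraps an already reassembled, sequenced byte stream
in fake Ethernet/IPv4/TCP headers (SYN, SYN-ACK, data, FINs) and writes the
frames in pcap format. Every name here (build_packet, fake_connection,
write_pcap) and every address, port and initial sequence number is made up
for illustration; none of it comes from Bro.

```python
import struct

SYN, ACK, FIN, PSH = 0x02, 0x10, 0x01, 0x08

def checksum(data: bytes) -> int:
    """Standard Internet checksum (one's complement of the one's complement sum)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def ip_to_bytes(addr: str) -> bytes:
    return bytes(int(x) for x in addr.split("."))

def build_packet(src, dst, sport, dport, seq, ack, flags, payload=b""):
    """Return one Ethernet frame carrying a single fake IPv4/TCP segment."""
    src_b, dst_b = ip_to_bytes(src), ip_to_bytes(dst)
    tcp = struct.pack("!HHIIHHHH", sport, dport, seq & 0xFFFFFFFF,
                      ack & 0xFFFFFFFF, (5 << 12) | flags, 65535, 0, 0)
    pseudo = src_b + dst_b + struct.pack("!BBH", 0, 6, len(tcp) + len(payload))
    tcp = tcp[:16] + struct.pack("!H", checksum(pseudo + tcp + payload)) + tcp[18:]
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(tcp) + len(payload),
                     0, 0x4000, 64, 6, 0, src_b, dst_b)
    ip = ip[:10] + struct.pack("!H", checksum(ip)) + ip[12:]
    eth = b"\x00\x00\x00\x00\x00\x02" + b"\x00\x00\x00\x00\x00\x01" + b"\x08\x00"
    return eth + ip + tcp + payload

def fake_connection(chunks, client=("10.0.0.1", 40000),
                    server=("10.0.0.2", 80), mss=1460):
    """Yield frames for one connection: handshake, data, teardown.

    chunks is an iterable of (from_client, data) pairs in stream order.
    """
    seq = {True: 1000, False: 5000}  # arbitrary initial sequence numbers

    def pkt(from_client, flags, payload=b""):
        (sip, sp), (dip, dp) = (client, server) if from_client else (server, client)
        ack = seq[not from_client] if flags & ACK else 0
        frame = build_packet(sip, dip, sp, dp, seq[from_client], ack, flags, payload)
        # SYN and FIN each consume one sequence number, like a payload byte.
        seq[from_client] += len(payload) + (1 if flags & (SYN | FIN) else 0)
        return frame

    yield pkt(True, SYN)
    yield pkt(False, SYN | ACK)
    yield pkt(True, ACK)
    for from_client, data in chunks:
        for i in range(0, len(data), mss):
            yield pkt(from_client, PSH | ACK, data[i:i + mss])
            yield pkt(not from_client, ACK)  # bare ACK from the peer
    yield pkt(True, FIN | ACK)
    yield pkt(False, FIN | ACK)
    yield pkt(True, ACK)

def write_pcap(out, frames, start_ts=0.0, step=0.001):
    """Write a pcap global header, then one record per frame, flushing each."""
    out.write(struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1))
    for n, frame in enumerate(frames):
        ts = start_ts + n * step
        sec = int(ts)
        usec = int((ts - sec) * 1e6)
        out.write(struct.pack("<IIII", sec, usec, len(frame), len(frame)))
        out.write(frame)
        out.flush()
```

For the online case, out could be a FIFO opened in binary mode, e.g.
write_pcap(open(path, "wb"), fake_connection(chunks)), with chunks being a
generator that yields data as it arrives. Whether Bro reads happily from such
a FIFO is an assumption to verify; pointing tcpdump at the same output first
is a quick way to check that the fake packets look sane.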

> > 2. Use Broccoli to send really low-level events (events being "so and so
> > bytes seen on so and so conn").
> Won't really work because Bro doesn't have any events which are so
> low-level. All its events are coming out of the packet/payload
> analysis, there aren't any which provide input for it. (You could add
> some of your own to feed your data into Bro protocol processing via
> Broccoli but that wouldn't be too different from faking packets as
> in (1).)

Oh, I see.

> > 3. Use the Bro source code directly, and somehow instantiate an analyzer
> > directly on the byte-stream. Any state needed (such as connection
> > endpoints) have to be cooked up.
> That's an interesting thought. I don't have an immediate opinion on
> how difficult this would be. My guess is that you'd quickly be
> running into lots of subtle problems with lacking the state you need
> to keep the analysis going and which is hard to cook up. That said,
> if you're game to dive into Bro's internals for such a solution, you
> could just give it a try. However, I wouldn't spend too much time on
> it if it turns out to get problematic (and again a lot of this
> depends on how *exactly* your input looks like).

Oh, I see. I have been nosing around the source code to figure this out, and
the new DPD framework seems fairly subtle to get right. As you say, I will
probably do this for some more time, and then go to the fake header option.

> One other thought: which applications are you interested in? If it's
> only a few and there happen to be binpac analyzers for them, you
> could write a standalone program feeding your data into these binpac
> analyzers.

Well, I would like it to be as general as possible (since the
application-level stream is coming from a decrypted SSL connection, which
may be in use by any application), which is why I thought of leveraging
Bro's broad support rather than BinPac support. Also, the anonymization
script (by Pang et al) relies on the event processing of Bro, and so again,
I need to run the trace through Bro to get those events.

> Final note: you mention that you want to rewrite the content: I'm
> not very familiar with that part of Bro but I'm guessing it also has
> quite a few dependencies on having packets as input.

Yes, Pang's scripts maintain a lot of application-level state in doing the
anonymization, which is why I need to run them through Bro.

Once again, thanks for the quick reply.

