[Bro] Bro's escaping of non-printable characters behaves unexpectedly

Paul Pearce pearce at cs.berkeley.edu
Tue Feb 17 21:15:08 PST 2015

Hey Johanna,

Thanks for taking the time to respond.

> I think the reason that the ascii writer of the logging framework of Bro
> does not support arbitrary binary data is that it was conceived as a
> framework for writing human-readable log files, not arbitrary binary data.

I'm going to push back a bit on characterizing this as supporting
arbitrary binary data. These are Unicode characters appearing in URIs
($http$URI) that I'm encountering in actual network traffic, and
somewhat frequently at that. The problem shows up in the standard
http.log, as well as in the extensions I'm working on.

I realize the RFC does not permit Unicode in URLs, but since these
characters do occur in practice (browsers silently handle them), this
seems worth supporting.

I'll also point out that Bro's ASCII logging facilities already
support logging these characters; they simply do so in an
unrecoverable, non-canonical way. What I'm proposing is to
standardize and clean up the escaping that Bro already performs.
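To make "canonical" concrete, here is a minimal, hypothetical sketch in Python of a reversible escaping scheme. It is not Bro's actual code, and the names `escape_field` and `unescape_field` are made up for illustration. The key assumption is that the backslash itself is always escaped, so a literal "\x41" in a URI can never be confused with an escaped byte, and every escaped field maps back to exactly one original byte string.

```python
# Hypothetical sketch of a reversible escaping scheme for log fields.
# Not Bro's implementation; it only illustrates a canonical encoding.

HEX_DIGITS = set("0123456789abcdefABCDEF")


def escape_field(raw: bytes) -> str:
    r"""Escape bytes into printable ASCII in a way that can be undone.

    - Printable ASCII (0x20-0x7e), except the backslash, is kept as-is.
    - The backslash itself becomes \x5c, so the output is unambiguous.
    - Every other byte (controls, UTF-8 multi-byte sequences) becomes \xNN.
    """
    out = []
    for b in raw:
        if 0x20 <= b <= 0x7E and b != 0x5C:
            out.append(chr(b))
        else:
            out.append(f"\\x{b:02x}")
    return "".join(out)


def unescape_field(escaped: str) -> bytes:
    """Invert escape_field exactly; reject malformed escape sequences."""
    out = bytearray()
    i = 0
    while i < len(escaped):
        if escaped[i] == "\\":
            # In the escaped form, every backslash starts a \xNN sequence.
            code = escaped[i + 2:i + 4]
            if escaped[i + 1:i + 2] != "x" or len(code) != 2 or not set(code) <= HEX_DIGITS:
                raise ValueError(f"malformed escape at offset {i}")
            out.append(int(code, 16))
            i += 4
        else:
            out.append(ord(escaped[i]))
            i += 1
    return bytes(out)


# Usage: a URI containing UTF-8 ("é" is 0xc3 0xa9) and a literal "\x41".
uri = "/caf\u00e9?q=\\x41".encode("utf-8")
escaped = escape_field(uri)
print(escaped)  # /caf\xc3\xa9?q=\x5cx41
assert unescape_field(escaped) == uri
```

With this scheme the log stays printable ASCII, while a reader can still recover the original bytes (and decode them as UTF-8 if desired).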


More information about the Bro mailing list