[Bro] File Scanning Capability

Will baxterw3232 at gmail.com
Mon Mar 21 12:44:01 PDT 2011

On Mon, Mar 21, 2011 at 2:49 PM, Seth Hall <seth at icir.org> wrote:

> On Mar 21, 2011, at 2:16 PM, Will wrote:
> > I will without a doubt eventually incorporate
> > "http-ext-identified-files.sig" instead of what I am currently using, but I
> > am having trouble determining where to integrate the logic for handling each
> > file type. As it currently works, I am saving off every PDF and Word doc,
> > which would be unnecessary if I used Bro to call the external tools and
> > evaluate the results.
>
> That won't actually work quite right. The http-ext-identified-files.sig
> file uses special signature keywords that the HTTP analyzer provides to
> detect file types. It's not directly applicable to SMTP/MIME transfers.

Understandable. Since there are so many different file types, it would be
worthwhile to create a signature file for SMTP/MIME. I would be happy
to share it when I get it done.
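Because the HTTP-specific signature keywords don't apply to MIME data, an SMTP/MIME signature file would have to key on the decoded content itself. As a rough illustration of that idea (in Python rather than Bro's signature language, with type labels chosen here for illustration), the two types discussed in this thread can be recognized by their leading magic bytes: PDFs begin with `%PDF-`, and legacy Word .doc files are OLE2 compound documents beginning with D0 CF 11 E0 A1 B1 1A E1.

```python
# Minimal content-based type identification, sketching what an SMTP/MIME
# signature would look for. The labels "pdf" and "ole2" are illustrative.
from typing import Optional

# Leading byte sequences ("magic numbers") for the two types in this thread.
MAGIC = [
    (b"%PDF-", "pdf"),
    # OLE2 compound document header, used by legacy Word .doc files but also
    # by other pre-2007 Office formats, so it cannot tell .doc from .xls alone.
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "ole2"),
]


def identify(data: bytes) -> Optional[str]:
    """Return a type label if the decoded attachment starts with a known magic number."""
    for magic, label in MAGIC:
        if data.startswith(magic):
            return label
    return None


if __name__ == "__main__":
    print(identify(b"%PDF-1.4\n..."))  # pdf
    print(identify(b"MZ\x90\x00"))     # None
```

A prefix check is the simplest form of such a signature; some PDF readers tolerate junk before the header, so a strict prefix match may miss files crafted that way.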

> > Current logic (this method relies on the external tools being run against
> > the directory by cron, independently of Bro):
> >         hot_attachment_dump_fh = open( hot_attachment_dumpname );
> >         write_file(hot_attachment_dump_fh, data);
> >         close(hot_attachment_dump_fh);
>
> In what event are you currently using this code?

Here is the entire event:

event mime_entity_data(c: connection, length: count, data: string)
    {
    local session = get_session(c, T);

    # md5 hashing is now a built-in function, so just call it and dump the
    # result into the content_hash field. That field in the info struct was
    # already there; I just had to add this to fill it.
    session$content_hash = md5_hash(data);

    # Log the first 256 bytes of the attachment and the MD5 hash.
    mime_log_msg(session, "data", fmt("%d: %s", length, sub_bytes(data, 0, 256)));
    mime_log_msg(session, "all data", fmt("MD5: %s", session$content_hash));

    # If the hot flag is set, dump the MIME-decoded attachment to its own
    # file for analysis.
    if ( session$entity_is_hot )
        {
        # Pick the dump directory by file type; skip the dump if neither
        # pattern matches, so a stale dumpname is never reused.
        local dump_dir = "";
        if ( session$entity_filename == hot_pdf_attachment_filenames )
            dump_dir = "dumped_pdf_files";
        else if ( session$entity_filename == hot_word_attachment_filenames )
            dump_dir = "dumped_doc_files";

        if ( dump_dir != "" )
            {
            # Build the filename out of MD5, length and filename.
            hot_attachment_dumpname = fmt("%s/%s:%d:%s", dump_dir,
                session$content_hash, length, session$entity_filename);

            # Get a raw filehandle (notice open() instead of open_log_file()),
            # write the data out, and be sure to close the fh.
            hot_attachment_dump_fh = open(hot_attachment_dumpname);
            write_file(hot_attachment_dump_fh, data);
            close(hot_attachment_dump_fh);

            # Log stuff to the hot logfile as well.
            mime_log_hot_msg(session, "hot data",
                fmt("%d: %s", length, sub_bytes(data, 0, 256)));
            mime_log_hot_msg(session, "hot data", fmt("File dumped: %s MD5: %s",
                session$entity_filename, session$content_hash));
            }
        }
    }

I attached the modified mime.bro in case anyone wants to see the rest of
it.
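The dump naming scheme in the event above (MD5, length, and attachment name joined with colons) can be sketched in Python as follows. The function name `dump_path` is hypothetical, and the one addition over the Bro code is stripping directory components from the attachment name, on the assumption that a sender could supply a name such as `../../x.pdf`.

```python
import hashlib
import os


def dump_path(base_dir: str, data: bytes, filename: str) -> str:
    """Build <base_dir>/<md5>:<length>:<filename> (hypothetical helper)."""
    digest = hashlib.md5(data).hexdigest()
    # The attachment name is chosen by the sender, so drop any directory
    # parts (forward or back slashes) before using it in a local path.
    safe_name = os.path.basename(filename.replace("\\", "/")) or "unnamed"
    return os.path.join(base_dir, "%s:%d:%s" % (digest, len(data), safe_name))
```

Colons are fine in POSIX file names but not on Windows, so this layout assumes the dump directory lives on a Unix host.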

> > The scan for Office docs would be similar, but use 'OfficeMalScanner'
> > instead of pdfid.py and pdf-parser.py. If I get this to work, I would like
> > to do something very similar with HTTP files.
>
> Makes sense.
>
> > How can I call the external tools? Is this the right place to be doing
> > this?
>
> You can't currently do this in a way that would be feasible on live
> traffic. The problem is that the call to the external tool would block Bro
> and cause it to start dropping packets. There is a "when" statement that
> can help build asynchronous function calls, though: the stack state is
> saved and used again when the function call returns. I don't know if the
> system() function (I think this is what you're looking for to run external
> programs) can be used with the when statement, though.

I suppose the short answer is yes. I was looking for something like the
system() call, e.g. modifying the PyBroccoli example below:

PyBroccoli example:
from broccoli import *

@event
def pong(src_time, dst_time):
    print "pong event: time=%f/%f s" % \
        (dst_time - src_time, current_time() - src_time)

bc = Connection("")
bc.send("ping", time(current_time()))

Modified (pseudocode):
import os

# Handle an event raised whenever a PDF file has been dumped.
@event
def pass_pdf(file):
    os.system("pdf_scan.py -f %s > tempdir" % file)

With what you mentioned taken into account, we can't ask Bro to wait on the
results, but maybe we could dump the results to a logfile for alerting?
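One way to do that without blocking Bro is to keep the cron-driven design and have the cron job write its verdicts to a log that an alerting process can watch. The sketch below assumes the two dump directories from the event above; the scanner command lines for pdfid.py and OfficeMalScanner are placeholders to check against each tool's own usage text, and `sweep`, `SCANNERS`, and the tab-separated log format are hypothetical.

```python
import os
import subprocess
import time

# Dump directory -> scanner argv template; "{path}" is replaced by the file.
# ASSUMPTION: these invocations are placeholders, not verified tool usage.
SCANNERS = {
    "dumped_pdf_files": ["pdfid.py", "{path}"],
    "dumped_doc_files": ["OfficeMalScanner", "{path}", "scan"],
}


def sweep(scanners, log_path):
    """Scan every not-yet-scanned dump file and append one log line per file."""
    with open(log_path, "a") as log:
        for directory, template in scanners.items():
            if not os.path.isdir(directory):
                continue
            done_dir = os.path.join(directory, "scanned")
            os.makedirs(done_dir, exist_ok=True)
            for name in sorted(os.listdir(directory)):
                path = os.path.join(directory, name)
                if not os.path.isfile(path):
                    continue  # skips the "scanned" subdirectory itself
                cmd = [path if arg == "{path}" else arg for arg in template]
                try:
                    # argv list, no shell: the sender-chosen name is never shell-parsed
                    result = subprocess.run(cmd, capture_output=True,
                                            text=True, errors="replace")
                except OSError as exc:  # e.g. the scanner is not installed
                    log.write("%d\t%s\tERROR\t%s\n" % (int(time.time()), path, exc))
                    continue  # leave the file in place for the next run
                summary = " ".join(result.stdout.split())[:200]
                log.write("%d\t%s\t%d\t%s\n"
                          % (int(time.time()), path, result.returncode, summary))
                # Move the file aside so the next cron run skips it.
                os.rename(path, os.path.join(done_dir, name))

# A cron-driven script would call, e.g.: sweep(SCANNERS, "file_scan.log")
```

Moving scanned files into a subdirectory is a simple way to make repeated cron runs idempotent without keeping a separate state file.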

> If you are looking to run this on tracefiles for now, though, you can
> certainly just use the system function to call your external tool. It takes
> a single argument (a string) that is the command line you'd like to run.
> There is also a function for defanging data if you need it (i.e., taking
> something off the wire and using it in the command line) named
> str_shell_escape. You do need to make sure that the data that is defanged
> with str_shell_escape is placed within double quotes.
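The double-quote requirement follows from how this kind of escaper works: it only neutralizes the characters that remain special inside double quotes. The Python sketch below illustrates that rule. Its character set (backslash, double quote, backtick, dollar sign) is an assumption and is not claimed to match str_shell_escape exactly; the helper names are hypothetical, and in Python itself one would normally use `shlex.quote` or pass an argument list and skip the shell entirely.

```python
import subprocess

# Characters that keep a special meaning inside POSIX double quotes.
_SPECIAL_IN_DQUOTES = '\\"`$'


def escape_for_double_quotes(s: str) -> str:
    """Backslash-escape the characters a shell still interprets inside "...".
    The result is only safe when placed between double quotes."""
    return "".join("\\" + ch if ch in _SPECIAL_IN_DQUOTES else ch for ch in s)


def echo_via_shell(untrusted: str) -> str:
    """Round-trip a string through `sh -c` to show it comes back literally."""
    cmd = 'printf %s "' + escape_for_double_quotes(untrusted) + '"'
    return subprocess.run(["sh", "-c", cmd], capture_output=True, text=True).stdout
```

Placed outside double quotes, the same escaped string would still let spaces, `;`, and `#` split or end the command, which is why the quoting rule matters.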
> > I would be surprised if this capability doesn't already exist and suppose
> > I might be going about this all wrong. I would just prefer to incorporate
> > the file scans in Bro instead of running them completely independently. If
> > I wasn't clear or am completely out in left field, feel free to be honest.
> > I won't be offended.
>
> Nope, not out in left field at all, and personally I'm a bit ashamed I never
> wrote a mime-ext.bro script that was a bit more capable, like the http-ext
> script. I'm going to be rewriting the mime.bro script for the next release,
> though, and it will definitely have file extraction and identification
> capabilities built into it. However, we are going to be working toward a
> much more generalized notion of files for some future release of Bro. I've
> worked a bit on how that may proceed, but unfortunately we definitely won't
> be anywhere close to ready with that for the next release.

Maybe you should charge "more" for Bro...

No, you all are doing a great job on this project. I just wish I could do
more to help.

> .Seth
> --
> Seth Hall
> International Computer Science Institute
> (Bro) because everyone has a network
> http://www.bro-ids.org/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mime.bro
Type: application/octet-stream
Size: 11934 bytes
Desc: not available
Url : http://mailman.ICSI.Berkeley.EDU/pipermail/bro/attachments/20110321/24f6c184/attachment.obj 
