[Bro] File Scanning Capability

Jim Mellander jmellander at lbl.gov
Mon Mar 21 13:05:18 PDT 2011

Hi Will:

It seems like you would want to use the Python Broccoli bindings to send
an event to a Python client. Here's what I'm doing with my "stomper"
code, which looks up URLs on the fly in a malware database:

# In your bro startup script
@load listen-clear

redef Remote::destinations += {
        ["remote_stomper"] = [ $host=, $events = /remote_check_URL/,
                               $connect=F, $ssl=F ]
};

#within bro policy

# Here we send to the broccoli client for checking/processing
event remote_check_URL(++stomper_seqno, c, is_orig, host, uri, ts);


On the Python side, here are the relevant sections of the code, which
runs as a daemon accepting events from Bro and acting on them:

#! /usr/bin/env python

import broccoli
import sqlite3
import random
import sys
import re
import select   # for select loop

# Bro event loop
def bro_event_loop(bro_conn):
        while True:
        while True:

def remote_check_URL(seqno, host, uri):
    # Receive a URL from bro, and send a return signal back
    #  if it should be blocked.
    category = check_database(host,uri)
    if category:
        if check_category(category):
            # If the category signals a block

#Main program - Initialize and call event loop

# Setup the connection to bro
bro_conn = broccoli.Connection("")

# Event loop
# Everything under this is never executed.
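For anyone filling in the blanks, here is a minimal sketch of what the
check_database and check_category helpers could look like. The table name
malware_urls, its columns, the BLOCKED_CATEGORIES set, the should_block
wrapper and the extra db argument are all assumptions for illustration, not
what stomper actually uses. It runs against an in-memory sqlite3 database,
so it can be tried without Bro or broccoli:

```python
import sqlite3

# Assumed set of categories that should trigger a block.
BLOCKED_CATEGORIES = {"malware", "phishing"}


def open_database(path=":memory:"):
    """Open the URL database, creating the (hypothetical) table if needed."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS malware_urls ("
        "host TEXT NOT NULL, uri TEXT NOT NULL, category TEXT NOT NULL, "
        "PRIMARY KEY (host, uri))"
    )
    return db


def check_database(db, host, uri):
    """Return the category recorded for host + uri, or None if unknown."""
    row = db.execute(
        "SELECT category FROM malware_urls WHERE host = ? AND uri = ?",
        (host, uri),
    ).fetchone()
    return row[0] if row else None


def check_category(category):
    """True if this category is one we want to block."""
    return category in BLOCKED_CATEGORIES


def should_block(db, host, uri):
    """Combine both checks, mirroring the body of remote_check_URL."""
    category = check_database(db, host, uri)
    return bool(category) and check_category(category)
```

Inside the remote_check_URL handler you would call something like
should_block(db, host, uri) and send the return event back to Bro only
when it comes back True.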

Hope this will help you kick the can down the road a bit....

On Mon, Mar 21, 2011 at 12:44 PM, Will <baxterw3232 at gmail.com> wrote:
> On Mon, Mar 21, 2011 at 2:49 PM, Seth Hall <seth at icir.org> wrote:
>> On Mar 21, 2011, at 2:16 PM, Will wrote:
>> > I will without a doubt eventually incorporate
>> > "http-ext-identified-files.sig" instead of what I am currently using, but I
>> > am having trouble determining where to integrate the logic for handling each
>> > file type. As it currently works, I am saving off every pdf and word doc,
>> > which would be unnecessary if I used bro to call the external tools and
>> > evaluate the results.
>> >> That won't actually work quite right.  The http-ext-identified-files.sig
>> >> file uses special signature keywords that the http analyzer provides to
>> >> detect file types.  It's not directly applicable to SMTP/MIME transfers.
> Understandable. Since there are so many different types, it would be
> worthwhile to create a signature file for SMTP/MIME. I would be happy
> to share it when I get it done.
>> > Current logic (this method calls for the external tools to be run
>> > against the directory by cron and are independent of Bro):
>> >         hot_attachment_dump_fh = open( hot_attachment_dumpname );
>> >         write_file(hot_attachment_dump_fh, data);
>> >         close(hot_attachment_dump_fh);
>> >>In what event are you currently running using this code?
> Here is the entire event:
> event mime_entity_data(c: connection, length: count, data: string)
>        {
>        local session = get_session(c, T);
>        # MD5 hashing is now a builtin function, so just call it and dump the
>        # result into the content_hash field. That field in the info struct
>        # was already there; I just had to add this to fill it.
>        session$content_hash = md5_hash(data);
>        # Log the first 256 bytes of the attachment and the MD5 hash.
>        mime_log_msg(session, "data",
>                     fmt("%d: %s", length, sub_bytes(data, 0, 256)));
>        mime_log_msg(session, "all data",
>                     fmt("MD5: %s", session$content_hash));
>        # If the hot flag is set, dump the MIME-decoded attachment to its
>        # own file for analysis.
>        if ( session$entity_is_hot )
>                {
>                if ( session$entity_filename == hot_pdf_attachment_filenames )
>                        {
>                        # Build the filename out of MD5, length and filename.
>                        hot_attachment_dumpname = fmt("dumped_pdf_files\/%s:%d:%s",
>                            session$content_hash, length, session$entity_filename);
>                        }
>                if ( session$entity_filename == hot_word_attachment_filenames )
>                        {
>                        hot_attachment_dumpname = fmt("dumped_doc_files\/%s:%d:%s",
>                            session$content_hash, length, session$entity_filename);
>                        }
>                # Get a raw filehandle (note open() instead of open_log_file()),
>                # write the data out, and be sure to close the fh.
>                hot_attachment_dump_fh = open( hot_attachment_dumpname );
>                write_file(hot_attachment_dump_fh, data);
>                close(hot_attachment_dump_fh);
>                # Log to the hot logfile as well.
>                mime_log_hot_msg(session, "hot data",
>                                 fmt("%d: %s", length, sub_bytes(data, 0, 256)));
>                mime_log_hot_msg(session, "hot data",
>                                 fmt("File dumped: %s MD5: %s",
>                                     session$entity_filename, session$content_hash));
>                }
>        }
> I attached the modified mime.bro in case anyone wants to see the rest
> of it.
>> > The scan for office docs would be similar, but use 'OfficeMalScanner'
>> > instead of pdfid.py and pdf-parser.py. If I get this to work, I would like
>> > to do something very similar with http files.
>> Makes sense.
>> > How can I call the external tools?  Is this the right place to be doing
>> > this?
>> You can't currently do this in a way that would be feasible on live
>> traffic.  The problem is that the call to the external tool would block Bro
>> and cause it to start dropping packets.  There is a "when" statement that
>> can help build asynchronous function calls, though: the stack state is
>> saved and used again when the function call returns.  I don't know
>> whether the system() function (I think this is what you're looking for to
>> run external programs) can be used with the when statement, though.
> I suppose the short answer is yes. I was looking for something like the
> system() call. Like modifying the PyBroccoli Example from below:
> PyBroccoli Example:
> @event
> def pong(src_time, dst_time):
>     print "pong event: time=%f/%f s" % \
>        (dst_time - src_time, current_time() - src_time)
> bc = Connection("")
> bc.send("ping", time(current_time()))
> To:
> @event (event == dumped pdf file)
> def pass_pdf(file):
>       system(pdf_scan.py -f dumped_file.pdf > tempdir)
> With what you mentioned taken into account, we can't ask bro to wait on the
> results, but maybe we could dump the results to a logfile for alerting?
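
Dumping the results to a logfile from the Python side could look roughly
like the sketch below. It starts the scanner without waiting for it, so the
event handler returns right away, and appends whatever the tool prints to a
log. The function name scan_in_background and the log path are made up for
illustration, and it assumes the scanner is something you can launch as an
ordinary command:

```python
import subprocess


def scan_in_background(command, log_path):
    """Start `command` (an argv list) without waiting for it to finish.

    stdout and stderr of the child are appended to log_path. The Popen
    object is returned so the caller can poll() or wait() on it later.
    """
    log = open(log_path, "ab")
    try:
        proc = subprocess.Popen(command, stdout=log, stderr=subprocess.STDOUT)
    finally:
        # The child keeps its own copy of the descriptor, so the parent
        # can close its handle immediately.
        log.close()
    return proc
```

In an event handler this might be called as
scan_in_background(["pdfid.py", dumped_path], "pdf_scan.log"), assuming
pdfid.py is on the PATH and dumped_path is the file Bro wrote out. The
daemon should still call poll() on the returned objects now and then so
finished scanners get reaped.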
>> If you are looking to run this on tracefiles for now though, you can
>> certainly just use the system function to call your external tool.  It takes
>> a single argument (a string) that is the command line you'd like to run.
>>  There is a function for defanging data if you need to do that too (taking
>> something off the line and using it in the command line) named
>> str_shell_escape.  You do need to make sure that the data that is defanged
>> with str_shell_escape is placed within double-quotes.
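
The same concern applies if the command line gets built on the Python side
instead. As a rough analogue (this is Python's shlex.quote from Python 3,
not Bro's str_shell_escape, and it quotes with single quotes itself rather
than expecting you to add double quotes), a sketch with a hypothetical
helper name:

```python
import shlex


def build_scan_command(tool, filename):
    """Return a shell command line with both parts safely quoted."""
    return "%s %s" % (shlex.quote(tool), shlex.quote(filename))
```

A filename taken off the wire, such as one containing spaces or a
semicolon, then reaches the shell as a single argument instead of being
interpreted. Passing an argv list to subprocess and skipping the shell
entirely avoids the problem altogether.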
>> > I would be surprised if this capability doesn't already exist and
>> > suppose I might be going about this all wrong. I would just prefer to
>> > incorporate the file scans in Bro vice running them completely
>> > independently. If I wasn't clear or am completely out in left field feel
>> > free to be honest. I won't be offended.
>> Nope, not out in left field at all and personally I'm a bit ashamed I
>> never wrote a mime-ext.bro script that was a bit more capable like the
>> http-ext script.  I'm going to be rewriting the mime.bro script for the next
>> release though and it will definitely have file extraction and
>> identification capabilities built into it.  However, we are going to be
>> working toward a much more generalized notion of files for some future
>> release of Bro.  I've worked a bit on how that may proceed, but
>> unfortunately we definitely won't be anywhere close to ready with that for
>> the next release.
> <sarcasm>
> Maybe you should charge "more" for Bro...
> </sarcasm>
> No, you all are doing a great job on this project. I just wish I could do
> more to help.
>>  .Seth
>> --
>> Seth Hall
>> International Computer Science Institute
>> (Bro) because everyone has a network
>> http://www.bro-ids.org/
> Will
> _______________________________________________
> Bro mailing list
> bro at bro-ids.org
> http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/bro
