[Bro] File Scanning Capability

Scott Campbell scampbell at lbl.gov
Mon Mar 21 13:27:50 PDT 2011

I implemented a straw man version of what you are doing for html file
transfers, in particular looking at PDF files via the pdfid tool.  As
Jim pointed out, it is trivial to do a Python->Bro event callback via
Broccoli.  I will post the code when I get back home; it is more of a
hack, but it might prove helpful.


On 3/21/11 2:44 PM, Will wrote:
> On Mon, Mar 21, 2011 at 2:49 PM, Seth Hall <seth at icir.org> wrote:
>> On Mar 21, 2011, at 2:16 PM, Will wrote:
>>> I will without a doubt eventually incorporate
>> "http-ext-identified-files.sig" instead of what I am currently using, but I
>> am having trouble determining where to integrate the logic for handling each
>> file type. As it currently works, I am saving off every pdf and word doc,
>> which would be unnecessary if I used bro to call the external tools and
>> evaluate the results.
>>>> That won't actually work quite right.  The http-ext-identified-files.sig
>> file uses special signature keywords that the http analyzer provides to
>> detect file types.  It's not directly applicable to SMTP/MIME transfers.
>> Understandable. Since there are so many different file types, it would be
> worthwhile to create a signature file for SMTP/MIME. I would be happy
> to share it when I get it done.
>>> Current logic (this method calls for the external tools to be run against
>> the directory by cron, independently of Bro):
>>>         hot_attachment_dump_fh = open( hot_attachment_dumpname );
>>>         write_file(hot_attachment_dump_fh, data);
>>>         close(hot_attachment_dump_fh);
>>>> In what event are you currently running using this code?
> Here is the entire event:
> event mime_entity_data(c: connection, length: count, data: string)
>        {
>        local session = get_session(c, T);
>        #md5 hashing is now a builtin function, so just call it and dump the
> result into the content_hash field
>        #that field in the info struct was already there, just had to add
> this to fill it.
>        session$content_hash = md5_hash(data);
>        #log the first 256 bytes of the attachment and the MD5 hash.
>        mime_log_msg(session, "data", fmt("%d: %s", length, sub_bytes(data,
> 0, 256)));
>        mime_log_msg(session, "all data", fmt("MD5: %s",
> session$content_hash));
>        #if the hot flag is set then we dump the MIME-decoded attachment to
> its own file for analysis
>        if( session$entity_is_hot )
>         {
>         if ( session$entity_filename == hot_pdf_attachment_filenames )
>              {
>              #build the filename out of MD5, length and filename
>              hot_attachment_dumpname = fmt("dumped_pdf_files\/%s:%d:%s",
> session$content_hash, length, session$entity_filename);
>              }
>         if ( session$entity_filename == hot_word_attachment_filenames )
>              {
>              hot_attachment_dumpname = fmt("dumped_doc_files\/%s:%d:%s",
> session$content_hash, length,session$entity_filename);
>              }
>         #get a raw filehandle, notice open() instead of open_log_file(),
> write the data out, and be sure to close the fh
>         hot_attachment_dump_fh = open( hot_attachment_dumpname );
>         write_file(hot_attachment_dump_fh, data);
>         close(hot_attachment_dump_fh);
>         #log stuff to the hot logfile as well
>              mime_log_hot_msg(session, "hot data", fmt("%d: %s", length,
> sub_bytes(data, 0, 256)));
>         mime_log_hot_msg(session, "hot data", fmt("File dumped: %s MD5: %s",
> session$entity_filename, session$content_hash));
>         }
>        }
> I attached the modified mime.bro in case anyone wanted to see how the
> rest of it works.
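
For anyone who wants to try the dump step outside of Bro, here is a
minimal Python sketch of the naming scheme used in the event above
(MD5, length and filename joined by colons). It is not the attached
mime.bro. The function name dump_attachment and the out_dir argument
are made up for illustration, and stripping directory components from
the attachment name is my own assumption, added because that name
comes off the wire.

```python
import hashlib
import os


def dump_attachment(data: bytes, filename: str, out_dir: str) -> str:
    """Write data to out_dir as '<md5>:<length>:<filename>' and return the path."""
    digest = hashlib.md5(data).hexdigest()
    # Assumption (not in the original script): drop any directory parts,
    # because the attachment name is supplied by the sender.
    safe_name = os.path.basename(filename)
    path = os.path.join(out_dir, "%s:%d:%s" % (digest, len(data), safe_name))
    os.makedirs(out_dir, exist_ok=True)
    # Binary mode, so the decoded attachment is written byte for byte.
    with open(path, "wb") as fh:
        fh.write(data)
    return path
```

A cron job (or any other scanner) can then pick files up from out_dir,
with the hash and size readable straight from the file name.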
>> The scan for office docs would be similar, but use 'OfficeMalScanner'
>> instead of pdfid.py and pdf-parser.py. If I get this to work, I would like
>> to do something very similar with http files.
>> Makes sense.
>>> How can I call the external tools?  Is this the right place to be doing
>> this?
>> You can't currently do this in a way that would be feasible on live
>> traffic.  The problem is that the call to the external tool would block Bro
>> and cause it to start dropping packets.  There is a "when" statement that
>> can help build asynchronous function calls though, so that the stack state
>> is saved and used again when the function call returns.  I don't know
>> if the system() (I think this is what you're looking for to run external
>> programs) function can be used with the when statement though.
> I suppose the short answer is yes. I was looking for something like the
> system() call. Like modifying the PyBroccoli Example from below:
> PyBroccoli Example:
> @event
> def pong(src_time, dst_time):
>     print "pong event: time=%f/%f s" % \
>        (dst_time - src_time, current_time() - src_time)
> bc = Connection("")
> bc.send("ping", time(current_time()))
> To:
> import os
> @event  # fired when a pdf file has been dumped
> def pass_pdf(file):
>       os.system("pdf_scan.py -f dumped_file.pdf > tempdir")
> With what you mentioned taken into account, we can't ask bro to wait on the
> results, but maybe we could dump the results to a logfile for alerting?
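
The "don't wait, just log the results" idea can be sketched in plain
Python, without Broccoli: hand the scanner command to a small worker
pool and append its output to a logfile when it finishes. This is only
an illustration under my own assumptions. The names scan_async and
_run_and_log, the pool size and the log format are all hypothetical,
and nothing here is a Bro or Broccoli API.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

# Small worker pool so that scans run off the caller's thread.
_pool = ThreadPoolExecutor(max_workers=2)


def _run_and_log(cmd, log_path):
    # Run the scanner to completion, capturing stdout and stderr together.
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    # Append one record per scan so an alerting job can follow the file.
    with open(log_path, "a") as log:
        log.write("cmd=%r rc=%d\n%s\n" % (cmd, result.returncode, result.stdout))
    return result.returncode


def scan_async(cmd, log_path):
    """Queue cmd (an argument list) and return a Future immediately."""
    return _pool.submit(_run_and_log, cmd, log_path)
```

An event handler would call something like
scan_async(["pdf_scan.py", "-f", "dumped_file.pdf"], "scan.log") and
return right away. Passing the command as a list also avoids going
through a shell.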
>> If you are looking to run this on tracefiles for now though, you can
>> certainly just use the system function to call your external tool.  It takes
>> a single argument (a string) that is the command line you'd like to run.
>>  There is a function for defanging data if you need to do that too (taking
>> something off the line and using it in the command line) named
>> str_shell_escape.  You do need to make sure that the data that is defanged
>> with str_shell_escape is placed within double-quotes.
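
The same defanging concern applies if the external tool is launched
from Python rather than from Bro's system(). Below is a rough Python
analogue, not a reimplementation of str_shell_escape: it uses the
standard library's shlex.quote, which wraps the value in single quotes
instead of relying on the double-quote convention described above. The
function name build_scan_command is made up for this example.

```python
import shlex


def build_scan_command(tool: str, filename: str) -> str:
    """Return a shell command line with both parts quoted for a POSIX shell."""
    # shlex.quote makes each value a single shell word, so characters such
    # as ';', '"' or '$' in an attachment name are not interpreted.
    return "%s %s" % (shlex.quote(tool), shlex.quote(filename))
```

For example, build_scan_command("pdfid.py", name) gives a string that
can be passed to a shell even when name came off the wire.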
>>> I would be surprised if this capability doesn't already exist and suppose
>> I might be going about this all wrong. I would just prefer to incorporate
>> the file scans in Bro vice running them completely independently. If I
>> wasn't clear or am completely out in left field feel free to be honest. I
>> won't be offended.
>> Nope, not out in left field at all and personally I'm a bit ashamed I never
>> wrote a mime-ext.bro script that was a bit more capable like the http-ext
>> script.  I'm going to be rewriting the mime.bro script for the next release
>> though and it will definitely have file extraction and identification
>> capabilities built into it.  However, we are going to be working toward a
>> much more generalized notion of files for some future release of Bro.  I've
>> worked a bit on how that may proceed, but unfortunately we definitely won't
>> be anywhere close to ready with that for the next release.
> <sarcasm>
> Maybe you should charge "more" for Bro...
> </sarcasm>
> No, you all are doing a great job on this project. I just wish I could do
> more to help.
>>  .Seth
>> --
>> Seth Hall
>> International Computer Science Institute
>> (Bro) because everyone has a network
>> http://www.bro-ids.org/
> Will
> _______________________________________________
> Bro mailing list
> bro at bro-ids.org
> http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/bro
