[Bro] file extracted by ANALYZER_EXTRACT has a different hash from the original file

Myth Ren email4myth at gmail.com
Mon Aug 7 00:29:29 PDT 2017

Hello, everyone.
    I'm new to Bro. I'm using the FAF (File Analysis Framework) to
extract files of certain types from traffic to disk for further analysis,
but I have run into a problem that I find hard to understand:
    -  the file extracted by Bro is one byte bigger than my original file
    -  or the file extracted by Bro is the same size as my original file,
but its MD5 value differs from the original's
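A minimal sketch of how the two cases above could be told apart for a single extracted file, assuming `cmp`, `wc` and `awk` are available. The function name `compare_copy` is hypothetical. It prints both sizes, the 1-based offset of the first differing byte (or `-` if none), and the number of differing bytes within the common length.

```shell
#!/bin/sh
# compare_copy ORIGINAL COPY   (hypothetical helper name)
# Prints: "<size_orig> <size_copy> <first_diff_offset or -> <differing_bytes>"
compare_copy() {
    orig=$1
    copy=$2
    size_orig=$(wc -c < "$orig" | tr -d ' ')
    size_copy=$(wc -c < "$copy" | tr -d ' ')
    # cmp -l lists every differing byte as "offset octal octal" (1-based);
    # the EOF notice for files of unequal length goes to stderr and is dropped.
    diffs=$(cmp -l "$orig" "$copy" 2>/dev/null | wc -l | tr -d ' ')
    first=$(cmp -l "$orig" "$copy" 2>/dev/null | awk 'NR == 1 { print $1 }')
    echo "$size_orig $size_copy ${first:--} $diffs"
}
```

For example, `compare_copy test.for.bro.tar.gz <one extracted file>` would show whether a same-size copy differs in a few bytes or from some offset onward, and whether a copy that is one byte longer matches the original up to the original's length.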

Below are my test environment, test steps, and test results:

# my test env
bro version:
- bro version 2.5-156
OS (32C 64G):
- CentOS Linux release 7.3.1611 (Core)
CPU model:
- Model name: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
- CPU(s): 32
- CPU MHz: 2334.445
- 03:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network

# my test bro scripts
# (brace nesting below is assumed; `mime_to_ext` and `path` are globals
# defined elsewhere in the script)
event file_sniff(f: fa_file, meta: fa_metadata)
    {
    print "file sniff event by Myth";
    if ( meta?$mime_type )#&& hook FileExtraction::extract(f, meta) )
        {
        if ( meta$mime_type in mime_to_ext )
            {
            local fext = mime_to_ext[meta$mime_type];
            if ( fext == "txt" )
                {
                #print "txt";
                if ( f$source != "SMTP" )
                    {
                    #print "NOT SMTP";
                    #fext = split_string(meta$mime_type, /\//)[1];
                    }
                }

            # file path
            local fname = fmt("%s%s-%s.%s", path, f$source, f$id, fext);
            #print fname;
            # hash the file and extract it to disk
            Files::add_analyzer(f, Files::ANALYZER_MD5);
            Files::add_analyzer(f, Files::ANALYZER_SHA1);
            Files::add_analyzer(f, Files::ANALYZER_SHA256);
            Files::add_analyzer(f, Files::ANALYZER_EXTRACT,[$extract_filename=fname]);
            }
        }
    }
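Since the script attaches the hash analyzers, Bro's own view of each file may also be worth checking. The sketch below assumes that the default `files.log` is being written in Bro's working directory and that it carries the usual `fuid`, `seen_bytes`, `missing_bytes`, `overflow_bytes` and `md5` columns. The function name `files_log_anomalies` is hypothetical. It looks the columns up by name from the `#fields` header and prints only the rows where Bro itself reported missing or overflow bytes.

```shell
#!/bin/sh
# files_log_anomalies FILES_LOG   (hypothetical helper name)
# Prints "fuid seen_bytes missing_bytes overflow_bytes md5" for every entry
# whose missing_bytes or overflow_bytes is non-zero.
files_log_anomalies() {
    awk -F '\t' '
        # The "#fields" header names the columns; data column k is header field k+1.
        /^#fields/ { for (i = 2; i <= NF; i++) col[$i] = i - 1; next }
        /^#/       { next }   # skip the remaining header/footer lines
        {
            m = $(col["missing_bytes"])
            o = $(col["overflow_bytes"])
            if ((m + 0) > 0 || (o + 0) > 0)
                print $(col["fuid"]), $(col["seen_bytes"]), m, o, $(col["md5"])
        }
    ' "$1"
}
```

If the short extracted files show up here with non-zero `missing_bytes`, that would suggest Bro did not see every packet of those transfers, rather than the extraction step corrupting the data.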

# my test steps

1. generate test file

>>> [root@sensor ~]# dd if=/dev/urandom of=test.for.bro.txt bs=1024
>>> [root@sensor ~]# tar -cvzf test.for.bro.tar.gz test.for.bro.txt

2. original file size and MD5 value

>>> [root@sensor ~]# ls -lt test.for.bro.tar.gz
-rw-r--r-- 1 root root 524608 Aug  7 13:59 test.for.bro.tar.gz
>>> [root@sensor ~]# md5sum test.for.bro.tar.gz
6e755b5c0a7754c7066ca6db5f0f90ba  test.for.bro.tar.gz

2. start test web server using Python
>>> [root@sensor ~]# python -m SimpleHTTPServer 8998 > ws.log 2>&1

3. start bro
>>> [root@sensor myth]# /usr/local/bro/bin/bro -i eno1 -C bro-scripts/tophant.entrypoint.bro > myth.log 2>&1

4. use `ab` to make many HTTP requests for the test file from another machine
>>> [root@localhost ~]# ab -n 2000 -c 4 ''

5. results (after all requests are done)

5.1 number of requests processed by the web server
>>> [root@sensor ~]# cat ws.log  | grep test.for.bro | wc -l

5.2 bro `file_sniff` event count
>>> [root@sensor myth]# cat myth.log | grep "file sniff event by Myth" | wc

5.3 extracted file count
>>> [root@sensor sensor_files_by_myth]# ls | wc -l

5.4 count of files whose size differs from the original:
>>> [root@sensor sensor_files_by_myth]# ls -lt | grep -v 524608 | wc -l

5.5 count of files with the same size as the original:
>>> [root@sensor sensor_files_by_myth]# ls -lt | grep 524608 | wc -l

5.6 count of files with the same MD5 value as the original:
>>> [root@sensor sensor_files_by_myth]# ls -lt | awk '{print $NF}' | xargs md5sum | grep 6e755b5c0a7754c7066ca6db5f0f90ba | wc -l

5.7 count of files with the same size but a different MD5 (!!! NOTICE: the
MD5 values are all different)
>>> [root@sensor sensor_files_by_myth]# ls -lt | grep 524608 | awk '{print $NF}' | xargs md5sum | grep -v 6e755b5c0a7754c7066ca6db5f0f90ba | awk '{print $1}' | sort | uniq -c | wc -l
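The MD5 checks in 5.6 and 5.7 parse `ls -lt` output, which also passes the leading `total` line on to `md5sum`. A minimal alternative sketch, assuming GNU `find` and `md5sum`, hashes the regular files in a directory directly and tallies matches against the expected MD5. The function name `md5_tally` is hypothetical.

```shell
#!/bin/sh
# md5_tally DIR EXPECTED_MD5   (hypothetical helper name)
# Prints "match=<n> mismatch=<m>" for the regular files directly inside DIR.
md5_tally() {
    dir=$1
    expected=$2
    # md5sum prints "<hash>  <path>"; only the first field is compared.
    find "$dir" -maxdepth 1 -type f -exec md5sum {} + |
        awk -v want="$expected" '
            $1 == want { ok++ }
            $1 != want { bad++ }
            END { printf "match=%d mismatch=%d\n", ok, bad }
        '
}
```

Usage would look like `md5_tally sensor_files_by_myth 6e755b5c0a7754c7066ca6db5f0f90ba`.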

5.8 extracted file size distribution:
>>> [root@sensor sensor_files_by_myth]# ls -lt | awk '{print $5}' | sort -rn | uniq -c
    136 524609       <<<<<<<<<<<<<<< this is one byte bigger than my original test file !!!
    780 524608
      3 523990
      3 522542
      8 521094
      1 520208
      1 519646
      2 518198
      1 515302
      1 513854
      1 512968
      1 512406
      1 510958
      1 509510
      2 503718
      1 502176
      1 501384
      1 497926
      1 490296
      1 488808
      1 487040
      1 486342
      1 480550
      1 473310
      1 467518
      1 464622
      1 458830
      1 453038
      1 442902
      1 441454
      1 396566
      1 382408
      1 377742
      1 358918
      1 354574
      1 318240
      1 283312
      1 263350
      1 256110
      1 250318
      1 234952
      1 189502
      1 164886
      1 79454
      2 2710
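To make a distribution like the one above easier to read, each size can be printed together with its difference from the original size. This is a minimal sketch, assuming GNU `find` and a POSIX `awk`. The function name `size_histogram` is hypothetical.

```shell
#!/bin/sh
# size_histogram DIR ORIGINAL_SIZE   (hypothetical helper name)
# Prints "<file count> <size in bytes> <size minus ORIGINAL_SIZE>", largest size first.
size_histogram() {
    dir=$1
    orig_size=$2
    find "$dir" -maxdepth 1 -type f -exec wc -c {} + |
        awk -v want="$orig_size" '
            # wc adds a "total" line whenever it is given more than one file
            $2 != "total" { n[$1]++ }
            END { for (s in n) printf "%d %d %+d\n", n[s], s, s - want }
        ' | sort -k2,2nr
}
```

Usage would look like `size_histogram sensor_files_by_myth 524608`. The third column shows how many bytes each group of files is short of, or over, the original.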

Thanks for reading this far. I hope someone can help me with this :)
