[Bro] different file hash between downloaded file by ANALYZER_EXTRACT with original file
Myth Ren
email4myth at gmail.com
Mon Aug 7 00:29:29 PDT 2017
Hello, everyone.
I'm new to Bro and am using the File Analysis Framework (FAF) to
extract files of certain types from traffic to disk for further analysis.
I have run into a problem that I cannot explain:
- sometimes the file Bro extracts is one byte bigger than my original file
- or the extracted file is the same size as my original file, but its
MD5 value is different
Below are my test environment, test steps, and test results:
# my test env
bro version:
- bro version 2.5-156
OS (32C 64G):
- CentOS Linux release 7.3.1611 (Core)
CPU model:
- Model name: Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz
- CPU(s): 32
- CPU MHz: 2334.445
NIC:
- 03:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network
# my test bro scripts
```
event file_sniff(f: fa_file, meta: fa_metadata)
    {
    print "file sniff event by Myth";

    if ( meta?$mime_type ) #&& hook FileExtraction::extract(f, meta) )
        {
        if ( meta$mime_type in mime_to_ext )
            {
            local fext = mime_to_ext[meta$mime_type];
            if ( fext == "txt" )
                {
                #print "txt";
                if ( f$source != "SMTP" )
                    {
                    #print "NOT SMTP";
                    return;
                    }
                }
            }
        else
            return;

        #fext = split_string(meta$mime_type, /\//)[1];
        local fname = fmt("%s%s-%s.%s", path, f$source, f$id, fext);
        # file path
        #print fname;
        Files::add_analyzer(f, Files::ANALYZER_MD5);
        Files::add_analyzer(f, Files::ANALYZER_SHA1);
        Files::add_analyzer(f, Files::ANALYZER_SHA256);
        Files::add_analyzer(f, Files::ANALYZER_EXTRACT, [$extract_filename=fname]);
        }
    }
```
# my test steps
1. generate test file
>>> [root@sensor ~]# dd if=/dev/urandom of=test.for.bro.txt bs=1024 count=512
>>> [root@sensor ~]# tar -cvzf test.for.bro.tar.gz test.for.bro.txt
1.1 original file size and MD5 value
>>> [root@sensor ~]# ls -lt test.for.bro.tar.gz
-rw-r--r-- 1 root root 524608 Aug 7 13:59 test.for.bro.tar.gz
>>> [root@sensor ~]# md5sum test.for.bro.tar.gz
6e755b5c0a7754c7066ca6db5f0f90ba test.for.bro.tar.gz
2. start a test web server using Python
>>> [root@sensor ~]# python -m SimpleHTTPServer 8998 > ws.log 2>&1
3. start bro
>>> [root@sensor myth]# /usr/local/bro/bin/bro -i eno1 -C bro-scripts/tophant.entrypoint.bro > myth.log 2>&1
4. use `ab` to make many HTTP requests for the test file from another machine
>>> [root@localhost ~]# ab -n 2000 -c 4 'http://10.0.81.54:8998/test.for.bro.tar.gz'
5. results (after all requests have completed)
5.1 number of requests handled by the web server
>>> [root@sensor ~]# cat ws.log | grep test.for.bro | wc -l
2000
5.2 bro `file_sniff` event count
>>> [root@sensor myth]# cat myth.log | grep "file sniff event by Myth" | wc -l
976
5.3 extracted file count
>>> [root@sensor sensor_files_by_myth]# ls | wc -l
973
5.4 count of files whose size differs from the original:
>>> [root@sensor sensor_files_by_myth]# ls -lt | grep -v 524608 | wc -l
193
5.5 count of files with the same size as the original:
>>> [root@sensor sensor_files_by_myth]# ls -lt | grep 524608 | wc -l
780
5.6 count of files with the same MD5 value as the original:
>>> [root@sensor sensor_files_by_myth]# ls -lt | awk '{print $NF}' | xargs md5sum | grep 6e755b5c0a7754c7066ca6db5f0f90ba | wc -l
19
5.7 count of files with the same size but a different MD5 (NOTICE: every one of these MD5 values is distinct)
>>> [root@sensor sensor_files_by_myth]# ls -lt | grep 524608 | awk '{print $NF}' | xargs md5sum | grep -v 6e755b5c0a7754c7066ca6db5f0f90ba | awk '{print $1}' | sort | uniq -c | wc -l
761
5.8 extracted file size distribution:
>>> [root@sensor sensor_files_by_myth]# ls -lt | awk '{print $5}' | sort -rn | uniq -c
136 524609 <<<<<<<<<<<<<<< this is one byte bigger than my original test file !!!
780 524608
3 523990
3 522542
8 521094
1 520208
1 519646
2 518198
1 515302
1 513854
1 512968
1 512406
1 510958
1 509510
2 503718
1 502176
1 501384
1 497926
1 490296
1 488808
1 487040
1 486342
1 480550
1 473310
1 467518
1 464622
1 458830
1 453038
1 442902
1 441454
1 396566
1 382408
1 377742
1 358918
1 354574
1 318240
1 283312
1 263350
1 256110
1 250318
1 234952
1 189502
1 164886
1 79454
2 2710
1
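The checks in 5.4 through 5.7 can also be done in a single pass. The following is only a sketch and was not part of the test run above: `classify_extracts` is a hypothetical helper name, and it assumes `md5sum`, `wc`, and `awk` are available, as they are on the CentOS host. It compares every file in a directory against a reference file and counts exact matches, same-size files with a different MD5, larger files, and smaller files.

```shell
# Hypothetical helper (not from the original test): classify each file in
# a directory against a reference file by size and MD5.
# Usage: classify_extracts <reference-file> <directory>
classify_extracts() {
    local ref="$1" dir="$2"
    local ref_size ref_md5 size md5 f
    local match=0 same_size_diff_md5=0 larger=0 smaller=0

    # Size in bytes and MD5 of the reference (original) file.
    ref_size=$(( $(wc -c < "$ref") ))
    ref_md5=$(md5sum < "$ref" | awk '{print $1}')

    for f in "$dir"/*; do
        # Skip subdirectories and the unexpanded glob of an empty directory.
        [ -f "$f" ] || continue
        size=$(( $(wc -c < "$f") ))
        if [ "$size" -gt "$ref_size" ]; then
            larger=$((larger + 1))
        elif [ "$size" -lt "$ref_size" ]; then
            smaller=$((smaller + 1))
        else
            # Only hash files whose size already matches the reference.
            md5=$(md5sum < "$f" | awk '{print $1}')
            if [ "$md5" = "$ref_md5" ]; then
                match=$((match + 1))
            else
                same_size_diff_md5=$((same_size_diff_md5 + 1))
            fi
        fi
    done

    echo "match=$match same_size_diff_md5=$same_size_diff_md5 larger=$larger smaller=$smaller"
}
```

For example, `classify_extracts test.for.bro.tar.gz sensor_files_by_myth` (both paths assumed to be relative to the current directory) prints one summary line. For a same-size file with a different MD5, running `cmp` against the original reports the position of the first differing byte, which may help show where the extracted content diverges.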
Thanks for reading this far. I hope someone can help me with this :)
Myth