[Bro] Inconsistent file size during extraction
seth at corelight.com
Thu Feb 1 19:49:28 PST 2018
Yep, I was going to comment that that's probably the issue, but I'll
give some more details on why things may end up that way.
"total_bytes" - the size of the file when it is known through some
secondary mechanism, such as the file size being transmitted as part of
a protocol or a file being read off disk.
"seen_bytes" - the number of actual bytes of data that were passed
into the file analysis framework.
This is another case where small packet loss issues can have outsized
effects, because the bytes that follow the gap can't be reassembled
into the file correctly and you don't get any more data.
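If you want to find the affected files, a rough sketch along these lines
may help. It assumes a tab-separated files.log with a "#fields" header and
"-" for unset values; the function name and the two extra sample rows are
hypothetical, and only the first row's numbers come from this thread.

```python
def find_truncated(lines):
    """Yield (fuid, seen_bytes, total_bytes) where fewer bytes were seen than advertised."""
    fields = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#fields"):
            # Column names follow the "#fields" tag, separated by tabs.
            fields = line.split("\t")[1:]
            continue
        if not line or line.startswith("#"):
            continue  # skip blank lines and other header lines
        row = dict(zip(fields, line.split("\t")))
        seen = row.get("seen_bytes", "-")
        total = row.get("total_bytes", "-")
        if seen == "-" or total == "-":
            continue  # total_bytes is unset when no size was advertised
        if int(seen) < int(total):
            yield row["fuid"], int(seen), int(total)


# Reduced, hypothetical column set; a real files.log has many more fields.
sample = [
    "#fields\tfuid\tseen_bytes\ttotal_bytes\tmissing_bytes",
    "Fz2Z2m3zwQcc3gqDS3\t219414\t12977556\t0",
    "FhypotheticalOK1\t1000\t1000\t0",
    "FhypotheticalNoLen\t500\t-\t0",
]

for fuid, seen, total in find_truncated(sample):
    print(f"{fuid}: saw {seen} of {total} bytes")
```

Comparing against total_bytes rather than missing_bytes is deliberate here:
in the example from this thread missing_bytes is 0 even though most of the
file never reached the analyzer.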
Also, nice to see you on the mailing list again, Josh!
On 1 Feb 2018, at 22:07, Josh Liburdi wrote:
> Seems that this particular connection may be affected by tapping
> On Thu, Feb 1, 2018 at 4:13 PM, Josh Liburdi
> <liburdi.joshua at gmail.com>
>> Hi all,
>> I'm seeing instances where files are being extracted inconsistently
>> with what is reported in files.log. Here is a redacted example:
>> #fields ts fuid tx_hosts rx_hosts conn_uids source depth analyzers
>> mime_type filename duration local_orig is_orig *seen_bytes* *total_bytes*
>> missing_bytes overflow_bytes timedout parent_fuid md5 sha1 sha256
>> extracted extracted_cutoff extracted_size
>> #types time string set[addr] set[addr] set[string] string count
>> set[string] string string interval bool bool count count count count bool
>> string string string string string bool count
>> 1517528771.042220 Fz2Z2m3zwQcc3gqDS3 x.x.x.x x.x.x.x
>> HTTP 0 EXTRACT application/vnd.openxmlformats-officedocument.
>> spreadsheetml.sheet 0.258350 - F *219414* *12977556* 0 0 F - - - -
>> extract-1517528771.04222-HTTP-Fz2Z2m3zwQcc3gqDS3 F -
>> File on disk:
>> *219414* Feb 1 16:04
>> The file on disk is the same size as the number of bytes sent to the
>> analyzer (seen_bytes field) -- it should be the same size as the
>> total_bytes field. I've seen this happen many times (though it is
>> rare).
>> Any thoughts on this behavior? I'm seeing this on Bro 2.5.1.
Seth Hall * Corelight, Inc * www.corelight.com