[Bro-Dev] [JIRA] (BIT-1257) Same file id generated for potentially different files

Jimmy Jones (JIRA) jira at bro-tracker.atlassian.net
Mon Sep 29 02:19:07 PDT 2014


    [ https://bro-tracker.atlassian.net/browse/BIT-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18215#comment-18215 ] 

Jimmy Jones commented on BIT-1257:
----------------------------------

Sorry, I've not been as clear as I could have been here. I've changed my own Bro instance, but I'm concerned that, out of the box, Bro's behaviour, while convenient for the majority of cases, isn't correct and will result in irrecoverably corrupted files in some instances (unless you're lucky enough to keep full captures).

I've researched this further, and I would argue there is a right answer and the spec is clear; see RFC 2616, section 10.2.7:

bq. A cache MUST NOT combine a 206 response with other previously cached content if the ETag or Last-Modified headers do not match exactly, see 13.5.4.

I'd say Bro is acting as a cache in this instance. Clients such as IE follow this [behavior|http://blogs.msdn.com/b/ieinternals/archive/2011/06/03/send-an-etag-to-enable-http-206-file-download-resume-without-restarting.aspx], for example, and Adobe Reader uses the If-Range conditional header to ensure the URL still refers to the same document.

I agree my change is over-conservative. Would you accept something that includes ETag and Last-Modified in the hash? Or is the (small) chance of corruption not a concern? (That is fine, as long as someone has actively decided not to follow the RFC.)
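
To make the proposal concrete, here is a minimal sketch in Python (not Bro script) of what including the validators in the hash could look like. The function name, the argument names, and the choice of SHA-1 are all assumptions for illustration; this is not how Bro actually derives its file IDs.

```python
import hashlib


def file_id_seed(client, url, etag=None, last_modified=None):
    """Return a hex digest that differs when the HTTP validators differ.

    Hypothetical helper: the fields and hash function are illustrative
    assumptions, not Bro's real file ID derivation.
    """
    # A missing header is treated as an empty string.
    parts = [client, url, etag or "", last_modified or ""]
    # Length-prefix each field so adjacent fields cannot run together
    # and produce the same input from different values.
    joined = "".join(f"{len(p)}:{p}" for p in parts)
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()
```

With this, two partial responses for the same URL share an ID only if their ETag and Last-Modified values match exactly. Responses that carry neither header still collapse to the same ID, so that case would need a separate decision.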


> Same file id generated for potentially different files
> ------------------------------------------------------
>
>                 Key: BIT-1257
>                 URL: https://bro-tracker.atlassian.net/browse/BIT-1257
>             Project: Bro Issue Tracker
>          Issue Type: Problem
>          Components: Bro
>    Affects Versions: git/master, 2.3
>         Environment: CentOS 6
>            Reporter: Jimmy Jones
>         Attachments: fa.bro, sample-samefileid.pcap
>
>
> The attached sample contains two HTTP downloads of the same URL from the same client, but there is no guarantee that the files are actually the same (no ETags etc.; in this case they actually are the same, but let's pretend they were different...). However, the file analysis framework seems to give the same file ID in file_name and file_chunk for both downloads.
> I think this has something to do with Range requests, as it doesn't happen with "normal" HTTP requests.


