[Bro] Bro 2.2 File Extraction (RHEL 6.5)

Seth Hall seth at icir.org
Wed Aug 6 21:14:26 PDT 2014

On Aug 6, 2014, at 10:27 PM, Jonathon Wright <jonathon.s.wright at gmail.com> wrote:

> The problem is, I won't know what file/md5_value to compare it to since I won't know the original filename. Hope that makes sense.

If you're running Bro with broctl, you will already have hashes (md5 and sha1) for every file transferred in your files.log.  
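
As a rough sketch of how those hashes can be used: the helper below looks up columns by name in the "#fields" header that Bro writes at the top of its tab-separated logs, then prints the value recorded for a given row. The function name log_field is made up for this example, and it assumes the default ASCII log format with "extracted" and "md5" columns present in your files.log.

```shell
# log_field LOG KEYCOL KEYVAL WANTCOL
# Print WANTCOL from every row of LOG where KEYCOL equals KEYVAL.
# Column positions are read from the "#fields" header line.
log_field() {
    awk -F'\t' -v kc="$2" -v kv="$3" -v wc="$4" '
        $1 == "#fields" {
            # Header is "#fields<TAB>name1<TAB>name2...", so the
            # data column for the name in $i is i - 1.
            for (i = 2; i <= NF; i++) col[$i] = i - 1
            next
        }
        /^#/ { next }    # skip the other header/footer lines
        (kc in col) && (wc in col) && $(col[kc]) == kv { print $(col[wc]) }
    ' "$1"
}
```

With that defined, the logged hash and the hash of the file on disk can be compared by eye:

$ log_field files.log extracted extract-HTTP-FsRNbD323oiMhWA761 md5
$ md5sum ./extract_files/extract-HTTP-FsRNbD323oiMhWA761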

> For example, if a user downloads something.exe (via http), bro will create a HTTP-blahblah file name. My problem at that point, is how do I know what the user tried to download, was it "notepad.exe" or "maliciousIntent.exe"? I will only have a directory full of HTTP-blahblah names, correct? That was where I was trying to go. Perhaps I misunderstood your response and you already answered me?

# Look at the extracted files.
$ ls ./extract_files

# Look at the line in files.log that maps to that file.
$ grep extract-HTTP-FsRNbD323oiMhWA761 files.log
1407384770.727269	FsRNbD323oiMhWA761	CuTpVT1LB2eQP0eMP4	HTTP	0	EXTRACT	application/x-dosexec	-	1.151308	49152	49152	0	0	F	-	-	-	-	extract-HTTP-FsRNbD323oiMhWA761

# Look for the HTTP request that maps to that file.
$ grep FsRNbD323oiMhWA761 http.log
1407384770.568614	CuTpVT1LB2eQP0eMP4	1066	80	1	GET	/lprx.php	-	-	0	49152200	OK	-	-	-	(empty)	-	-	-	-	-	FsRNbD323oiMhWA761	application/x-dosexec

You can see in that example that the best file name we could possibly have hoped to extract for that connection would be "lprx.php", which I don't think is what you want.  That is real traffic (with modified field data) from a compromised host downloading an update to the malware installed on it.
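
If you end up doing that pivot a lot, the two greps can be folded into one small function. This is only a sketch: request_for_extracted is a name invented here, it assumes the extracted file name ends in "-<file ID>" as in the example above, and it assumes your http.log has a "#fields" header with "method", "host" and "uri" columns.

```shell
# request_for_extracted HTTP_LOG EXTRACTED_NAME
# Print "method<TAB>host<TAB>uri" for each HTTP request that
# references the file ID embedded in EXTRACTED_NAME.
request_for_extracted() {
    fuid=${2##*-}    # "extract-HTTP-FsRN..." -> "FsRN..."
    awk -F'\t' -v id="$fuid" '
        $1 == "#fields" {
            for (i = 2; i <= NF; i++) col[$i] = i - 1
            next
        }
        /^#/ { next }
        {
            # The file ID can appear in any file-ID column, possibly
            # as one item of a comma-separated list, so check each field.
            for (i = 1; i <= NF; i++) {
                n = split($i, ids, ",")
                for (j = 1; j <= n; j++) {
                    if (ids[j] == id) {
                        print $(col["method"]) "\t" $(col["host"]) "\t" $(col["uri"])
                        next
                    }
                }
            }
        }
    ' "$1"
}
```

$ request_for_extracted http.log ./extract_files/extract-HTTP-FsRNbD323oiMhWA761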

> If so, apologies, but I still seem to be missing the connection of the bro created file name when it's carved and the actual filename of the exe that the user attempted to download.

Ah, ok.  I can explain a bit more here.  Before arriving at the current model, I spent a lot of time thinking about how to flexibly name files.  What I realized is that I don't want aspects of the network traffic to be able to affect the name of the file being written to disk (by default at least; you can do whatever you want in your own scripts).  There could be maliciously named files or attempts to play with the path to write into sensitive areas of the file system.  By giving the files being written to disk names that are totally fabricated by the Bro process, we sidestep all of these potential issues.  You can then use the name of the extracted file to pivot back into the logs.
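
To make the path concern concrete, here is the kind of check a script would need at a minimum if it did choose to reuse a name taken from the traffic. It is an illustration only, not what Bro does internally; is_plain_name is a made-up helper, and it does not cover other problems such as control characters, overly long names, or two transfers using the same name.

```shell
# is_plain_name NAME
# Succeed only if NAME can be used as a single file name inside a
# directory: non-empty, no "/" separators, and not "." or "..".
is_plain_name() {
    case "$1" in
        ''|.|..|*/*) return 1 ;;
        *) return 0 ;;
    esac
}
```

A name like "../../etc/cron.d/update" fails that check, while a fabricated name like "extract-HTTP-FsRNbD323oiMhWA761" passes it by construction.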


