[Bro-Dev] [JIRA] (BIT-1143) Investigate replacing libmagic w/ signatures for file identification

Jon Siwek (JIRA) jira at bro-tracker.atlassian.net
Thu Mar 6 12:39:18 PST 2014

    [ https://bro-tracker.atlassian.net/browse/BIT-1143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706#comment-15706 ] 

Jon Siwek commented on BIT-1143:

I've got topic/jsiwek/file-signatures in the bro, 3rdparty, bro-testing, and bro-testing-private repos to a point where it might be ready to merge, or at least I'm unsure what more to do with it at the moment.  Seth, do you want this assigned to you so you can first look over the new file magic signatures (maybe look for important MIME types that are somehow missing, or try improving some regexes)?  Others are also welcome to take a look and make suggestions.

New file magic signatures:  these are derived from the default libmagic magic database in a semi-automatic/assisted way.  I instrumented a version of the {{file}} command, see https://github.com/jsiwek/file/tree/bro-signatures, to get at the internal representation of the magic rules and had it emit Bro signatures for any set of rules associated with a MIME type.  The conversion logic is not currently perfect for all combinations of magic rules and the effort to make it perfect didn't seem worth it, so warnings are emitted upon encountering tricky scenarios.  Afterward, I did a pass over everything and manually fixed (or just removed, depending on circumstances) the cases where it indicated an automatic conversion might not be correct.

Signature maintenance:  Going forward, Bro's file signatures can be considered on their own and improved independently of libmagic's rules (i.e. there's no required/extra/continual maintenance task in updating signatures, though the libmagic database would probably still be useful for reference when someone is trying to improve/add signatures).

Signature accuracy: Surprisingly, Bro's test suites don't detect file types much differently with the new signatures than with libmagic.  The variance is actually less than I've seen in switching between versions of libmagic.  And the differences in detected MIME types are at least somewhat reasonable -- the most questionable differences are the text/plain detections, because libmagic has built-in logic for various text encodings/charsets, but the signature I ended up writing to fill that gap just does ASCII for now.
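
To make the text/plain gap concrete, here is a small Python sketch of an ASCII-only text check.  The pattern is an assumption for illustration and is not the signature in the branch; it only shows why content in other encodings (UTF-8 with non-ASCII characters, UTF-16, etc.) would not be labeled text/plain by an ASCII-only rule, whereas libmagic's charset logic could still recognize it.

```python
import re

# Hypothetical pattern: the whole buffer must consist of printable ASCII
# plus tab, line feed, form feed, and carriage return.
ASCII_TEXT = re.compile(rb"\A[\t\n\f\r\x20-\x7e]+\Z")


def looks_like_ascii_text(buf):
    """Return True if buf is non-empty and contains only ASCII text bytes."""
    return ASCII_TEXT.match(buf) is not None


if __name__ == "__main__":
    samples = [b"hello\nworld\n", "h\u00e9llo".encode("utf-8"), b"\x89PNG\r\n"]
    for sample in samples:
        print(repr(sample), looks_like_ascii_text(sample))
```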

Signature performance: I didn't do very robust profiling/benchmarking, but I found slight improvements in various configurations, in terms of both instruction count and run time, when running against the long m57 pcap.  That at least matches the expectation that, in theory, it can't be worse than libmagic's approach, so I didn't dig any deeper.  It should also scale better as the number of signatures increases.

Signature unit tests: there are no new regression tests in place for the new file magic signatures.  Those could take a while to make; are they required immediately, or can they wait?  And any opinions on the structure of such a test suite?  I imagine just having the test suite in the bro repo, but a corpus of file types to test against is probably going to need some other canonical place to live.

> Investigate replacing libmagic w/ signatures for file identification
> --------------------------------------------------------------------
>                 Key: BIT-1143
>                 URL: https://bro-tracker.atlassian.net/browse/BIT-1143
>             Project: Bro Issue Tracker
>          Issue Type: New Feature
>          Components: Bro
>    Affects Versions: git/master
>            Reporter: Jon Siwek
>            Assignee: Jon Siwek
>             Fix For: 2.3
> I think it makes sense to try to make the switch from libmagic to using Bro's own signature engine for file identification before the next release.  Don't want people getting used to the magic file format for their own custom file identification rules.

This message was sent by Atlassian JIRA
