[Bro-Dev] [Bro-Commits] [git/bro] topic/seth/more-file-type-ident-fixes: Lots of fixes for file type identification. (ee3e885)

Mon Mar 16 07:56:05 PDT 2015

> On Mar 13, 2015, at 9:14 PM, Seth Hall <seth at icir.org> wrote:
> 
>     - Plain text now identified with BOMs for UTF8,16,32
>       (even though 16 and 32 wouldn't get identified as plain text, oh-well)

Maybe it’s correct to associate UTF-8, UTF-16, and UTF-32 with a main type of “text”, but labeling them “plain” seems ambiguous or superfluous. What even is “plain text”? To read any “text”, you always need to know its character encoding, right? I guess the name has to stay for historical/compatibility reasons, though.

But as long as we're building on the heritage of “MIME” types, should we extend the file signature syntax to allow specifying an extra, optional field? Then the character encoding could go there as a separate component for “text” types.
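Here is a rough sketch of what I mean, written in Python rather than Bro's signature language. It assumes a hypothetical `FileType` record that carries the encoding as an optional `charset` field next to the main type. The main type stays “text/plain” and the BOM only fills in the charset, rendered the way a MIME Content-Type parameter would be. `FileType` and `identify_text_type` are made-up names for illustration, not anything in Bro.

```python
import codecs
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileType:
    """Hypothetical file type: main MIME type plus an optional charset field."""
    mime: str
    charset: Optional[str] = None

    def __str__(self) -> str:
        # Render like a Content-Type value, e.g. "text/plain; charset=utf-8".
        if self.charset is None:
            return self.mime
        return f"{self.mime}; charset={self.charset}"


# Longest BOMs first: the UTF-32LE BOM (FF FE 00 00) begins with the
# UTF-16LE BOM (FF FE), so testing UTF-16LE first would misidentify it.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32le"),
    (codecs.BOM_UTF32_BE, "utf-32be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
]


def identify_text_type(data: bytes) -> Optional[FileType]:
    """Return text/plain with a charset if data starts with a known BOM, else None."""
    for bom, charset in _BOMS:
        if data.startswith(bom):
            return FileType("text/plain", charset)
    return None


if __name__ == "__main__":
    print(identify_text_type(codecs.BOM_UTF8 + b"hello"))
    print(identify_text_type(codecs.BOM_UTF16_BE + "hi".encode("utf-16-be")))
    print(identify_text_type(b"no BOM here"))
```

Even a sketch this small hits the usual BOM ambiguity: a UTF-16LE file whose first character is U+0000 looks exactly like a UTF-32LE BOM. So the charset component would probably need to be treated as a best guess rather than a guarantee.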

- Jon