[Bro-Dev] [Bro-Commits] [git/bro] topic/seth/more-file-type-ident-fixes: Lots of fixes for file type identification. (ee3e885)
jsiwek at illinois.edu
Mon Mar 16 07:56:05 PDT 2015
> On Mar 13, 2015, at 9:14 PM, Seth Hall <seth at icir.org> wrote:
> - Plain text now identified with BOMs for UTF8,16,32
> (even though 16 and 32 wouldn't get identified as plain text, oh-well)
Maybe it’s good/correct to identify UTF-8, UTF-16, and UTF-32 as associated with a main type of “text”, but it’s a bit ambiguous or superfluous to label them “plain”. What even is “plain text”? For any “text”, you always need to know its character encoding to read it, right? I guess the name has to stay for historical/compatibility reasons, though.
But as long as we're basing things on the heritage of “MIME” types, should we extend the file signature syntax to allow specifying an extra, optional field? Then you can put the character encoding in there as a separate component for “text” types.
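
To make the idea concrete, here is a rough Python sketch of what I mean. It is not Bro's signature syntax or its actual matching code. The table BOM_CHARSETS, the functions identify_text and split_mime, and the charset labels are all made up for illustration; only the BOM constants come from Python's standard codecs module. The encoding rides along as a MIME-style "charset" parameter instead of being folded into the type name.

```python
import codecs

# Hypothetical table mapping BOM bytes to a charset label.
# UTF-32 entries come first because the UTF-32-LE BOM begins with
# the same two bytes as the UTF-16-LE BOM.
BOM_CHARSETS = [
    (codecs.BOM_UTF32_LE, "utf-32le"),
    (codecs.BOM_UTF32_BE, "utf-32be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16le"),
    (codecs.BOM_UTF16_BE, "utf-16be"),
]


def identify_text(data: bytes):
    """Return 'text/plain; charset=...' if data starts with a known BOM, else None."""
    for bom, charset in BOM_CHARSETS:
        if data.startswith(bom):
            return f"text/plain; charset={charset}"
    return None


def split_mime(value: str):
    """Split 'type/subtype; key=value' into (type, subtype, params dict)."""
    essence, _, rest = value.partition(";")
    main, _, sub = essence.strip().partition("/")
    params = {}
    for part in rest.split(";"):
        key, sep, val = part.partition("=")
        if sep:  # skip empty or malformed fragments
            params[key.strip().lower()] = val.strip()
    return main, sub, params
```

A consumer that only cares about the main type can ignore the params dict from split_mime, while one that needs to decode the text can look up params.get("charset").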