[Bro] Sanity check - Grabbing platform tokens from browser user agents (was p0f)

Gary Faulkner gary at doit.wisc.edu
Mon Feb 10 09:50:53 PST 2014


After running various iterations of the original script against several
pcaps of our local traffic (and a couple of days of live traffic), I
found a lot of user agents that matched the desktop/server OS rules but
were not necessarily desktops or servers. I added matching rules, partly
to separate these out and partly to detect other platforms we were
interested in. Checking for more things seems to incur a performance
penalty, so I also moved some of the more common matches earlier in the
if/else chain so that the less likely items are not all checked first.
The &create_expire attribute still doesn't behave as I expected: each
match is logged once per log rotation rather than once per day.
Otherwise the matching seems to work, with the caveat that it doesn't
cover every possible user agent. I may also have left out explicit @load
statements for scripts that are commonly loaded anyway.
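
To illustrate why rule order matters here, the following is a minimal
offline sketch in Python, not part of the Bro script. The rule list, the
classify() helper, and the two sample user-agent strings are assumptions
made up for illustration. It shows a first-match-wins chain in which the
mobile tokens have to come before the desktop tokens they overlap with.

```python
import re

# Ordered (pattern, label) rules: the first match wins, so mobile tokens
# must come before the broad desktop tokens they overlap with.
RULES = [
    (re.compile(r"iPhone"), "iPhone"),
    (re.compile(r"iPad"), "iPad"),
    (re.compile(r"Android"), "Android"),
    (re.compile(r"Windows NT 6\.1"), "Windows 7"),
    (re.compile(r"Mac OS X"), "Mac OS X"),
    (re.compile(r"Linux"), "Linux"),
]


def classify(user_agent, rules=RULES):
    """Return the label of the first rule that matches, else None."""
    for pattern, label in rules:
        if pattern.search(user_agent):
            return label
    return None


# Illustrative user agents (sample strings, not taken from the thread).
IPHONE_UA = ("Mozilla/5.0 (iPhone; CPU iPhone OS 7_0 like Mac OS X) "
             "AppleWebKit/537.51.1 (KHTML, like Gecko) Mobile/11A465")
ANDROID_UA = ("Mozilla/5.0 (Linux; Android 4.4.2; Nexus 5 Build/KOT49H) "
              "AppleWebKit/537.36 (KHTML, like Gecko) Mobile Safari/537.36")

print(classify(IPHONE_UA))   # iPhone
print(classify(ANDROID_UA))  # Android
# With the desktop rules checked first, the iPhone is mislabeled:
print(classify(IPHONE_UA, list(reversed(RULES))))  # Mac OS X
```

The iPhone string contains "like Mac OS X" and the Android string
contains "Linux", which is why both would be counted as desktops if the
desktop rules were checked first.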

======================== Begin Script ========================
@load base/utils/site

module BrowserPlatform;

export
{
     # The fully resolved name for this log will be BrowserPlatform::LOG
     redef enum Log::ID += { LOG };

     type Info: record {
         ts:                 time    &log &optional;
         uid:                string  &log &optional;
         host:               addr    &log &optional;
         platform_token:     string  &log &optional;
         unparsed_version:   string  &log &optional;
     };

     # A set of seen IP + OS combinations. Used to prevent logging the
     # same combo repeatedly.
     global seen_browser_platforms: set[string] &create_expire = 1.0 day
         &synchronized &redef;
}

event bro_init() &priority=5
     {
     Log::create_stream(BrowserPlatform::LOG,[$columns=Info]);
     }

event http_header(c: connection, is_orig: bool, name: string, value: string)
{
     local platform = "Unknown OS";
     if (!is_orig || name != "USER-AGENT" || 
!Site::is_local_addr(c$id$orig_h))
         return;

     # Parse out Apple iOS and Android variants first, as some apps
     # display as compatible with a desktop OS version.

     if ( /iPhone/ in value )
         platform = "iPhone";
     else if ( /iPad/ in value )
         platform = "iPad";
     else if ( /iPod/ in value )
         platform = "iPod";
     else if ( /Android/ in value )
         platform = "Android";

     # Once we've parsed out mobiles, move on to the desktop/server OSes.
     # User agents are listed in order of expected use, or so as to
     # pre-parse user agents that might otherwise match multiple rules.

     else if ( /Windows/ in value )
         {
         # Xbox often includes a Windows OS version or identifies as a
         # mobile browser.
         if ( /Xbox/ in value )
             platform = "Xbox";
         # Windows Phone often includes a Windows OS version.
         else if ( /Phone/ in value || /Mobile/ in value )
             platform = "Windows Phone";
         # Dots are escaped so they match a literal "." only.
         else if ( /Windows NT 6\.1/ in value )
             platform = "Windows 7";
         else if ( /Windows NT 5\.1/ in value )
             platform = "Windows XP";
         else if ( /Windows NT 5\.2/ in value && /WOW64/ in value )
             platform = "Windows XP x64";
         else if ( /Windows NT 6\.0/ in value )
             platform = "Windows Vista";
         else if ( /Windows NT 6\.2/ in value )
             platform = "Windows 8";
         else if ( /Windows NT 6\.3/ in value )
             platform = "Windows 8.1";
         else if ( /Windows 95/ in value )
             platform = "Windows 95";
         else if ( /Windows 98/ in value && /4\.90/ !in value )
             platform = "Windows 98";
         else if ( /Win 9x 4\.90/ in value )
             platform = "Windows Me";
         else if ( /Windows NT 4\.0/ in value )
             platform = "Windows NT 4.0";
         else if ( /Windows NT 5\.0/ in value || /Windows 2000/ in value )
             platform = "Windows 2000";
         # Catch-all for identifying less common user-agents. Can be noisy.
         # else
         #     platform = "Windows Other";
         }
     else if ( /Mac OS X/ in value )
         {
         if ( /Mac OS X 10_9/ in value || /Mac OS X 10.9/ in value )
             platform = "Mac OS X 10.9";
         else if ( /Mac OS X 10_8/ in value || /Mac OS X 10.8/ in value )
             platform = "Mac OS X 10.8";
         else if ( /Mac OS X 10_7/ in value || /Mac OS X 10.7/ in value )
             platform = "Mac OS X 10.7";
         else if ( /Mac OS X 10_6/ in value || /Mac OS X 10.6/ in value )
             platform = "Mac OS X 10.6";
         else if ( /Mac OS X 10_5/ in value || /Mac OS X 10.5/ in value )
             platform = "Mac OS X 10.5";
         else if ( /Mac OS X 10_4/ in value || /Mac OS X 10.4/ in value )
             platform = "Mac OS X 10.4";
#       Catch-all for identifying less common user-agents. Can be noisy.
#       else
#           platform = "Mac OS X Other";
         }
     else if ( /Linux/ in value )
         platform = "Linux";

     # Check whether this IP+OS combo has already been logged; if not,
     # log it and add it to the set of tracked combos.

     # There is probably a less ugly way to do this than cat, but it
     # seems to work.
     local saw = cat(c$id$orig_h, platform);
     if ( platform != "Unknown OS" && saw !in seen_browser_platforms )
         {
         local rec: BrowserPlatform::Info = [$ts=network_time(), 
$uid=c$uid, $host=c$id$orig_h, $platform_token=platform, 
$unparsed_version=value];
         Log::write(BrowserPlatform::LOG, rec);
         add seen_browser_platforms[saw];
         }
}

======================== End Script ========================
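
The de-duplication step at the end of the script can be pictured offline
with the following minimal Python sketch. It assumes the intended
behaviour is "log each host + platform pair at most once per day, timed
from when the pair was first added". The ExpiringSet class and the
should_log() helper are hypothetical names invented for this
illustration; they are not Bro APIs, and the sketch does not explain the
once-per-rotation behaviour described above.

```python
DAY = 24 * 60 * 60  # seconds


class ExpiringSet:
    """Set whose members are dropped a fixed time after insertion."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._added = {}  # member -> insertion timestamp

    def add(self, member, now):
        # Keep the first insertion time so the timer runs from creation,
        # not from the most recent sighting.
        self._added.setdefault(member, now)

    def contains(self, member, now):
        added = self._added.get(member)
        if added is None:
            return False
        if now - added >= self.ttl:
            # Entry has expired: forget it and report it as unseen.
            del self._added[member]
            return False
        return True


def should_log(seen, host, platform, now):
    """Return True (and remember the pair) if it has not been seen lately."""
    key = (host, platform)  # tuple key instead of a concatenated string
    if seen.contains(key, now):
        return False
    seen.add(key, now)
    return True


seen = ExpiringSet(DAY)
print(should_log(seen, "10.0.0.5", "Windows 7", now=0))        # True
print(should_log(seen, "10.0.0.5", "Windows 7", now=3600))     # False
print(should_log(seen, "10.0.0.5", "iPhone", now=3600))        # True
print(should_log(seen, "10.0.0.5", "Windows 7", now=DAY + 1))  # True
```

Keying on a (host, platform) tuple plays the same role as the cat()
string in the script; the pair is simply kept as two fields rather than
one concatenated value.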
On 1/31/2014 10:56 PM, Gary Faulkner wrote:
> Thanks for the suggestions, that cleans that bit up quite nicely. I
> actually started by trying to deconstruct the various software.bro
> scripts and work my way backwards through the framework to see what was
> doing what. I'm still trying to navigate my way through that code, but I
> agree that it would make more sense to leverage it directly than create
> a derivative just to pull out a specific bit of the data. I'm not
> currently running Splunk in any production sense, but that is pretty
> much what I'm trying to do in Bro. Thanks for sharing it!
>
> Regards,
> Gary
>
> On 1/31/2014 6:12 PM, Justin Azoff wrote:
>> On Wed, Jan 29, 2014 at 05:35:46PM -0600, Gary Faulkner wrote:
>>> event http_header(c: connection, is_orig: bool, name: string, value: string)
>>> {
>>>       local platform = "Unknown OS";	
>>>       if ( is_orig )
>>>           {
>>> 	if ( name == "USER-AGENT" && /Windows NT 5.1/ in value )
>>> 		{
>>> 		platform = "Windows XP";
>>> 		}
>>>           else if ( name == "USER-AGENT" && /Windows NT 6.0/ in value )
>>>                   {
>>> 		platform = "Windows Vista";
>>>                   }
>>>           else if ( name == "USER-AGENT" && /Windows NT 6.1/ in value )
>>>                   {
>>>                   platform = "Windows 7";
>>>                   }
>> ..
>>
>> Modifying the http_header event handler as follows will increase performance:
>>
>> event http_header(c: connection, is_orig: bool, name: string, value: string)
>> {
>>       if(!is_orig || name != "USER-AGENT")
>>           return;
>>       if(/Windows NT 5.1/ in value)
>>           platform = "Windows XP";
>>       else if ...
>>
>> FWIW, I used to do this kind of thing outside of bro using splunk:
>>
>> https://github.com/JustinAzoff/splunk-scripts/blob/master/ua2os.py
>>
>> One thing you may want to do is rather than use the http_header event
>> use
>>
>> event log_software(rec: Info)
>> {
>>       ...
>> }
>>
>> which will be raised every time a new software version is seen.  The
>> software framework is already pulling most of the info out that you
>> might need, so you can piggy back on the work that it is doing.
>>

