[Bro] Sanity check - Grabbing platform tokens from browser user agents (was p0f)

Gary Faulkner gary at doit.wisc.edu
Wed Apr 2 08:30:59 PDT 2014


I haven't updated it since then, as I've been struggling to hook into 
the existing software logging and to keep track of state well enough to 
prevent or reduce duplicate log entries. Some performance concerns were 
also raised, so I've been hesitant to post any in-progress work that 
might inadvertently cause someone else grief.

My main observation after running the script continuously for the last 
month is that it probably needs the ability to exclude specific subnets. 
An example might be wireless networks, which tend to have high client IP 
churn and short DHCP lease times, are more likely to carry mobile devices 
with apps that send ugly user-agents, and are generally likely to provide 
unreliable data.
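
A minimal, untested sketch of such an exclusion, assuming a hypothetical 
redef-able set named excluded_subnets (not part of the posted script) and 
a placeholder address range:

```bro
module BrowserPlatform;

export {
    # Hypothetical option, not in the posted script: subnets to skip.
    const excluded_subnets: set[subnet] = {} &redef;
}

# Placeholder range for illustration only; substitute real wireless/DHCP pools.
redef BrowserPlatform::excluded_subnets += { 10.20.0.0/16 };

event http_header(c: connection, is_orig: bool, name: string, value: string)
    {
    # This check belongs at the top of the existing http_header handler.
    # Placed in a separate handler, the return would not stop the original
    # handler from logging.
    # An addr tested against a set[subnet] matches if any member contains it.
    if ( c$id$orig_h in excluded_subnets )
        return;

    # ... existing platform matching and logging would continue here ...
    }
```

Keeping the set &redef would let a site add ranges from its own local 
configuration without editing the script itself.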

Regards,
Gary

On 4/2/2014 8:36 AM, Ryan wrote:
> Gary,
>
> This looks very nice. I'm curious if you had any more updates or
> improvements for this?
>
> Ryan Peck
>
>
>
> On Mon, Feb 10, 2014 at 12:50 PM, Gary Faulkner <gary at doit.wisc.edu> wrote:
>
>> After running various iterations of the original script against several
>> pcaps of our local traffic (and a couple of days of live traffic), I found
>> a lot of user agents that matched the desktop/server OS rules but were not
>> necessarily desktops or servers. I added to the matching rules, partly to
>> parse these out and partly to detect other things we were interested in.
>> Checking for more things seems to incur a performance penalty, so I also
>> moved some of the more common matches earlier in the if/else chain to
>> avoid checking all of the less likely items first. The &create_expire
>> attribute still doesn't behave as I expected: each match is logged once
>> per log rotation rather than once per day. The matching itself seems to
>> work, except that it doesn't cover every possible user agent. I may also
>> have left out explicit @load lines for scripts that are normally loaded
>> anyway.
>>
>> ======================== Begin Script ========================
>>
>> @load base/utils/site
>>
>> module BrowserPlatform;
>>
>> export
>> {
>>      # The fully resolved name for this log will be BrowserPlatform::LOG
>>      redef enum Log::ID += { LOG };
>>
>>      type Info: record {
>>          ts:                 time    &log &optional;
>>          uid:                string  &log &optional;
>>          host:               addr    &log &optional;
>>          platform_token:     string  &log &optional;
>>          unparsed_version:   string  &log &optional;
>>      };
>>
>>      # A set of seen IP + OS combinations. Used to prevent logging the
>>      # same combo repeatedly.
>>      global seen_browser_platforms: set[string]
>>          &create_expire = 1.0 day &synchronized &redef;
>> }
>>
>> event bro_init() &priority=5
>>      {
>>      Log::create_stream(BrowserPlatform::LOG,[$columns=Info]);
>>      }
>>
>> event http_header(c: connection, is_orig: bool, name: string, value: string)
>> {
>>      local platform = "Unknown OS";
>>      if ( !is_orig || name != "USER-AGENT" || !Site::is_local_addr(c$id$orig_h) )
>>          return;
>>
>>      # Parse out Apple iOS and Android variants first, as some apps will
>>      # display as compatible with a desktop OS version.
>>
>>      if ( /iPhone/ in value )
>>          platform = "iPhone";
>>      else if ( /iPad/ in value )
>>          platform = "iPad";
>>      else if ( /iPod/ in value )
>>          platform = "iPod";
>>      else if ( /Android/ in value )
>>          platform = "Android";
>>
>>      # Once we've parsed out mobiles, move on to desktop/server OS.
>>      # User agents are listed in order of expected use, or to pre-parse
>>      # user agents that might otherwise match multiple rules.
>>
>>      else if ( /Windows/ in value )
>>          {
>>          # Often includes a Windows OS version or identifies as a mobile browser.
>>          if ( /Xbox/ in value )
>>              platform = "Xbox";
>>          # Often includes a Windows OS version.
>>          else if ( /Phone/ in value || /Mobile/ in value )
>>              platform = "Windows Phone";
>>          else if ( /Windows NT 6.1/ in value )
>>               platform = "Windows 7";
>>          else if ( /Windows NT 5.1/ in value )
>>               platform = "Windows XP";
>>          else if ( /Windows NT 5.2/ in value && /WOW64/ in value )
>>               platform = "Windows XP x64";
>>          else if ( /Windows NT 6.0/ in value )
>>               platform = "Windows Vista";
>>          else if ( /Windows NT 6.2/ in value )
>>               platform = "Windows 8";
>>          else if ( /Windows NT 6.3/ in value )
>>               platform = "Windows 8.1";
>>          else if ( /Windows 95/ in value )
>>               platform = "Windows 95";
>>          else if ( /Windows 98/ in value && /4.90/ !in value )
>>               platform = "Windows 98";
>>          else if ( /Win 9x 4.90/ in value )
>>               platform = "Windows Me";
>>          else if ( /Windows NT 4.0/ in value )
>>               platform = "Windows NT 4.0";
>>          else if ( /Windows NT 5.0/ in value || /Windows 2000/ in value )
>>               platform = "Windows 2000";
>> #    Catch-all for identifying less common user-agents. Can be noisy.
>> #       else
>> #            platform = "Windows Other";
>>          }
>>      else if ( /Mac OS X/ in value )
>>          {
>>          if ( /Mac OS X 10_9/ in value || /Mac OS X 10.9/ in value )
>>              platform = "Mac OS X 10.9";
>>          else if ( /Mac OS X 10_8/ in value || /Mac OS X 10.8/ in value )
>>              platform = "Mac OS X 10.8";
>>          else if ( /Mac OS X 10_7/ in value || /Mac OS X 10.7/ in value )
>>              platform = "Mac OS X 10.7";
>>          else if ( /Mac OS X 10_6/ in value || /Mac OS X 10.6/ in value )
>>              platform = "Mac OS X 10.6";
>>          else if ( /Mac OS X 10_5/ in value || /Mac OS X 10.5/ in value )
>>              platform = "Mac OS X 10.5";
>>          else if ( /Mac OS X 10_4/ in value || /Mac OS X 10.4/ in value )
>>              platform = "Mac OS X 10.4";
>> #       Catch-all for identifying less common user-agents. Can be noisy.
>> #       else
>> #           platform = "Mac OS X Other";
>>          }
>>      else if ( /Linux/ in value )
>>          platform = "Linux";
>>
>>      # Check whether this IP+OS combo has already been logged; if not,
>>      # log it and add it to the set of tracked combos.
>>
>>      # There is probably a less ugly way to do this than cat, but it
>>      # seems to work.
>>      local saw = cat(c$id$orig_h, platform);
>>      if ( platform != "Unknown OS" && saw !in seen_browser_platforms )
>>          {
>>          local rec: BrowserPlatform::Info = [$ts=network_time(),
>>              $uid=c$uid, $host=c$id$orig_h, $platform_token=platform,
>>              $unparsed_version=value];
>>          Log::write(BrowserPlatform::LOG, rec);
>>          add seen_browser_platforms[saw];
>>          }
>> }
>>
>> ======================== End Script ========================
>>
>> On 1/31/2014 10:56 PM, Gary Faulkner wrote:
>>
>>> Thanks for the suggestions, that cleans that bit up quite nicely. I
>>> actually started by trying to deconstruct the various software.bro
>>> scripts and work my way backwards through the framework to see what was
>>> doing what. I'm still trying to navigate my way through that code, but I
>>> agree that it would make more sense to leverage it directly than create
>>> a derivative just to pull out a specific bit of the data. I'm not
>>> currently running Splunk in any production sense, but that is pretty
>>> much what I'm trying to do in Bro. Thanks for sharing it!
>>>
>>> Regards,
>>> Gary
>>>
>>> On 1/31/2014 6:12 PM, Justin Azoff wrote:
>>>
>>>> On Wed, Jan 29, 2014 at 05:35:46PM -0600, Gary Faulkner wrote:
>>>>
>>>>> event http_header(c: connection, is_orig: bool, name: string, value:
>>>>> string)
>>>>> {
>>>>>        local platform = "Unknown OS";
>>>>>        if ( is_orig )
>>>>>            {
>>>>>          if ( name == "USER-AGENT" && /Windows NT 5.1/ in value )
>>>>>                  {
>>>>>                  platform = "Windows XP";
>>>>>                  }
>>>>>            else if ( name == "USER-AGENT" && /Windows NT 6.0/ in value )
>>>>>                    {
>>>>>                  platform = "Windows Vista";
>>>>>                    }
>>>>>            else if ( name == "USER-AGENT" && /Windows NT 6.1/ in value )
>>>>>                    {
>>>>>                    platform = "Windows 7";
>>>>>                    }
>>>>>
>>>> ..
>>>>
>>>> Modifying the http_header event handler as follows will increase
>>>> performance:
>>>>
>>>> event http_header(c: connection, is_orig: bool, name: string, value:
>>>> string)
>>>> {
>>>>        if(!is_orig || name != "USER-AGENT")
>>>>            return;
>>>>        if(/Windows NT 5.1/ in value)
>>>>            platform = "Windows XP";
>>>>        else if ...
>>>>
>>>> FWIW, I used to do this kind of thing outside of bro using splunk:
>>>>
>>>> https://github.com/JustinAzoff/splunk-scripts/blob/master/ua2os.py
>>>>
>>>> One thing you may want to do, rather than using the http_header event,
>>>> is to handle
>>>>
>>>> event Software::log_software(rec: Software::Info)
>>>> {
>>>>        ...
>>>> }
>>>>
>>>> which will be raised every time a new software version is seen.  The
>>>> software framework is already pulling most of the info out that you
>>>> might need, so you can piggy back on the work that it is doing.
>>>>
>>>>
>>
>>

