[Bro] Questions about Bro Capabilities

Wed Oct 3 08:26:19 PDT 2007

On Wed, Oct 03, 2007 at 11:14:45AM -0400, Reed Porada composed:
> 
> On Oct 3, 2007, at 10:36 AM, Nicholas Weaver wrote:
> 
> >
> >On Wed, Oct 03, 2007 at 10:04:33AM -0400, Reed Porada composed:
> >
> >>I am working on a Traffic Generator (TG) project.  Our TG has static
> >>content for webpages and file shares.  In addition, we know when our
> >>TG hosts attempt to access that data.  Given those two things, I want
> >>to be able to take a network capture, run it through a system, and
> >>separate the traffic that we know our TG generated (by correlating
> >>intent with traffic content) from other traffic on the network.  The
> >>end goal is smaller and more relevant network captures for an
> >>analyst.  To do this I want to leverage other people's protocol
> >>analyzers and parsers.  Bro seems to be a good choice, as I
> >>believe that with a policy and some pregenerated variables (based on
> >>the content and host intent) I can validate given traffic as coming
> >>from our TG system, and leave the rest for others to analyze.  I
> >>believe that to do this I need to get the relevant packets out of
> >>Bro, identified by either packet number or timestamp.  Given that
> >>information, I would be able to run a script that splits the pcap
> >>based on the output.  The added benefit of Bro is that it does some
> >>additional analysis that could be useful for capture analysis.
> >
> >What exactly are the defining characteristics of your synthetic  
> >traffic?
> 
> Our synthetic traffic is no different from what a normal user on a
> machine would generate.  We use IE to
> navigate to a page, and we use Windows File Browsing to look at
> network file shares.  Our TG is designed to be run on an isolated
> network, a la DETER, so we set up a simulated Internet and other
> simulated networks.  Since we are creating these networks, we control
> server content, IP addresses, and hostnames.  We believe that
> since we know what our content is (i.e., what is at a
> given website or on a given file share) and when we tried to
> access the given data (our host agents log intent), we
> can separate out our TG traffic.  In theory, there is no defining
> characteristic of our synthetic traffic in the packet captures that
> we could make Bro, or really any other packet analyzer, look for;
> basically, we do not set the evil bit.  However, with the additional
> knowledge of what the content is and what a synthetic user was
> doing, we believe we can find our traffic.  After looking at the
> variables and other features of the Bro policy language, I believe I
> can construct the lookup tables for host_agent_events and
> web_content, and therefore create a policy script
> to "find" our traffic.  What I am not sure of is whether the policy
> can provide the information necessary to get our traffic out of the
> capture, i.e., make a smaller capture with just the non-TG traffic.

One thought:

For offline processing, use a two-pass approach.  In the first pass,
you use Bro to find the TG flows based on the higher-level attributes,
and write out the flow IDs.  For the second pass, keep only the
flows that don't correspond to those IDs.
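A minimal sketch of the second pass, assuming the first-pass Bro policy writes one TG flow per line in a hypothetical "proto orig_h orig_p resp_h resp_p" format (this is not a built-in Bro output; the policy script would have to print it). The helper names load_tg_flows, packet_flow, and filter_pcap are illustrative. It uses only the Python standard library, reads classic libpcap files with Ethernet framing, and ignores VLAN tags, IPv6, and non-first IP fragments (those packets are simply kept); these are simplifying assumptions.

```python
import ipaddress
import struct

IP_PROTOS = {6: "tcp", 17: "udp"}


def flow_key(proto, src, sport, dst, dport):
    # Sort the two (address, port) endpoints so both directions of a
    # connection map to the same key.
    a, b = sorted([(src, sport), (dst, dport)])
    return (proto, a, b)


def load_tg_flows(lines):
    """Parse hypothetical pass-1 output: 'proto orig_h orig_p resp_h resp_p'."""
    flows = set()
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#") or len(fields) != 5:
            continue  # skip blank, comment, and malformed lines
        proto, oh, op, rh, rp = fields
        flows.add(flow_key(proto, str(ipaddress.ip_address(oh)), int(op),
                           str(ipaddress.ip_address(rh)), int(rp)))
    return flows


def packet_flow(frame):
    """Return the flow key of an Ethernet/IPv4 TCP or UDP frame, else None."""
    if len(frame) < 34 or struct.unpack("!H", frame[12:14])[0] != 0x0800:
        return None  # not plain IPv4 (VLAN-tagged and IPv6 frames land here)
    ihl = (frame[14] & 0x0F) * 4
    proto = IP_PROTOS.get(frame[23])
    frag_offset = struct.unpack("!H", frame[20:22])[0] & 0x1FFF
    l4 = 14 + ihl
    # Non-first fragments carry no port numbers, so they cannot be matched.
    if proto is None or frag_offset != 0 or len(frame) < l4 + 4:
        return None
    src = str(ipaddress.ip_address(frame[26:30]))
    dst = str(ipaddress.ip_address(frame[30:34]))
    sport, dport = struct.unpack("!HH", frame[l4:l4 + 4])
    return flow_key(proto, src, sport, dst, dport)


def filter_pcap(in_path, out_path, tg_flows):
    """Copy every packet whose flow is NOT in tg_flows; return (kept, dropped)."""
    kept = dropped = 0
    with open(in_path, "rb") as fin, open(out_path, "wb") as fout:
        header = fin.read(24)
        magic = struct.unpack("<I", header[:4])[0]
        if magic in (0xA1B2C3D4, 0xA1B23C4D):    # little-endian (usec / nsec)
            endian = "<"
        elif magic in (0xD4C3B2A1, 0x4D3CB2A1):  # big-endian (usec / nsec)
            endian = ">"
        else:
            raise ValueError("not a classic libpcap file")
        if struct.unpack(endian + "I", header[20:24])[0] != 1:
            raise ValueError("only Ethernet (linktype 1) is handled")
        fout.write(header)  # reuse the original global header unchanged
        while True:
            rec = fin.read(16)  # per-packet record header
            if len(rec) < 16:
                break
            incl_len = struct.unpack(endian + "IIII", rec)[2]
            frame = fin.read(incl_len)
            if packet_flow(frame) in tg_flows:
                dropped += 1
            else:
                kept += 1
                fout.write(rec + frame)
    return kept, dropped
```

Usage would look something like filter_pcap("capture.pcap", "non_tg.pcap", load_tg_flows(open("tg_flows.txt"))). Keying on the direction-independent 5-tuple keeps both halves of each connection together; over a long capture the same 5-tuple can be reused, so adding a time window to each flow ID would be a natural refinement.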

-- 
Nicholas C. Weaver                               nweaver at icsi.berkeley.edu
     This message has been ROT-13 encrypted twice for higher security.