From sstattla at gmail.com Wed Oct 6 11:37:58 2010 From: sstattla at gmail.com (Sunjeet Singh) Date: Wed, 06 Oct 2010 11:37:58 -0700 Subject: [Bro] Understanding the event generation and handling Message-ID: <4CACC206.2050505@gmail.com> Hi, I've been looking at the Bro documentation and source code recently. I need to get into lower-level details and looking at Source code is not helping me. Specifically, I need to get to the logic of- 1. Event generation: How does Bro know which all events to raise by looking at a particular packet? I have a basic understanding of the class hierarchy, but I don't know where to look for the code that decides which specific Application layer analyzer object to create by looking at the Application Layer header/signature of the incoming packet. 2. Event handling: It seems that an event's information is stored in an object and all events are queued in an Event Manager as they are created. After every packet is processed, this queue of events is drained (thus following a single-threaded model) and the events are sent to a Serializer. I found the serialization code hard to understand so I don't know the logic of how an event-handler (interpreter?) decides which event belongs to it. I'd really like to know the mechanism in here. Can someone please suggest which debugger to use and how, so that I can step-by-step understand the event-engine? Thank you, Sunjeet Singh From vern at icir.org Wed Oct 6 17:00:08 2010 From: vern at icir.org (Vern Paxson) Date: Wed, 06 Oct 2010 17:00:08 -0700 Subject: [Bro] Understanding the event generation and handling In-Reply-To: <4CACC206.2050505@gmail.com> (Wed, 06 Oct 2010 11:37:58 PDT). Message-ID: <20101007000008.63B3336A422@taffy.ICSI.Berkeley.EDU> > Specifically, I need to get to the logic of- > 1. Event generation: How does Bro know which all events to raise by > looking at a particular packet? There is a tree of analyzers that's traversed (perhaps taking multiple branches at any given point). > I have a basic understanding of the > class hierarchy, but I don't know where to look for the code that > decides which specific Application layer analyzer object to create by > looking at the Application Layer header/signature of the incoming packet. The architecture here is described in the paper: http://www.icir.org/robin/papers/usenix06.pdf If you are looking for specific details regarding names of classes/methods, etc., then you'll probably have to wait until Robin comes back from vacation in a couple of weeks. > 2. Event handling: It seems that an event's information is stored in an > object and all events are queued in an Event Manager as they are > created. Correct. > After every packet is processed, this queue of events is > drained (thus following a single-threaded model) and the events are sent > to a Serializer. I found the serialization code hard to understand so I Ignore the serializer. It's there for things like communication between multiple Bro processes. > Can someone please suggest which debugger to use and how, so that I can > step-by-step understand the event-engine? Well, I use gdb, and if I must, I start with invocations of NetSessions::NextPacket . If you want to sketch your particular goal, that might help with giving you more focussed advice. 
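[To make that suggestion concrete, a minimal gdb session might look like the following; the trace file and loaded policy are placeholders, and a Bro built with debug symbols (e.g., configured with --enable-debug) is assumed:

% gdb --args bro -r sample.trace http
(gdb) break NetSessions::NextPacket
(gdb) run
(gdb) backtrace
(gdb) step

backtrace shows how the packet arrived from the packet source; stepping onward from NextPacket leads into the per-connection lookup and from there into the analyzer tree.]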
Vern From sstattla at gmail.com Wed Oct 6 17:37:39 2010 From: sstattla at gmail.com (Sunjeet Singh) Date: Wed, 06 Oct 2010 17:37:39 -0700 Subject: [Bro] Understanding the event generation and handling In-Reply-To: <20101007000008.63B3336A422@taffy.ICSI.Berkeley.EDU> References: <20101007000008.63B3336A422@taffy.ICSI.Berkeley.EDU> Message-ID: <4CAD1653.4010902@gmail.com> Hi Vern, > The architecture here is described in the paper: > > http://www.icir.org/robin/papers/usenix06.pdf > Thanks! I'll take a look. > Well, I use gdb, and if I must, I start with invocations of > NetSessions::NextPacket . > This is helpful. > If you want to sketch your particular goal, that might help with giving > you more focussed advice. > I'm interested in Bro in general, but right now I'd be interested to know details about how event handling was implemented in Bro. So for every event from the event queue, how many handlers is it matched against for the right handlers to be invoked? All?(Probably not) Could you please shed some light on the details here? Do you think there could be scope for optimization? Thank you, Sunjeet Singh From vern at icir.org Wed Oct 6 17:42:15 2010 From: vern at icir.org (Vern Paxson) Date: Wed, 06 Oct 2010 17:42:15 -0700 Subject: [Bro] Understanding the event generation and handling In-Reply-To: <4CAD1653.4010902@gmail.com> (Wed, 06 Oct 2010 17:37:39 PDT). Message-ID: <20101007004215.8E31F36A422@taffy.ICSI.Berkeley.EDU> > So for every event from the event queue, how many handlers is it matched > against for the right handlers to be invoked? There's no matching at all. Rather, when policy scripts define new event handlers, they're directly associated with the name of the event. So when the event engine generates event_XXX, there's already (scripting) code associated with a global variable named event_XXX, and that's executed directly. > Do you think there > could be scope for optimization? No. Where optimization would prove fruitful (but hard) is for the script interpreter. Vern From sstattla at gmail.com Thu Oct 7 10:47:06 2010 From: sstattla at gmail.com (Sunjeet Singh) Date: Thu, 07 Oct 2010 10:47:06 -0700 Subject: [Bro] Filtering based on port-number Message-ID: <4CAE079A.5070505@gmail.com> Hi, The Bro Analyzers operate on the principle that port number is not a good indicator of protocol. But the filtering step does exactly the opposite. For example, the filter applied when the default brolite.bro policy file is used is- ((((((((((port telnet or tcp port 513) or (tcp[13] & 7 != 0)) or (tcp dst port 80 or tcp dst port 8080 or tcp dst port 8000)) or (tcp src port 80 or tcp src port 8080 or tcp src port 8000)) or (port 111)) or ((ip[6:2] & 0x3fff != 0) and tcp)) or (udp port 69)) or (port 6666)) or (tcp port smtp or tcp port 587)) or (port ftp)) or (port 6667) Thanks to the filtering step, 1. Bro will analyze some traffic that didn't belong to any of the 'relevant' protocols until it realizes that it can safely be discarded, and 2. Bro will not analyze traffic that belonged to one of the relevant protocols because it was filtered out for not being used on the standard port. Is this true? And if so, is this an okay side-effect to have of the filtering step? 
Thank you, Sunjeet Singh

From redlamb19 at gmail.com Thu Oct 7 12:41:00 2010 From: redlamb19 at gmail.com (Peter Erickson) Date: Thu, 7 Oct 2010 14:41:00 -0500 Subject: [Bro] Filtering based on port-number In-Reply-To: <4CAE079A.5070505@gmail.com> References: <4CAE079A.5070505@gmail.com> Message-ID: <20101007194059.GB4798@does.not.exist> I thought the same thing when I first started looking at Bro and its dynamic protocol detection (dpd) about 2 months ago. Take a look at the dpd wiki page which gives a good description of how it works. It also states: when loading dpd you may need to change the filter to include all packets, e.g. on the command line: bro -f "tcp or udp or icmp" ...

** Sunjeet Singh [2010-10-07 10:47:06 -0700] ** > Hi, > > The Bro Analyzers operate on the principle that port number is not a > good indicator of protocol. But the filtering step does exactly the > opposite. > > For example, the filter applied when the default brolite.bro policy file > is used is- > ((((((((((port telnet or tcp port 513) or (tcp[13] & 7 != 0)) or (tcp > dst port 80 or tcp dst port 8080 or tcp dst port 8000)) or (tcp src port > 80 or tcp src port 8080 or tcp src port 8000)) or (port 111)) or > ((ip[6:2] & 0x3fff != 0) and tcp)) or (udp port 69)) or (port 6666)) or > (tcp port smtp or tcp port 587)) or (port ftp)) or (port 6667) > > Thanks to the filtering step, > 1. Bro will analyze some traffic that didn't belong to any of the > 'relevant' protocols until it realizes that it can safely be discarded, and > 2. Bro will not analyze traffic that belonged to one of the relevant > protocols because it was filtered out for not being used on the standard > port. > > Is this true? And if so, is this an okay side-effect to have of the > filtering step? >

From seth at icir.org Thu Oct 7 12:48:12 2010 From: seth at icir.org (Seth Hall) Date: Thu, 7 Oct 2010 15:48:12 -0400 Subject: [Bro] Filtering based on port-number In-Reply-To: <20101007194059.GB4798@does.not.exist> References: <4CAE079A.5070505@gmail.com> <20101007194059.GB4798@does.not.exist> Message-ID: On Oct 7, 2010, at 3:41 PM, Peter Erickson wrote: > when loading dpd you may need to change the filter to include all > packets, e.g. on the command line: > bro -f "tcp or udp or icmp" ... You can also change the filter at the script level like this:

redef capture_filters += { ["all-ip-traffic"] = "ip" };

.Seth

From sstattla at gmail.com Thu Oct 7 14:33:53 2010 From: sstattla at gmail.com (Sunjeet Singh) Date: Thu, 07 Oct 2010 14:33:53 -0700 Subject: [Bro] Filtering based on port-number In-Reply-To: <20101007194059.GB4798@does.not.exist> References: <4CAE079A.5070505@gmail.com> <20101007194059.GB4798@does.not.exist> Message-ID: <4CAE3CC1.5060007@gmail.com> > when loading dpd you may need to change the filter to include all > packets, e.g. on the command line: > bro -f "tcp or udp or icmp" ... > Okay, so it makes sense to use capture_filter as-it-is when you are not using DPD; and to disable capture_filter (using "bro -f") if you are using DPD. In the latter case, you end up analyzing all packets which causes an extra performance cost of about 13.8% [with given parameters, Section 6.1, USENIX'06 paper]. The same section of the paper also says that the runtime of the Bro system exceeds the duration of the trace, indicating that we require "multiple NIDS instances in live operation". "Multiple NIDS instances in live operation"- has this been discussed anywhere else? With the filter disabled, this would be very useful.
Is it as simple as splitting up your policy file among different machines running Bro or is there more to it? Thank you, Peter. Sunjeet Singh From redlamb19 at gmail.com Fri Oct 8 07:59:38 2010 From: redlamb19 at gmail.com (Peter Erickson) Date: Fri, 8 Oct 2010 09:59:38 -0500 Subject: [Bro] Filtering based on port-number In-Reply-To: <4CAE3CC1.5060007@gmail.com> References: <4CAE079A.5070505@gmail.com> <20101007194059.GB4798@does.not.exist> <4CAE3CC1.5060007@gmail.com> Message-ID: <20101008095938.vh3h4k3xwcwok8s8@imp.redlamb.net> >> when loading dpd you may need to change the filter to include all >> packets, e.g. on the command line: >> bro -f "tcp or udp or icmp" ... >> > Okay, so it makes sense to use capture_filter as-it-is when you are not > using DPD; and to disable capture_filter (using "bro -f") if you are > using DPD. In the latter case, you end up analyzing all packets which > causes an extra performance cost of about 13.8% [with given parameters, > Section 6.1, USENIX'06 paper]. > > The same section of the paper also says that the runtime of the Bro > system exceeds the duration of the trace, indicating that we require > "multiple NIDS instances in live operation". > > "Multiple NIDS instances in live operation"- has this been discussed > anywhere else? With the filter disabled, this would be very useful. Is > it as simple as splitting up your policy file among different machines > running Bro or is there more to it? Someone else can correct me if I'm wrong, but I think that you are needing to setup a clustered environment with managers, proxies, and workers. The user manual briefly mentions something about this in the installation section, but my limited understanding of how it works comes from reading the scripts located at $BROHOME/share/broctl. My use of bro is strictly for offline processing so I have yet to really pay attention to it other than starting bro in standalone mode. From sstattla at gmail.com Fri Oct 8 08:08:24 2010 From: sstattla at gmail.com (Sunjeet Singh) Date: Fri, 08 Oct 2010 08:08:24 -0700 Subject: [Bro] Filtering based on port-number In-Reply-To: <20101008095938.vh3h4k3xwcwok8s8@imp.redlamb.net> References: <4CAE079A.5070505@gmail.com> <20101007194059.GB4798@does.not.exist> <4CAE3CC1.5060007@gmail.com> <20101008095938.vh3h4k3xwcwok8s8@imp.redlamb.net> Message-ID: <4CAF33E8.2040405@gmail.com> I'm looking into it. Thanks for your help Peter. Sunjeet Singh On 10-10-08 07:59 AM, Peter Erickson wrote: > >>> when loading dpd you may need to change the filter to include all >>> packets, e.g. on the command line: >>> bro -f "tcp or udp or icmp" ... >>> >> Okay, so it makes sense to use capture_filter as-it-is when you are not >> using DPD; and to disable capture_filter (using "bro -f") if you are >> using DPD. In the latter case, you end up analyzing all packets which >> causes an extra performance cost of about 13.8% [with given parameters, >> Section 6.1, USENIX'06 paper]. >> >> The same section of the paper also says that the runtime of the Bro >> system exceeds the duration of the trace, indicating that we require >> "multiple NIDS instances in live operation". >> >> "Multiple NIDS instances in live operation"- has this been discussed >> anywhere else? With the filter disabled, this would be very useful. Is >> it as simple as splitting up your policy file among different machines >> running Bro or is there more to it? 
> > Someone else can correct me if I'm wrong, but I think that you are > needing to setup a clustered environment with managers, proxies, and > workers. The user manual briefly mentions something about this in the > installation section, but my limited understanding of how it works > comes from reading the scripts located at $BROHOME/share/broctl. My > use of bro is strictly for offline processing so I have yet to really > pay attention to it other than starting bro in standalone mode. From seth at icir.org Fri Oct 8 08:33:40 2010 From: seth at icir.org (Seth Hall) Date: Fri, 8 Oct 2010 11:33:40 -0400 Subject: [Bro] Filtering based on port-number In-Reply-To: <4CAF33E8.2040405@gmail.com> References: <4CAE079A.5070505@gmail.com> <20101007194059.GB4798@does.not.exist> <4CAE3CC1.5060007@gmail.com> <20101008095938.vh3h4k3xwcwok8s8@imp.redlamb.net> <4CAF33E8.2040405@gmail.com> Message-ID: <0B6A27B4-26BA-4180-BF07-050855AF0553@icir.org> The best documentation for this can currently be found here: http://www.icir.org/robin/bro-cluster/ .Seth On Oct 8, 2010, at 11:08 AM, Sunjeet Singh wrote: > I'm looking into it. Thanks for your help Peter. > > Sunjeet Singh > > > On 10-10-08 07:59 AM, Peter Erickson wrote: >> >>>> when loading dpd you may need to change the filter to include all >>>> packets, e.g. on the command line: >>>> bro -f "tcp or udp or icmp" ... >>>> >>> Okay, so it makes sense to use capture_filter as-it-is when you are not >>> using DPD; and to disable capture_filter (using "bro -f") if you are >>> using DPD. In the latter case, you end up analyzing all packets which >>> causes an extra performance cost of about 13.8% [with given parameters, >>> Section 6.1, USENIX'06 paper]. >>> >>> The same section of the paper also says that the runtime of the Bro >>> system exceeds the duration of the trace, indicating that we require >>> "multiple NIDS instances in live operation". >>> >>> "Multiple NIDS instances in live operation"- has this been discussed >>> anywhere else? With the filter disabled, this would be very useful. Is >>> it as simple as splitting up your policy file among different machines >>> running Bro or is there more to it? >> >> Someone else can correct me if I'm wrong, but I think that you are >> needing to setup a clustered environment with managers, proxies, and >> workers. The user manual briefly mentions something about this in the >> installation section, but my limited understanding of how it works >> comes from reading the scripts located at $BROHOME/share/broctl. My >> use of bro is strictly for offline processing so I have yet to really >> pay attention to it other than starting bro in standalone mode. > > _______________________________________________ > Bro mailing list > bro at bro-ids.org > http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/bro From sstattla at gmail.com Fri Oct 8 08:37:15 2010 From: sstattla at gmail.com (Sunjeet Singh) Date: Fri, 08 Oct 2010 08:37:15 -0700 Subject: [Bro] Filtering based on port-number In-Reply-To: <0B6A27B4-26BA-4180-BF07-050855AF0553@icir.org> References: <4CAE079A.5070505@gmail.com> <20101007194059.GB4798@does.not.exist> <4CAE3CC1.5060007@gmail.com> <20101008095938.vh3h4k3xwcwok8s8@imp.redlamb.net> <4CAF33E8.2040405@gmail.com> <0B6A27B4-26BA-4180-BF07-050855AF0553@icir.org> Message-ID: <4CAF3AAB.9070501@gmail.com> Got it! 
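[For reference, a clustered deployment of this kind is driven by BroControl's node configuration; a minimal sketch of an etc/node.cfg follows, with made-up hostnames and interface names:

[manager]
type=manager
host=10.0.0.1

[proxy-1]
type=proxy
host=10.0.0.1

[worker-1]
type=worker
host=10.0.0.2
interface=eth0

[worker-2]
type=worker
host=10.0.0.3
interface=eth0

Each worker analyzes its share of the traffic, the proxy relays state between workers, and the manager collects logs and notices; see the cluster documentation linked above for the authoritative details.]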
Thanks Seth, Sunjeet Singh On 10-10-08 08:33 AM, Seth Hall wrote: > The best documentation for this can currently be found here: > > http://www.icir.org/robin/bro-cluster/ > > .Seth > > On Oct 8, 2010, at 11:08 AM, Sunjeet Singh wrote: > >> I'm looking into it. Thanks for your help Peter. >> >> Sunjeet Singh >> >> >> On 10-10-08 07:59 AM, Peter Erickson wrote: >>>>> when loading dpd you may need to change the filter to include all >>>>> packets, e.g. on the command line: >>>>> bro -f "tcp or udp or icmp" ... >>>>> >>>> Okay, so it makes sense to use capture_filter as-it-is when you are not >>>> using DPD; and to disable capture_filter (using "bro -f") if you are >>>> using DPD. In the latter case, you end up analyzing all packets which >>>> causes an extra performance cost of about 13.8% [with given parameters, >>>> Section 6.1, USENIX'06 paper]. >>>> >>>> The same section of the paper also says that the runtime of the Bro >>>> system exceeds the duration of the trace, indicating that we require >>>> "multiple NIDS instances in live operation". >>>> >>>> "Multiple NIDS instances in live operation"- has this been discussed >>>> anywhere else? With the filter disabled, this would be very useful. Is >>>> it as simple as splitting up your policy file among different machines >>>> running Bro or is there more to it? >>> Someone else can correct me if I'm wrong, but I think that you are >>> needing to setup a clustered environment with managers, proxies, and >>> workers. The user manual briefly mentions something about this in the >>> installation section, but my limited understanding of how it works >>> comes from reading the scripts located at $BROHOME/share/broctl. My >>> use of bro is strictly for offline processing so I have yet to really >>> pay attention to it other than starting bro in standalone mode. >> _______________________________________________ >> Bro mailing list >> bro at bro-ids.org >> http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/bro From sstattla at gmail.com Fri Oct 8 16:39:22 2010 From: sstattla at gmail.com (Sunjeet Singh) Date: Fri, 08 Oct 2010 16:39:22 -0700 Subject: [Bro] Multi-threading Message-ID: <4CAFABAA.30105@gmail.com> Hi, Can someone please comment on the current status of multi-threading in Bro? I would be interested in doing some work here. I've been reading a bit about it at- http://blog.securitymonks.com/2010/08/26/three-little-idsips-engines-build-their-open-source-solutions/ and http://www.google.ca/url?sa=t&source=web&cd=1&ved=0CBQQFjAA&url=http%3A%2F%2Fwww.bro-ids.org%2Fbro-workshop-2009%2Fslides%2FFutureWork.pdf&rct=j&q=bro%20ids%20multithreading&ei=gquvTK6KIIa-sAPT8viQDA&usg=AFQjCNGhsZ76_FKTpe3P-v40RgT1Ye36KA&sig2=y8oAyNcZ602kjuT1Ei2ytw&cad=rja Thank you, Sunjeet Singh From vern at icir.org Mon Oct 11 20:43:00 2010 From: vern at icir.org (Vern Paxson) Date: Mon, 11 Oct 2010 20:43:00 -0700 Subject: [Bro] Multi-threading In-Reply-To: <4CAFABAA.30105@gmail.com> (Fri, 08 Oct 2010 16:39:22 PDT). Message-ID: <20101012034300.B232436A421@taffy.ICSI.Berkeley.EDU> > Can someone please comment on the current status of multi-threading in > Bro? That will need to be Robin, as he's the one who's done all the work on this. However, he's on vacation for another week, and will no doubt face a major email backlog when he returns. 
Vern

From zsmountain27 at gmail.com Tue Oct 12 10:55:32 2010 From: zsmountain27 at gmail.com (SONG ZHAO) Date: Tue, 12 Oct 2010 13:55:32 -0400 Subject: [Bro] Modify mac address Message-ID: Hi, I want to modify the MAC address of network packets before or after the packets are handled by Bro. Could you tell me how to modify the MAC address using Bro? Do I need to revise the source code? Thanks, Song

From christian at icir.org Tue Oct 12 12:43:41 2010 From: christian at icir.org (Christian Kreibich) Date: Tue, 12 Oct 2010 12:43:41 -0700 Subject: [Bro] Modify mac address In-Reply-To: References: Message-ID: <1286912621.1919.158.camel@strangepork> On Tue, 2010-10-12 at 13:55 -0400, SONG ZHAO wrote: > Hi, > I want to modify the MAC address of network packets before or after > the packets are handled by Bro. > Could you tell me how to modify the MAC address using Bro? Do I need > to revise the source code? You would likely have to revise source code, but without more context it's unclear whether Bro is a good choice for what you want to do. If all you want is Ethernet address rewriting, there are other tools that likely already do what you want. tcprewrite provides basic Ethernet address rewriting. For more flexibility, you could write a little Scapy script as shown below. As a last resort you could write a Netdude plugin that does what you need.

map = {'00:50:da:53:8a:01': '11:22:33:44:55:66',
       '00:12:7f:eb:3b:cf': '77:88:99:aa:bb:cc'}

# read the trace, rewrite matching source/destination MACs, write it back out
pkts = rdpcap('in.trace')
for pkt in pkts:
    for key, val in map.items():
        if pkt.src == key: pkt.src = val
        if pkt.dst == key: pkt.dst = val
wrpcap('out.trace', pkts)

-- Cheers, Christian

From redlamb19 at gmail.com Tue Oct 12 16:38:18 2010 From: redlamb19 at gmail.com (Peter Erickson) Date: Tue, 12 Oct 2010 18:38:18 -0500 Subject: [Bro] http analyzer and de-obfuscating the payload Message-ID: <20101012233818.GA1484@does.not.exist> While writing a few policies to track an extremely basic malware "protocol" that sits on top of HTTP, I ran into a few questions that I haven't been able to find answers for. 1. Are binpac analyzers preferred over the hand-written ones? From what I can tell, which may be wrong, the http binpac analyzer does not send an http_entity_data event so using http-extract-items is not possible. Is it possible to extract http items using the binpac analyzer or am I better off sticking with the hand-written one? 2. When processing events, e.g. http_message_done, is it possible to access the entire assembled stream without writing it to disk first? I have some malware traffic that I would like to analyze with bro, but the data is obfuscated within the http data section using layers of xor, compression, and encryption techniques. Ideally, I would use bro to de-obfuscate the streams and provide additional info in the log files instead of using python scripts after running bro. I have no problems writing the bifs (I've already created an xor one), but want to make sure the info is available if I do write them. 3. Along the same lines as #2, is the assembled stream available for connections that are not http? Any help is appreciated. Thanks in advance.
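[Question 2 revolves around the http_entity_data and http_message_done events. A minimal sketch of reassembling an entity body at the script level might look like the following; the table name is made up, pipelined requests are ignored, and the memory caveat raised in the replies below applies:

global bodies: table[conn_id, bool] of string &default="";

event http_entity_data(c: connection, is_orig: bool, length: count, data: string)
    {
    # append each chunk in the order the analyzer delivers it
    bodies[c$id, is_orig] = string_cat(bodies[c$id, is_orig], data);
    }

event http_message_done(c: connection, is_orig: bool, stat: http_message_stat)
    {
    # the full body for this direction is now assembled;
    # de-obfuscate/inspect it here, then release the memory
    delete bodies[c$id, is_orig];
    }

Keeping whole bodies in memory is only reasonable for offline traces.]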
From seth at icir.org Tue Oct 12 19:17:36 2010 From: seth at icir.org (Seth Hall) Date: Tue, 12 Oct 2010 22:17:36 -0400 Subject: [Bro] http analyzer and de-obfuscating the payload In-Reply-To: <20101012233818.GA1484@does.not.exist> References: <20101012233818.GA1484@does.not.exist> Message-ID: <89668F79-C244-4546-A0A6-51AC8E7139AA@icir.org> On Oct 12, 2010, at 7:38 PM, Peter Erickson wrote: > Is it possible to extract http items using the binpac analyzer or am I > better off sticking with the hand-written one? Binpac analyzers are preferred when writing new analyzers, but some of the binpac analyzers are not at feature parity with their handwritten counterparts (HTTP is the primary problem in this regard). For now, I recommend not using the --enable-binpac flag when doing HTTP analysis. > 2. When processing events, i.e. http_message_done, is it possible to > access the entire assembled stream without writing it to disk first? No. Generally when doing stream analysis with Bro you have two options. The best, if your analysis method allows it is to do the analysis in a streaming fashion with chunks of data as they become available. If your analysis method needs random access to the data, then you are probably best off writing to disk and kicking off an external process (from within Bro) once the stream is completed and the file is closed. The output of that analysis could then feed back into Bro using Broccoli. You typically don't want to try storing large streams in memory because it would be far too easy to use all available memory and crash Bro. Of course, if you are running Bro on tracefiles instead of live network interfaces that may not be a concern. > 3. Along the same lines as #2, is the assembled stream available for > connections that are not http? It depends on the protocol and the analyzer. If you search through the event.bif.bro file for "_data", that will point out analyzer events which likely are sending a stream of data. The analyzers which currently have _data events are: HTTP, SMTP, POP3, and MIME. Unfortunately some of the other obvious ones like SMB and NFS don't currently have _data events. We accept patches though if you'd like to add support for that. :) Is there a protocol or set of protocols in particular that you'd like to see supported with _data events? .Seth From redlamb19 at gmail.com Tue Oct 12 20:30:06 2010 From: redlamb19 at gmail.com (Peter Erickson) Date: Tue, 12 Oct 2010 22:30:06 -0500 Subject: [Bro] http analyzer and de-obfuscating the payload In-Reply-To: <89668F79-C244-4546-A0A6-51AC8E7139AA@icir.org> References: <20101012233818.GA1484@does.not.exist> <89668F79-C244-4546-A0A6-51AC8E7139AA@icir.org> Message-ID: <20101012223006.1qberfceecow4osc@imp.redlamb.net> On Tue Oct 12 21:17:36 2010, Seth Hall wrote: >> 2. When processing events, i.e. http_message_done, is it possible to >> access the entire assembled stream without writing it to disk first? > > No. Generally when doing stream analysis with Bro you have two > options. The best, if your analysis method allows it is to do the > analysis in a streaming fashion with chunks of data as they become > available. If your analysis method needs random access to the data, > then you are probably best off writing to disk and kicking off an > external process (from within Bro) once the stream is completed and > the file is closed. The output of that analysis could then feed > back into Bro using Broccoli. I didn't think of using broccoli to feed it back into the system. 
I'll have to reconsider my current setup to see if that makes sense. It works now without it, but there is definitely a benefit of having additional information within bro's log files. > You typically don't want to try storing large streams in memory > because it would be far too easy to use all available memory and > crash Bro. Of course, if you are running Bro on tracefiles instead > of live network interfaces that may not be a concern. All the analysis that I have been (and will be) doing is with tracefiles on a machine that is not connected to a network. I figured that there were chances that I could run out of memory, but was hoping that the memory would be released once the connection was terminated. I did not think about using a table of strings to keep the data... guess I was thinking too deep. >> 3. Along the same lines as #2, is the assembled stream available for >> connections that are not http? > > It depends on the protocol and the analyzer. If you search through > the event.bif.bro file for "_data", that will point out analyzer > events which likely are sending a stream of data. The analyzers > which currently have _data events are: HTTP, SMTP, POP3, and MIME. > Unfortunately some of the other obvious ones like SMB and NFS don't > currently have _data events. We accept patches though if you'd like > to add support for that. :) I figured that you would accept patches. It has been awhile since I've used C++, but hoping it will come back to me. I have spent a lot of time looking at the source code to better understand how bro works. I would love to see RDP and SSL decryption, but I know that those aren't easy tasks... doesn't mean I won't try eventually. > Is there a protocol or set of protocols in particular that you'd > like to see supported with _data events? I haven't seen anything yet, but I'm sure that I'll come across something eventually. Thanks for all the help.

From seth at icir.org Wed Oct 13 07:06:32 2010 From: seth at icir.org (Seth Hall) Date: Wed, 13 Oct 2010 10:06:32 -0400 Subject: [Bro] http analyzer and de-obfuscating the payload In-Reply-To: <20101012223006.1qberfceecow4osc@imp.redlamb.net> References: <20101012233818.GA1484@does.not.exist> <89668F79-C244-4546-A0A6-51AC8E7139AA@icir.org> <20101012223006.1qberfceecow4osc@imp.redlamb.net> Message-ID: <84A19F9A-6723-49FD-B345-148E5E9CFB34@icir.org> On Oct 12, 2010, at 11:30 PM, Peter Erickson wrote: > I didn't think of using broccoli to feed it back into the system. I'll have to reconsider my current setup to see if that makes sense. It works now without it, but there is definitely a benefit of having additional information within bro's log files. It's especially useful when you're using Bro on a live network because the information gained from the external analysis could feed back into Bro to change its behavior if the same thing is seen again. As a personal exercise, I'm going to start including concrete examples when I talk about techniques in Bro. :) So, here's my concrete example... Bro identifies a Windows executable being downloaded over HTTP so it begins calculating an MD5 sum of the bytes being transferred. It could also save the file to disk. When the file is done being transferred, the on-disk filename could be sent off to an external process which grabs the file and does something like run it through VirusTotal and returns the result of that scan to Bro. If the file is determined to be malicious an alarm could be raised about the initial transfer and the MD5 sum could be added to a set of malicious MD5 sums.
The URL of the file could also be added to a set of URLs. In the future, if any host downloads a file with that MD5 sum or from the same URL then an alarm would automatically be raised without waiting for the external analysis to take place. This full scenario is not currently implemented in Bro, but things are lining up to make this sort of analysis possible. If you have ideas for analysis scenarios that you'd like to see implemented, I'd really like to hear them! > All the analysis that I have been (and will be) doing is with tracefiles on a machine that is not connected to a network. I figured that there were chances that I could run out of memory, but was hoping that the memory would be released once the connection was terminated. I did not think about using a table of strings to keep the data... guess I was thinking too deep. You could either keep a table of strings or concatenate the strings together as new data comes in. I'll include some examples here. Using these inputs...

global a = "first string";
global b = "second string";
global output = "";

You can do this...

global stuff: string_array = table();
stuff[|stuff|+1] = a;
stuff[|stuff|+1] = b;
output = cat_string_array(stuff);

Or this...

output = string_cat(a, b);

> I figured that you would accept patches. It has been awhile since I've used C++, but hoping it will come back to me. I have spent a lot of time looking at the source code to better understand how bro works. I would love to see RDP and SSL decryption, but I know that those aren't easy tasks... doesn't mean I won't try eventually. Bro currently doesn't have any support for RDP but I think that a lot of the support for SSL decryption is already in place. I haven't ever done it myself though, so I don't know if it is completely there and working. .Seth

From vern at icir.org Wed Oct 13 12:59:50 2010 From: vern at icir.org (Vern Paxson) Date: Wed, 13 Oct 2010 12:59:50 -0700 Subject: [Bro] http analyzer and de-obfuscating the payload In-Reply-To: <89668F79-C244-4546-A0A6-51AC8E7139AA@icir.org> (Tue, 12 Oct 2010 22:17:36 EDT). Message-ID: <20101013195950.10ACC36A413@taffy.ICSI.Berkeley.EDU> > > 3. Along the same lines as #2, is the assembled stream available for > > connections that are not http? > > It depends on the protocol and the analyzer. Note, there are also generic tcp_contents() and udp_contents() events. They likewise return the stream piecemeal. Vern

From vern at icir.org Wed Oct 13 13:01:20 2010 From: vern at icir.org (Vern Paxson) Date: Wed, 13 Oct 2010 13:01:20 -0700 Subject: [Bro] http analyzer and de-obfuscating the payload In-Reply-To: <84A19F9A-6723-49FD-B345-148E5E9CFB34@icir.org> (Wed, 13 Oct 2010 10:06:32 EDT). Message-ID: <20101013200120.339D236A429@taffy.ICSI.Berkeley.EDU> > Or this... > output = string_cat(a, b); One caveat is that the string_cat approach is essentially O(N^2) in the size of the reassembled stream, because it winds up repeatedly copying the entire string. Ideally we'd fix this under the hood, one fine day ... Vern

From sstattla at gmail.com Mon Oct 18 10:31:13 2010 From: sstattla at gmail.com (Sunjeet Singh) Date: Mon, 18 Oct 2010 10:31:13 -0700 Subject: [Bro] Use of GPUs for signature matching? Message-ID: <4CBC8461.8010008@gmail.com> Bro currently follows a single-threaded model in which every incoming packet is first filtered, analyzed for protocol based on its signature (and not simply port-number) and then handled according to a user-defined policy for that protocol.
While Bro provides mechanisms to distribute the processing of the handled policy events, the protocol analysis poses a performance bottleneck in that it might not be able to keep up with the speed of incoming packets. In Bro's signature matching engine, connections sometimes trigger more than one signature and so can not be immediately associated with a protocol. But as more connection packets arrive, a better decision about the protocol involved can be made. During this process, different protocol analyzers may be spawned and killed until finally the right protocol is arrived at. Regular expression matching is done here to match signatures. I believe that GPUs can be used here to perform parallel signature matching by different protocol analyzers, thus speeding up the protocol analysis phase. With this, Bro would be able to operate at a higher packet rate than it does now. If this is true, I would like to do this. I will appreciate if you could share your thoughts. Snort's packet processing throughput increased by 60% with the use of GPUs ( http://www.springerlink.com/content/b3m7662014272t8m/ ) and Suricata has plans to introduce GPUs ( http://blog.securitymonks.com/2010/08/26/three-little-idsips-engines-build-their-open-source-solutions/ ). Thank you, Sunjeet Singh

From vallentin at icir.org Mon Oct 18 11:05:09 2010 From: vallentin at icir.org (Matthias Vallentin) Date: Mon, 18 Oct 2010 11:05:09 -0700 Subject: [Bro] Log rotation and /dev/null with broctl Message-ID: <20101018180509.GE403@icsi.berkeley.edu> I receive some unexplainable errors using broctl:

19 Oct 04:42:55 [output] /usr/local/share/broctl/scripts/archive-log: line 49: /home2/bro-logs/2010-10-16//dev/null.07:52:18-00:00:00.gz: No such file or directory
19 Oct 04:42:55 [output] 1287253800.000380 run-time error: rotate_file: can't move /dev/null to /dev/null.3123.1287253800.000380.tmp: File exists
19 Oct 04:42:55 [output] /usr/local/share/broctl/scripts/archive-log: line 49: /home2/bro-logs/2010-10-16//dev/null.07:52:18-00:00:00.gz: No such file or directory
19 Oct 04:42:55 [output] /usr/local/share/broctl/scripts/archive-log: line 49: /home2/bro-logs/2010-10-17//dev/null.00:00:00-00:00:00.gz: No such file or directory
19 Oct 04:42:55 [output] 1287340200.000090 run-time error: rotate_file: can't move /dev/null to /dev/null.3123.1287340200.000090.tmp: File exists
19 Oct 04:42:55 [output] /usr/local/share/broctl/scripts/archive-log: line 49: /home2/bro-logs/2010-10-16//dev/null.07:52:18-00:00:00.gz: No such file or directory

My broctl.cfg is pretty standard, with the only big difference being the change of the log directory:

LogDir = /home2/bro-logs

This is also weird:

% file /dev/null
/dev/null: ASCII text
% more /dev/null
title

It almost seems that broctl overwrote /dev/null. Does that make any sense? Matthias

From jmellander at lbl.gov Mon Oct 18 11:15:06 2010 From: jmellander at lbl.gov (Jim Mellander) Date: Mon, 18 Oct 2010 11:15:06 -0700 Subject: [Bro] Log rotation and /dev/null with broctl In-Reply-To: <20101018180509.GE403@icsi.berkeley.edu> References: <20101018180509.GE403@icsi.berkeley.edu> Message-ID: On Mon, Oct 18, 2010 at 11:05 AM, Matthias Vallentin wrote:

> This is also weird:
>
> % file /dev/null
> /dev/null: ASCII text
> % more /dev/null
> title
>
> It almost seems that broctl overwrote /dev/null. Does that make any sense?
Seen this happen when redirection goes bad: instead of

rm my_file >/dev/null

the redirection is accidentally missed:

rm my_file /dev/null

(obviously only works with privs in /dev); then the next process redirecting to /dev/null creates a text file.

From JAzoff at uamail.albany.edu Mon Oct 18 11:25:40 2010 From: JAzoff at uamail.albany.edu (Justin Azoff) Date: Mon, 18 Oct 2010 14:25:40 -0400 Subject: [Bro] Log rotation and /dev/null with broctl In-Reply-To: <20101018180509.GE403@icsi.berkeley.edu> References: <20101018180509.GE403@icsi.berkeley.edu> Message-ID: <20101018182540.GG4105@datacomm.albany.edu> On Mon, Oct 18, 2010 at 02:05:09PM -0400, Matthias Vallentin wrote: > I receive some unexplainable errors using broctl: > > 19 Oct 04:42:55 [output] /usr/local/share/broctl/scripts/archive-log: line 49: /home2/bro-logs/2010-10-16//dev/null.07:52:18-00:00:00.gz: No such file or directory Do you have open_log_file("/dev/null") somewhere in one of your policy scripts? I don't think that sort of thing works, instead you need to immediately close a file after opening it... -- -- Justin Azoff -- Network Security & Performance Analyst

From vallentin at icir.org Mon Oct 18 11:40:29 2010 From: vallentin at icir.org (Matthias Vallentin) Date: Mon, 18 Oct 2010 11:40:29 -0700 Subject: [Bro] Log rotation and /dev/null with broctl In-Reply-To: <20101018182540.GG4105@datacomm.albany.edu> References: <20101018180509.GE403@icsi.berkeley.edu> <20101018182540.GG4105@datacomm.albany.edu> Message-ID: <20101018184029.GG403@icsi.berkeley.edu> > Do you have open_log_file("/dev/null") somewhere in one of your policy > scripts? Indeed, I could find the following

# Save us some disk I/O.
redef notice_file = open("/dev/null");
redef bro_alarm_file = open("/dev/null");
redef Weird::weird_file = open("/dev/null");

which I replaced with

event bro_init()
    {
    close(notice_file);
    close(bro_alarm_file);
    close(Weird::weird_file);
    }

to get rid of the error. Thanks for the hint. Matthias

From robin at icir.org Mon Oct 18 12:12:47 2010 From: robin at icir.org (Robin Sommer) Date: Mon, 18 Oct 2010 12:12:47 -0700 Subject: [Bro] Log rotation and /dev/null with broctl In-Reply-To: <20101018184029.GG403@icsi.berkeley.edu> References: <20101018180509.GE403@icsi.berkeley.edu> <20101018182540.GG4105@datacomm.albany.edu> <20101018184029.GG403@icsi.berkeley.edu> Message-ID: <20101018191247.GT55971@icir.org> On Mon, Oct 18, 2010 at 11:40 -0700, Matthias Vallentin wrote: > to get rid of the error. Thanks for the hint. We should check for that. Can you file a ticket to remember it? Thanks, Robin -- Robin Sommer * Phone +1 (510) 722-6541 * robin at icir.org ICSI/LBNL * Fax +1 (510) 666-2956 * www.icir.org

From vallentin at icir.org Mon Oct 18 12:25:09 2010 From: vallentin at icir.org (Matthias Vallentin) Date: Mon, 18 Oct 2010 12:25:09 -0700 Subject: [Bro] Log rotation and /dev/null with broctl In-Reply-To: <20101018191247.GT55971@icir.org> References: <20101018180509.GE403@icsi.berkeley.edu> <20101018182540.GG4105@datacomm.albany.edu> <20101018184029.GG403@icsi.berkeley.edu> <20101018191247.GT55971@icir.org> Message-ID: <20101018192509.GH403@icsi.berkeley.edu> > We should check for that. Can you file a ticket to remember it? Done.
Matthias From seth at icir.org Mon Oct 18 12:26:23 2010 From: seth at icir.org (Seth Hall) Date: Mon, 18 Oct 2010 15:26:23 -0400 Subject: [Bro] Log rotation and /dev/null with broctl In-Reply-To: <20101018191247.GT55971@icir.org> References: <20101018180509.GE403@icsi.berkeley.edu> <20101018182540.GG4105@datacomm.albany.edu> <20101018184029.GG403@icsi.berkeley.edu> <20101018191247.GT55971@icir.org> Message-ID: <204463F0-F51C-4AFF-8B6F-5A19A9BD8FD8@icir.org> On Oct 18, 2010, at 3:12 PM, Robin Sommer wrote: > On Mon, Oct 18, 2010 at 11:40 -0700, Matthias Vallentin wrote: > >> to get rid of the error. Thanks for the hint. > > We should check for that. Can you file a ticket to remember it? It would be good to have some good clarification on how *not* to print to log files. I've been doing the close() trick in my logging framework for a long time but you and Vern both agreed that using close() probably isn't the right way to do it. It works really well in this situation though because it does prevent remote printing as well as local printing. .Seth From robin at icir.org Mon Oct 18 12:33:06 2010 From: robin at icir.org (Robin Sommer) Date: Mon, 18 Oct 2010 12:33:06 -0700 Subject: [Bro] Endace support in use? Message-ID: <20101018193306.GA71746@icir.org> Bro currently comes with native support for Endace cards (i.e., using the Endace API directly, not via their libpcap-compatible interface). The support is enabled by configuring with --with-dag. As we're cleaning up the Bro distribution, we were wondering if anybody is using this functionality and would object seeing it removed? Robin -- Robin Sommer * Phone +1 (510) 722-6541 * robin at icir.org ICSI/LBNL * Fax +1 (510) 666-2956 * www.icir.org From robin at icir.org Wed Oct 20 14:12:11 2010 From: robin at icir.org (Robin Sommer) Date: Wed, 20 Oct 2010 14:12:11 -0700 Subject: [Bro] Multi-threading In-Reply-To: <4CAFABAA.30105@gmail.com> References: <4CAFABAA.30105@gmail.com> Message-ID: <20101020211211.GC68831@icir.org> Sorry for the delay. On Fri, Oct 08, 2010 at 16:39 -0700, Sunjeet Singh wrote: > Can someone please comment on the current status of multi-threading in > Bro? I would be interested in doing some work here. We have a proof-of-concept implementation of a multi-threaded Bro. Even though still an early prototype, it already improves Bro's performance quite a bit on multi-core systems and demonstrates that the approach works quite well. However, this prototype still has a number of limitations and is not yet usable from an operational perspective. There are also a number of different routes we could go from here, which aren't fully clear yet in their specifics. For more background, the most current description of the prototype is here: http://www.icir.org/robin/papers/cc-multi-core-icast.pdf Section V. describes the parallelization approach, and Section VI. presents some preliminary measurements. (Section I-IV are on a more conceptual level; not all of that is directly reflected in Bro). A limiting factor for moving this forward right now is available time, so help and contributions would certainly be welcome. Is there anything specific you're thinking about? (I saw your mail about GPUs, will reply to that in a bit). Robin -- Robin Sommer * Phone +1 (510) 722-6541 * robin at icir.org ICSI/LBNL * Fax +1 (510) 666-2956 * www.icir.org From robin at icir.org Wed Oct 20 14:20:45 2010 From: robin at icir.org (Robin Sommer) Date: Wed, 20 Oct 2010 14:20:45 -0700 Subject: [Bro] Use of GPUs for signature matching? 
In-Reply-To: <4CBC8461.8010008@gmail.com> References: <4CBC8461.8010008@gmail.com> Message-ID: <20101020212045.GD68831@icir.org> On Mon, Oct 18, 2010 at 10:31 -0700, Sunjeet Singh wrote: > I believe that GPUs can be used here to perform parallel signature > matching by different protocol analyzers, thus speeding up the protocol > analysis phase. That's generally right and, as the Snort work demonstrates, parallelizing signature matching across GPUs can indeed improve performance quite a bit. For Bro, however, improving signature performance is actually not that crucial as its main performance bottlenecks are elsewhere (the single most important bottleneck today is the script interpreter). Thus, while generally improving the performance of Bro's signature engine would certainly still be nice (and I appreciate your interest in helping with this!), I'm not sure it's actually worth spending the time that a solid GPU-based implementation would require. I'd be happy to provide you with some further thoughts on directions you could work on for improving Bro's performance. Write me a mail off-list if you're interested. Robin -- Robin Sommer * Phone +1 (510) 722-6541 * robin at icir.org ICSI/LBNL * Fax +1 (510) 666-2956 * www.icir.org

From mcflyingdp at gmail.com Fri Oct 22 10:11:00 2010 From: mcflyingdp at gmail.com (Elx 星) Date: Sat, 23 Oct 2010 01:11:00 +0800 Subject: [Bro] Who has put BRO turn parallel system? Message-ID: <4CC1C5A4.8050108@gmail.com> Has anyone turned Bro into a parallel (concurrent) system? I'm a second-year university student from China, participating in a project to develop an NIDS. My teacher suggested finding out whether someone has turned Bro into a parallel (concurrent) system. Thanks ... I'm very interested in NIDS and Bro, but it's hard to find much documentation about Bro (almost none), so I must learn from the mailing list... If you have spare time, please take a moment to answer this question.

From mcholste at gmail.com Fri Oct 22 10:46:38 2010 From: mcholste at gmail.com (Martin Holste) Date: Fri, 22 Oct 2010 12:46:38 -0500 Subject: [Bro] Use of GPUs for signature matching? In-Reply-To: <20101020212045.GD68831@icir.org> References: <4CBC8461.8010008@gmail.com> <20101020212045.GD68831@icir.org> Message-ID: >For Bro, however, improving signature > performance is actually not that crucial as its main performance > bottlenecks are elsewhere (the single most important bottleneck > today is the script interpreter). Robin, can you elaborate on this a bit? I'm very surprised that pattern matching would not be the first bottleneck. With that, I've watched the debate fly back and forth between Marty Roesch (in Snort) and Victor Julien (in Suricata) on the pros and cons of multithreading and I'd like to hear your take. Marty's point was that multithreading leads to CPU cache inefficiency which incurs a penalty greater than the boost to the pattern matching in parallel and therefore suggests flow-pinned load-balancing for scaling. Do you have an opinion on the matter? Thanks, Martin

From sstattla at gmail.com Fri Oct 22 11:56:46 2010 From: sstattla at gmail.com (Sunjeet Singh) Date: Fri, 22 Oct 2010 11:56:46 -0700 Subject: [Bro] Multi-threading In-Reply-To: <20101020211211.GC68831@icir.org> References: <4CAFABAA.30105@gmail.com> <20101020211211.GC68831@icir.org> Message-ID: <4CC1DE6E.6030202@gmail.com> Thanks for sharing the link to the paper, it made an interesting read.
This paper does a great job of explaining the concepts involved, even for someone like myself who doesn't have a background in parallel computing. Clearly, an IDS architecture that separates protocol analysis and event handling can employ this technique to improve performance. And so this can be used for Bro. But, you'd need a working ANI. I don't know how recently this paper was written, but when we're talking about today, where does ANI fit in, in hardware, and if not implemented as custom hardware then as a small program running in a core " if a multicore fabric includes embedded network resources" (like UltraSPARC T2)? I couldn't figure out how recently this paper was written (2007-08?), and so while reading this paper I couldn't help but think about this very basic question- Today, if I'm using Bro as the Host-based IDS on my machine, and if I find that Bro is not being able to keep up with the incoming packet rate, what are some steps that I should take? Thank you, Sunjeet Singh On 10-10-20 2:12 PM, Robin Sommer wrote: > Sorry for the delay. > > On Fri, Oct 08, 2010 at 16:39 -0700, Sunjeet Singh wrote: > >> Can someone please comment on the current status of multi-threading in >> Bro? I would be interested in doing some work here. > We have a proof-of-concept implementation of a multi-threaded Bro. > Even though still an early prototype, it already improves Bro's > performance quite a bit on multi-core systems and demonstrates that > the approach works quite well. However, this prototype still has a > number of limitations and is not yet usable from an operational > perspective. There are also a number of different routes we could go > from here, which aren't fully clear yet in their specifics. > > For more background, the most current description of the prototype > is here: > > http://www.icir.org/robin/papers/cc-multi-core-icast.pdf > > Section V. describes the parallelization approach, and Section VI. > presents some preliminary measurements. (Section I-IV are on a more > conceptual level; not all of that is directly reflected in Bro). > > A limiting factor for moving this forward right now is available > time, so help and contributions would certainly be welcome. Is there > anything specific you're thinking about? (I saw your mail about > GPUs, will reply to that in a bit). > > Robin > From seth at icir.org Fri Oct 22 12:39:50 2010 From: seth at icir.org (Seth Hall) Date: Fri, 22 Oct 2010 15:39:50 -0400 Subject: [Bro] Multi-threading In-Reply-To: <4CC1DE6E.6030202@gmail.com> References: <4CAFABAA.30105@gmail.com> <20101020211211.GC68831@icir.org> <4CC1DE6E.6030202@gmail.com> Message-ID: <23648188-F68E-408F-9C4E-73BA60CAD492@icir.org> On Oct 22, 2010, at 2:56 PM, Sunjeet Singh wrote: > Today, if I'm using Bro as the Host-based IDS on my machine, and if I > find that Bro is not being able to keep up with the incoming packet > rate, what are some steps that I should take? I'm guessing you meant network based IDS (as opposed to Host-based)? Currently, if you are trying to scale Bro as a network IDS the most viable method is to use the cluster deployment using the BroControl utility. It's currently being used in production at a number of locations. For more documentation about BroControl and the cluster deployment you can refer to the following link. 
http://www.icir.org/robin/bro-cluster/README.html .Seth

From gmhoward at gmail.com Mon Oct 25 20:15:11 2010 From: gmhoward at gmail.com (Gaspar Modelo-Howard) Date: Mon, 25 Oct 2010 23:15:11 -0400 Subject: [Bro] Remote reconfiguration of a Bro sensor In-Reply-To: <20101018193306.GA71746@icir.org> References: <20101018193306.GA71746@icir.org> Message-ID: <1288062911.11213.6.camel@kareem> Hello, Can someone please point to some info on how Bro currently supports remotely reconfiguring a sensor? Any example would also be appreciated. I want to configure Bro to allow remote reconfiguration of sensors without shutting down the sensor. One particular case I am interested in is telling a Bro sensor to include/exclude a .bro script while running. For example, a sensor starts with 'bro http' and then later is reconfigured to 'bro http ssh'. I briefly talked to Robin and Seth in regards to this, so sorry to bring it up again. But it seems like I missed some important pointers and can't find where/how to proceed with this. I have been successful sharing state between remote sensors, like the bro-to-bro comm from the 2009 workshop, but not doing remote reconfiguration. Many thanks, Gaspar

From robin at icir.org Mon Oct 25 21:47:44 2010 From: robin at icir.org (Robin Sommer) Date: Mon, 25 Oct 2010 21:47:44 -0700 Subject: [Bro] Use of GPUs for signature matching? In-Reply-To: References: <4CBC8461.8010008@gmail.com> <20101020212045.GD68831@icir.org> Message-ID: <20101026044744.GB37556@icir.org> On Fri, Oct 22, 2010 at 12:46 -0500, you wrote: > Robin, can you elaborate on this a bit? I'm very surprised that > pattern matching would not be the first bottleneck. The answer is quite simple actually: Bro just doesn't do that much pattern matching. While it has a pattern engine similar to what Snort/Suricata are relying on, a typical Bro setup doesn't use it very much at all: typically there are just a few signatures configured, often just for doing dynamic protocol detection. Bro is doing a lot of other things instead, in particular deep stateful protocol analysis and execution of its analysis scripts. In particular the latter is getting more and more expensive compared to Bro's other components: scripts are becoming larger and more complex, they track more state, and they have to deal with more traffic to analyze. The script interpreter is a piece we haven't spent much time on optimizing yet (it's indeed still an *interpreter* ...), and it actually accounts for a large share of Bro's CPU (and also memory) footprint these days. Note that executing scripts written in Bro's language is much different from doing pattern matching; improving regexp performance is not going to help much at all with the scripts. That's quite different from Snort/Suricata obviously, which don't do much else than pattern matching. > Marty's point was that multithreading leads to CPU cache > inefficiency which incurs a penalty greater than the boost to the > pattern matching in parallel and therefore suggests flow-pinned > load-balancing for scaling. Do you have an opinion on the matter? It's hard to answer that in a few sentences, but generally I agree that a flow-based load-balancing scheme is a reasonable approach for the lowest layer of the system. Many NIDS (including Snort and Bro) do much of their work on a per-flow basis, so parallelizing at that granularity certainly makes a lot of sense and avoids communication overhead (and hence also cache issues).
Generally, such a flow-based scheme can then be implemented either at the system/process level (i.e., running more than one instance of the NIDS, with a suitable frontend load-balancer splitting up the work, either externally or internally); or at the thread-level (multiple threads fed by a master thread). Conceptually, that doesn't make a lot of a difference, and the former is what we're doing with the Bro Cluster. Now, Snort has the "advantage" that such a simple flow-based scheme is pretty much all it needs to do for parallelizing. Because there's not much happening after the pattern matching step, there's also no need for further coordination between the instances/threads. For Bro, however, this is where things actually start to get interesting: since much of its CPU cycles are spent for the scripts, Amdahl's Law tells us that we need to parallelize the interpreter if we want to scale. Unfortunately, parallelizing the execution of a free-form Turing-complete language isn't exactly trivial ... Robin -- Robin Sommer * Phone +1 (510) 722-6541 * robin at icir.org ICSI/LBNL * Fax +1 (510) 666-2956 * www.icir.org

From robin at icir.org Mon Oct 25 21:52:36 2010 From: robin at icir.org (Robin Sommer) Date: Mon, 25 Oct 2010 21:52:36 -0700 Subject: [Bro] Multi-threading In-Reply-To: <4CC1DE6E.6030202@gmail.com> References: <4CAFABAA.30105@gmail.com> <20101020211211.GC68831@icir.org> <4CC1DE6E.6030202@gmail.com> Message-ID: <20101026045236.GC37556@icir.org> On Fri, Oct 22, 2010 at 11:56 -0700, you wrote: > Clearly, an IDS architecture that separates protocol analysis and event > handling can employ this technique to improve performance. And so this > can be used for Bro. But, you'd need a working ANI. That's right, but note that the ANI in the paper is a more powerful component than what we need for "just" parallelizing a passive NIDS (such as Bro). The latter primarily needs a load-balancer that distributes packets across threads in a predictable manner. In the most simple implementation (and in the current prototype) that's just another thread copying packets around, which is obviously not that great. A number of things come to mind to improve on that (as you already mention as well): an external load-balancer like what we use for the Bro Cluster; some dedicated network processors can already do this internally; and, probably the best option of all, some of the new commodity NICs actually have the necessary functionality on board and can steer traffic directly to their target threads. Generally, I expect much of what we need here to become pretty much standard functionality in the near future. > I don't know how recently this paper was written, The paper has been growing over a while. :) The later parts were finished about a year ago, the earlier ones in 2007/8 already iirc. Robin -- Robin Sommer * Phone +1 (510) 722-6541 * robin at icir.org ICSI/LBNL * Fax +1 (510) 666-2956 * www.icir.org

From mcholste at gmail.com Tue Oct 26 06:54:18 2010 From: mcholste at gmail.com (Martin Holste) Date: Tue, 26 Oct 2010 08:54:18 -0500 Subject: [Bro] Use of GPUs for signature matching? In-Reply-To: <20101026044744.GB37556@icir.org> References: <4CBC8461.8010008@gmail.com> <20101020212045.GD68831@icir.org> <20101026044744.GB37556@icir.org> Message-ID: Ok, this makes a lot of sense now. So you're saying that for the few true pattern matching activities Bro has to do, there's plenty of CPU to spare, but for script execution such as going to time-machine, extracting files from pcap, etc., you're running out of CPU.
So if you're running into a performance challenge with the scripting language, would you consider switching from the native Bro scripting language to an embedded interpreter from something like Perl, Python, or Lua? That in and of itself probably would hurt performance, but my guess is that it would take a lot less time to embed something and then multi-thread it than to roll your own from scratch. With the number of CPU cores climbing exponentially, a small performance hit would probably be acceptable if it can be offset by running on multiple cores. I think a well-known script language would also be a lot less scary for newcomers to Bro and really increase its user base.

On Mon, Oct 25, 2010 at 11:47 PM, Robin Sommer wrote: > > On Fri, Oct 22, 2010 at 12:46 -0500, you wrote: > >> Robin, can you elaborate on this a bit? I'm very surprised that >> pattern matching would not be the first bottleneck. > > The answer is quite simple actually: Bro just doesn't do that much > pattern matching. While it has a pattern engine similar to what > Snort/Suricata are relying on, a typical Bro setup doesn't use it > very much at all: typically there are just a few signatures > configured, often just for doing dynamic protocol detection. > > Bro is doing a lot of other things instead, in particular deep > stateful protocol analysis and execution of its analysis scripts. In > particular the latter is getting more and more expensive compared to > Bro's other components: scripts are becoming larger and more > complex, they track more state, and they have to deal with more > traffic to analyze. The script interpreter is a piece we haven't > spent much time on optimizing yet (it's indeed still an > *interpreter* ...), and it actually accounts for a large share of > Bro's CPU (and also memory) footprint these days. > > Note that executing scripts written in Bro's language is much > different from doing pattern matching; improving regexp performance > is not going to help much at all with the scripts. That's quite > different from Snort/Suricata obviously, which don't do much else > than pattern matching. > >> Marty's point was that multithreading leads to CPU cache >> inefficiency which incurs a penalty greater than the boost to the >> pattern matching in parallel and therefore suggests flow-pinned >> load-balancing for scaling. Do you have an opinion on the matter? > > It's hard to answer that in a few sentences, but generally I agree > that a flow-based load-balancing scheme is a reasonable approach for > the lowest layer of the system. Many NIDS (including Snort and Bro) > do much of their work on a per-flow basis, so parallelizing at that > granularity certainly makes a lot of sense and avoids communication > overhead (and hence also cache issues). Generally, such a flow-based > scheme can then be implemented either at the system/process level > (i.e., running more than one instance of the NIDS, with a suitable > frontend load-balancer splitting up the work, either externally or > internally); or at the thread-level (multiple threads fed by a > master thread). Conceptually, that doesn't make much of a > difference, and the former is what we're doing with the Bro Cluster. > > Now, Snort has the "advantage" that such a simple flow-based scheme > is pretty much all it needs to do for parallelizing. Because there's > not much happening after the pattern matching step, there's also no > need for further coordination between the instances/threads.
For > Bro, however, this is where things actually start to get > interesting: since much of its CPU cycles are spent for the scripts, > Amdahl's Law tells us that we need to parallelize the interpreter if > we want to scale. Unfortunately, parallelizing the execution of a > free-form Turing-complete language isn't exactly trivial ... > > Robin > > -- > Robin Sommer * Phone +1 (510) 722-6541 * robin at icir.org > ICSI/LBNL * Fax +1 (510) 666-2956 * www.icir.org >

From seth at icir.org Tue Oct 26 07:33:57 2010 From: seth at icir.org (Seth Hall) Date: Tue, 26 Oct 2010 10:33:57 -0400 Subject: [Bro] Use of GPUs for signature matching? In-Reply-To: References: <4CBC8461.8010008@gmail.com> <20101020212045.GD68831@icir.org> <20101026044744.GB37556@icir.org> Message-ID:

Hi Martin, On Oct 26, 2010, at 9:54 AM, Martin Holste wrote: > So if you're running into a performance challenge with the scripting > language, would you consider switching from the native Bro scripting > language to an embedded interpreter from something like Perl, Python, > or Lua? That in and of itself probably would hurt performance, but my > guess is that it would take a lot less time to embed something and > then multi-thread it than to roll your own from scratch.

That's likely not true. The performance hit would probably be quite large with many of the dynamic languages. I don't know about Lua, but with Perl and Python being untyped they do a lot of acrobatics whenever variables are created, accessed, and modified, which doesn't work very well with the soft realtime constraints that Bro needs to function within.

> I think a well-known script language would > also be a lot less scary for newcomers to Bro and really increase its > user base.

I think that everyone who starts working with Bro has a point where they get frustrated with having to learn a new language (I know I did), but then after some time they start to recognize the reason that Bro has its own language. The Bro policy script language is a large part of what makes Bro, Bro. :) It's a domain specific language for doing event analysis, and Bro's core has been made to turn network traffic into a stream of events so that it would be possible to analyze it in this style. General purpose scripting languages would likely have to use strange syntaxes to get some of the features and functionality of the Bro language. What will likely increase Bro's user base in a big way is for Bro to do a lot of interesting detections out of the box. There's likely only ever going to be a fairly small proportion of users who would learn or heavily use the scripting language even if it were Python or Perl. More documentation is going to help too. :) .Seth

From vern at icir.org Tue Oct 26 08:19:42 2010 From: vern at icir.org (Vern Paxson) Date: Tue, 26 Oct 2010 08:19:42 -0700 Subject: [Bro] Use of GPUs for signature matching? In-Reply-To: (Tue, 26 Oct 2010 08:54:18 CDT). Message-ID: <20101026151942.8F2C73137F9@taffy.ICSI.Berkeley.EDU>

> for the few > true pattern matching activities Bro has to do, there's plenty of CPU > to spare

Right.

> but for script execution such as going to time-machine, > extracting files from pcap, etc., you're running out of CPU.

Yes in general for script execution, though that usually doesn't involve the Time Machine or pcap files.

> So if you're running into a performance challenge with the scripting > language, would you consider switching from the native Bro scripting > language to an embedded interpreter from something like Perl, Python, > or Lua?
No, because we view Bro's domain-specific language as a big plus.

> With the > number of CPU cores climbing exponentially, a small > performance hit would probably be acceptable if it can be offset by > running on multiple cores.

Note, we have a major project on multicore network security analysis, which focuses on Bro. So this is definitely on our radar. Here, having a domain-specific language can be a significant win, since we can leverage particular semantics for optimization that we couldn't if we used a general interpreter.

> I think a well-known script language would > also be a lot less scary for newcomers to Bro and really increase its > user base.

I wonder if it's the particulars of the language. Bro's scripting language isn't itself that peculiar or hard to pick up. What gets harder is (1) the large set of predefined events, (2) language quirks in support of things like state management (but we'd need those anyway), (3) the lack of adequate "here's the overall model" and "here's the paradigm for XYZ" documentation - which we're definitely aiming to fix. Vern

From mcholste at gmail.com Tue Oct 26 11:01:24 2010 From: mcholste at gmail.com (Martin Holste) Date: Tue, 26 Oct 2010 13:01:24 -0500 Subject: [Bro] Time Machine RAM usage question Message-ID:

I've got a question on tm's RAM usage, and I was hoping someone could point me in the right direction: I'm trying to get as much duration as possible out of tm so that I can go back many hours or even days for packets. I have a lot of disk to throw at it, and a fair amount of RAM. The problem I'm running into is that when I move the conn_timeout up to 86400 but keep the mem settings low, tm still consumes a massive amount of RAM. I am concluding that the RAM usage must be the connection tables, and not the mem setting for the traffic class. Is there a way to allow tm to maximize for longevity? My understanding is that if I move the conn_timeout down, those packets will not be available for query. Thanks, Martin

From JAzoff at uamail.albany.edu Tue Oct 26 11:14:05 2010 From: JAzoff at uamail.albany.edu (Justin Azoff) Date: Tue, 26 Oct 2010 14:14:05 -0400 Subject: [Bro] Time Machine RAM usage question In-Reply-To: References: Message-ID: <20101026181405.GG5900@datacomm.albany.edu>

On Tue, Oct 26, 2010 at 02:01:24PM -0400, Martin Holste wrote: > I've got a question on tm's RAM usage, and I was hoping someone could > point me in the right direction: I'm trying to get as much duration > as possible out of tm so that I can go back many hours or even days > for packets. I have a lot of disk to throw at it, and a fair amount > of RAM. The problem I'm running into is that when I move the > conn_timeout up to 86400 but keep the mem settings low, tm still > consumes a massive amount of RAM.

I don't think you need conn_timeout set that high. I use:

    conn_timeout 180;

and then

    cutoff 5k;
    disk 4g;
    filesize 128m;
    mem 512m;

works fine for the most part. -- -- Justin Azoff -- Network Security & Performance Analyst

From vern at icir.org Tue Oct 26 12:43:42 2010 From: vern at icir.org (Vern Paxson) Date: Tue, 26 Oct 2010 12:43:42 -0700 Subject: [Bro] Time Machine RAM usage question In-Reply-To: <20101026181405.GG5900@datacomm.albany.edu> (Tue, 26 Oct 2010 14:14:05 EDT). Message-ID: <20101026194342.7A1AC3137F0@taffy.ICSI.Berkeley.EDU>

> I don't think you need conn_timeout set that high.

Right. conn_timeout is how long to keep internal state when a connection is inactive; *not* how long to keep recorded connections lying around.
Vern

From mcholste at gmail.com Tue Oct 26 14:04:46 2010 From: mcholste at gmail.com (Martin Holste) Date: Tue, 26 Oct 2010 16:04:46 -0500 Subject: [Bro] Time Machine RAM usage question In-Reply-To: <20101026194342.7A1AC3137F0@taffy.ICSI.Berkeley.EDU> References: <20101026181405.GG5900@datacomm.albany.edu> <20101026194342.7A1AC3137F0@taffy.ICSI.Berkeley.EDU> Message-ID:

That's what I originally thought. What was throwing me was when I would try to find packets any older than the cutoff, the queries would come up empty, the log showing something like "query not found in connection table." So I ran "show conn sample" to see the connections table, and the oldest connections were always at the cutoff. When I looked through the source code, it appeared that connections older than the cutoff were evicted from the connections table, but the query depended on the connections table to find the packets on disk/ram.

On Tue, Oct 26, 2010 at 2:43 PM, Vern Paxson wrote: >> I don't think you need conn_timeout set that high. > > Right. conn_timeout is how long to keep internal state when a connection > is inactive; *not* how long to keep recorded connections lying around. > > Vern >

From gregor at icir.org Tue Oct 26 17:13:32 2010 From: gregor at icir.org (Gregor Maier) Date: Tue, 26 Oct 2010 17:13:32 -0700 Subject: [Bro] Time Machine RAM usage question In-Reply-To: References: <20101026181405.GG5900@datacomm.albany.edu> <20101026194342.7A1AC3137F0@taffy.ICSI.Berkeley.EDU> Message-ID: <4CC76EAC.7070404@icir.org>

Hi, That sounds weird. I'm going to look into that. Which kind of query did you use? Can you maybe copy-paste a sample query plus the error message into an e-mail? cu Gregor

On 10/26/10 14:04 , Martin Holste wrote: > That's what I originally thought. What was throwing me was when I > would try to find packets any older than the cutoff, the queries would > come up empty, the log showing something like "query not found in > connection table." So I ran "show conn sample" to see the connections > table, and the oldest connections were always at the cutoff. When I > looked through the source code, it appeared that connections older > than the cutoff were evicted from the connections table, but the query > depended on the connections table to find the packets on disk/ram. > > On Tue, Oct 26, 2010 at 2:43 PM, Vern Paxson wrote: >>> I don't think you need conn_timeout set that high. >> >> Right. conn_timeout is how long to keep internal state when a connection >> is inactive; *not* how long to keep recorded connections lying around. >> >> Vern >> > > _______________________________________________ > Bro mailing list > bro at bro-ids.org > http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/bro -- Gregor Maier gregor at icir.org Int. Computer Science Institute (ICSI) gregor at icsi.berkeley.edu 1947 Center St., Ste. 600 http://www.icir.org/gregor/ Berkeley, CA 94704 USA

From gregor at icir.org Tue Oct 26 17:45:35 2010 From: gregor at icir.org (Gregor Maier) Date: Tue, 26 Oct 2010 17:45:35 -0700 Subject: [Bro] Time Machine RAM usage question In-Reply-To: References: <20101026181405.GG5900@datacomm.albany.edu> <20101026194342.7A1AC3137F0@taffy.ICSI.Berkeley.EDU> Message-ID: <4CC7762F.7030809@icir.org>

Hi, looking at the source code, it seems that the message 'not found in connection table' is related to subscriptions (i.e., the query request that all future packets for this connection should be included in the query results.
And subscriptions only work for connections that are currently active). So this message is ok. (Actually it is commented out in the current svn-snapshot. Did you uncomment it, or was it in your version of the TM source code?)

As others already pointed out, the conn_timeout is indeed the idle time until a connection is expired from the connection table (we only use the timeout to expire connections). Setting this to a high value is counter-productive: the memory consumption increases significantly. Furthermore, long timeouts will reduce visibility. No new packets will be recorded for connections (actually 5-tuples) that aren't expired but have exceeded the cutoff. So long timeouts can be problematic in the case of 5-tuple reuse.

To check your current retention times, you can check the classes.tm.log file. mem_dt and disk_dt will tell you how many seconds of packet data are currently retained in memory and on disk. Can you check whether the packets you want to retrieve fall into this time-frame? cu Gregor

On 10/26/10 14:04 , Martin Holste wrote: > That's what I originally thought. What was throwing me was when I > would try to find packets any older than the cutoff, the queries would > come up empty, the log showing something like "query not found in > connection table." So I ran "show conn sample" to see the connections > table, and the oldest connections were always at the cutoff. When I > looked through the source code, it appeared that connections older > than the cutoff were evicted from the connections table, but the query > depended on the connections table to find the packets on disk/ram. > > On Tue, Oct 26, 2010 at 2:43 PM, Vern Paxson wrote: >>> I don't think you need conn_timeout set that high. >> >> Right. conn_timeout is how long to keep internal state when a connection >> is inactive; *not* how long to keep recorded connections lying around. >> >> Vern >> > > _______________________________________________ > Bro mailing list > bro at bro-ids.org > http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/bro -- Gregor Maier gregor at icir.org Int. Computer Science Institute (ICSI) gregor at icsi.berkeley.edu 1947 Center St., Ste. 600 http://www.icir.org/gregor/ Berkeley, CA 94704 USA

From mcholste at gmail.com Tue Oct 26 18:46:22 2010 From: mcholste at gmail.com (Martin Holste) Date: Tue, 26 Oct 2010 20:46:22 -0500 Subject: [Bro] Time Machine RAM usage question In-Reply-To: <4CC7762F.7030809@icir.org> References: <20101026181405.GG5900@datacomm.albany.edu> <20101026194342.7A1AC3137F0@taffy.ICSI.Berkeley.EDU> <4CC7762F.7030809@icir.org> Message-ID:

Thanks for looking into it. I blew away all of my buffers and indexes and restarted with more sane settings and queries seem to be behaving. I'm certainly not ruling out user error. In any case, thanks for your help, it's very appreciated.

On Tue, Oct 26, 2010 at 7:45 PM, Gregor Maier wrote: > Hi, > > looking at the source code, it seems that the message 'not found in > connection table' is related to subscriptions (i.e., the query request > that all future packets for this connection should be included in the > query results. And subscriptions only work for connections that are > currently active). So this message is ok. (Actually it is commented out > in the current svn-snapshot. Did you uncomment it, or was it in your > version of the TM source code?) > > As others already pointed out, the conn_timeout is indeed the idle time > until a connection is expired from the connection table (we only use the > timeout to expire connections).
Setting this to a high value is > counter-productive: the memory consumption increases significantly. > Furthermore, long timeouts will reduce visibility. No new packets will > be recorded for connections (actually 5-tuples) that aren't expired but > have exceeded the cutoff. So long timeouts can be problematic in the > case of 5-tuple reuse. > > To check your current retention times, you can check the classes.tm.log > file. mem_dt and disk_dt will tell you how many seconds of packet data > are currently retained in memory and on disk. Can you check whether the > packets you want to retrieve fall into this time-frame? > > > cu > Gregor > > > > > > On 10/26/10 14:04 , Martin Holste wrote: >> That's what I originally thought. What was throwing me was when I >> would try to find packets any older than the cutoff, the queries would >> come up empty, the log showing something like "query not found in >> connection table." So I ran "show conn sample" to see the connections >> table, and the oldest connections were always at the cutoff. When I >> looked through the source code, it appeared that connections older >> than the cutoff were evicted from the connections table, but the query >> depended on the connections table to find the packets on disk/ram. >> >> On Tue, Oct 26, 2010 at 2:43 PM, Vern Paxson wrote: >>>> I don't think you need conn_timeout set that high. >>> >>> Right. conn_timeout is how long to keep internal state when a connection >>> is inactive; *not* how long to keep recorded connections lying around. >>> >>> Vern >>> >> >> _______________________________________________ >> Bro mailing list >> bro at bro-ids.org >> http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/bro > > > -- > Gregor Maier gregor at icir.org > Int. Computer Science Institute (ICSI) gregor at icsi.berkeley.edu > 1947 Center St., Ste. 600 http://www.icir.org/gregor/ > Berkeley, CA 94704 > USA >

From gregor at icir.org Tue Oct 26 19:52:51 2010 From: gregor at icir.org (Gregor Maier) Date: Tue, 26 Oct 2010 19:52:51 -0700 Subject: [Bro] Time Machine RAM usage question In-Reply-To: References: <20101026181405.GG5900@datacomm.albany.edu> <20101026194342.7A1AC3137F0@taffy.ICSI.Berkeley.EDU> <4CC7762F.7030809@icir.org> Message-ID: <4CC79403.5090108@icir.org>

no worries. Just let me know if the error pops up again. cu gregor

On 10/26/10 18:46 , Martin Holste wrote: > Thanks for looking into it. I blew away all of my buffers and indexes > and restarted with more sane settings and queries seem to be behaving. > I'm certainly not ruling out user error. In any case, thanks for > your help, it's very appreciated. > > On Tue, Oct 26, 2010 at 7:45 PM, Gregor Maier wrote: >> Hi, >> >> looking at the source code, it seems that the message 'not found in >> connection table' is related to subscriptions (i.e., the query request >> that all future packets for this connection should be included in the >> query results. And subscriptions only work for connections that are >> currently active). So this message is ok. (Actually it is commented out >> in the current svn-snapshot. Did you uncomment it, or was it in your >> version of the TM source code?) >> >> As others already pointed out, the conn_timeout is indeed the idle time >> until a connection is expired from the connection table (we only use the >> timeout to expire connections). Setting this to a high value is >> counter-productive: the memory consumption increases significantly.
>> Furthermore, long timeouts will reduce visibility. No new packets will >> be recorded for connections (actually 5-tuples) that aren't expired but >> have exceeded the cutoff. So long timeouts can be problematic in the >> case of 5-tuple reuse. >> >> To check your current retention times, you can check the classes.tm.log >> file. mem_dt and disk_dt will tell you how many seconds of packet data >> are currently retained in memory and on disk. Can you check whether the >> packets you want to retrieve fall into this time-frame? >> >> >> cu >> Gregor >> >> >> >> >> >> On 10/26/10 14:04 , Martin Holste wrote: >>> That's what I originally thought. What was throwing me was when I >>> would try to find packets any older than the cutoff, the queries would >>> come up empty, the log showing something like "query not found in >>> connection table." So I ran "show conn sample" to see the connections >>> table, and the oldest connections were always at the cutoff. When I >>> looked through the source code, it appeared that connections older >>> than the cutoff were evicted from the connections table, but the query >>> depended on the connections table to find the packets on disk/ram. >>> >>> On Tue, Oct 26, 2010 at 2:43 PM, Vern Paxson wrote: >>>>> I don't think you need conn_timeout set that high. >>>> >>>> Right. conn_timeout is how long to keep internal state when a connection >>>> is inactive; *not* how long to keep recorded connections lying around. >>>> >>>> Vern >>>> >>> >>> _______________________________________________ >>> Bro mailing list >>> bro at bro-ids.org >>> http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/bro >> >> >> -- >> Gregor Maier gregor at icir.org >> Int. Computer Science Institute (ICSI) gregor at icsi.berkeley.edu >> 1947 Center St., Ste. 600 http://www.icir.org/gregor/ >> Berkeley, CA 94704 >> USA >> > -- Gregor Maier gregor at icir.org Int. Computer Science Institute (ICSI) gregor at icsi.berkeley.edu 1947 Center St., Ste. 600 http://www.icir.org/gregor/ Berkeley, CA 94704 USA

From seth at icir.org Thu Oct 28 08:56:29 2010 From: seth at icir.org (Seth Hall) Date: Thu, 28 Oct 2010 11:56:29 -0400 Subject: [Bro] Bro scripts Message-ID: <1FCB4D90-DD0A-4528-8045-13C06B4CA385@icir.org>

Hi all, I'm doing work on Bro's policy scripts for the next release and I want to find policy scripts floating around that can be shared and any helpful code snippets. Anything you can contribute would be greatly appreciated, thanks! .Seth

From mcholste at gmail.com Thu Oct 28 14:13:00 2010 From: mcholste at gmail.com (Martin Holste) Date: Thu, 28 Oct 2010 16:13:00 -0500 Subject: [Bro] time machine filesize issue Message-ID:

I wanted to make my disk-bound queries faster, so I wanted the fewest files to search through for tm because it appears that every separate file makes the interval searches in pcapnav slower if you're requesting many packets. I found that when setting filesize > 289g, tm creates a file per connection and trashes its working directory. So two questions: am I right in thinking it is faster to search through as few files as possible when using pcapnav? And secondly, does anyone know why tm breaks when trying to create files larger than 289g?
Thanks, Martin

From gregor at icir.org Thu Oct 28 14:42:05 2010 From: gregor at icir.org (Gregor Maier) Date: Thu, 28 Oct 2010 14:42:05 -0700 Subject: [Bro] time machine filesize issue In-Reply-To: References: Message-ID: <4CC9EE2D.8020703@icir.org>

On 10/28/10 14:13 , Martin Holste wrote: > I wanted to make my disk-bound queries faster, so I wanted the fewest > files to search through for tm because it appears that every separate > file makes the interval searches in pcapnav slower if you're > requesting many packets. I found that when setting filesize > 289g, > tm creates a file per connection and trashes its working directory. > So two questions: am I right in thinking it is faster to search > through as few files as possible when using pcapnav? And secondly, > does anyone know why tm breaks when trying to create files larger than > 289g?

I don't think that pcapnav speed is significantly influenced by filesize. AFAIK pcapnav jumps to a random file offset, then tries to sequentially read until it finds something that looks like a pcap header. Then it checks the timestamp and reads sequentially or jumps somewhere else until it finds the requested timestamp. If you have multiple files, then this is repeated for each file. However, the TM knows which files cover which time periods, so it will only access the files that it knows are candidates. So I would assume that the lookup speed should be similar. I think that the specifics of the query-result influence speed much more (e.g., is it only a single, narrow time interval to search, or multiple small ones, or a few large ones that cover almost the whole dataset). Long story short: the number of files to search should not influence the speed much. If the number of files is huge, then the only thing I could imagine is weird filesystem stuff going on when there are 1000s of files in one directory and.....

OTOH, if the filesize is too large wrt the configured diskspace, the TM will run into trouble. It will delete old files if writing more data (or creating a new data file; I can't recall which of the two). So if the data files are huge, this will introduce quite some variance in diskspace usage.

That said: the TM definitely should not trash its working directory..... Do I understand you correctly that you get a myriad of files in the working directory? Do the files contain only a single packet (or a handful), possibly from different connections? How many packets per file? Also, how does your filesize relate to the configured disk-space?

cu Gregor -- Gregor Maier gregor at icir.org Int. Computer Science Institute (ICSI) gregor at icsi.berkeley.edu 1947 Center St., Ste. 600 http://www.icir.org/gregor/ Berkeley, CA 94704 USA

From mcholste at gmail.com Thu Oct 28 15:40:06 2010 From: mcholste at gmail.com (Martin Holste) Date: Thu, 28 Oct 2010 17:40:06 -0500 Subject: [Bro] time machine filesize issue In-Reply-To: <4CC9EE2D.8020703@icir.org> References: <4CC9EE2D.8020703@icir.org> Message-ID:

My performance issues were noticed when making a query over a large time range with many packets involved. Since there is no way to specify a limit on the number of packets returned, the query takes forever. I was looking to improve that performance. I will continue to play around with this to see if there is any improvement worth the large hit for file rollover. With filesize set at exactly 280g (279g does not produce the problem) tm will create one disk fifo file in the workdir for each evicted packet with a disk setting of 1000g.
I am only using one default class for "all."

On Thu, Oct 28, 2010 at 4:42 PM, Gregor Maier wrote: > On 10/28/10 14:13 , Martin Holste wrote: >> I wanted to make my disk-bound queries faster, so I wanted the fewest >> files to search through for tm because it appears that every separate >> file makes the interval searches in pcapnav slower if you're >> requesting many packets. I found that when setting filesize > 289g, >> tm creates a file per connection and trashes its working directory. >> So two questions: am I right in thinking it is faster to search >> through as few files as possible when using pcapnav? And secondly, >> does anyone know why tm breaks when trying to create files larger than >> 289g? > > I don't think that pcapnav speed is significantly influenced by > filesize. AFAIK pcapnav jumps to a random file offset, then tries to > sequentially read until it finds something that looks like a pcap > header. Then it checks the timestamp and reads sequentially or jumps > somewhere else until it finds the requested timestamp. > If you have multiple files, then this is repeated for each file. > However, the TM knows which files cover which time periods, so it will > only access the files that it knows are candidates. So I would assume > that the lookup speed should be similar. I think that the specifics of > the query-result influence speed much more (e.g., is it only a single, > narrow time interval to search, or multiple small ones, or a few large > ones that cover almost the whole dataset). > Long story short: the number of files to search should not influence the > speed much. > If the number of files is huge, then the only thing I could imagine is > weird filesystem stuff going on when there are 1000s of files in one > directory and..... > > OTOH, if the filesize is too large wrt the configured diskspace, the TM > will run into trouble. It will delete old files if writing more data (or > creating a new data file; I can't recall which of the two). So if the data > files are huge, this will introduce quite some variance in diskspace usage. > > That said: the TM definitely should not trash its working directory..... > Do I understand you correctly that you get a myriad of files in the > working directory? Do the files contain only a single packet (or a handful), > possibly from different connections? How many packets per file? > Also, how does your filesize relate to the configured disk-space? > > > cu > Gregor > -- > Gregor Maier gregor at icir.org > Int. Computer Science Institute (ICSI) gregor at icsi.berkeley.edu > 1947 Center St., Ste. 600 http://www.icir.org/gregor/ > Berkeley, CA 94704 > USA >

From vallentin at icir.org Thu Oct 28 17:59:36 2010 From: vallentin at icir.org (Matthias Vallentin) Date: Thu, 28 Oct 2010 17:59:36 -0700 Subject: [Bro] Bro scripts In-Reply-To: <1FCB4D90-DD0A-4528-8045-13C06B4CA385@icir.org> References: <1FCB4D90-DD0A-4528-8045-13C06B4CA385@icir.org> Message-ID: <20101029005936.GJ16825@icsi.berkeley.edu>

> I'm doing work on Bro's policy scripts for the next release and I want > to find policy scripts floating around that can be shared and any > helpful code snippets. Anything you can contribute would be greatly > appreciated, thanks!

The whole buzz about Firesheep caused me to hack up a sidejacking detector. I haven't tested it because I literally wrote it 5 minutes ago.
Matthias

Here is the code:

    @load http-request
    @load http-reply

    module HTTP;

    export
    {
        redef enum Notice += { CookieReuse };

        # Number of cookies per client.
        const max_cookies = 1 &redef;

        # The time after which we expire entries.
        const cookie_expiration = 1 hr &redef;
    }

    # Count the number of cookies per client.
    global cookies: table[string] of set[addr] &write_expire = cookie_expiration;

    event http_header(c: connection, is_orig: bool, name: string, value: string)
    {
        # We are only looking for session IDs in the client cookie header.
        if (! (is_orig && name == /[cC][oO][oO][kK][iI][eE]/))
            return;

        local client = c$id$orig_h;
        if (value !in cookies)
            cookies[value] = set();
        else
            add cookies[value][client];

        if (|cookies[value]| <= max_cookies)
            return;

        local s = lookup_http_request_stream(c);
        NOTICE([$note=CookieReuse, $src=client,
                $msg=fmt("potential sidejacking by %s: cookie used by %d addresses",
                client, |cookies[value]|)]);
    }

From mcholste at gmail.com Thu Oct 28 18:48:30 2010 From: mcholste at gmail.com (Martin Holste) Date: Thu, 28 Oct 2010 20:48:30 -0500 Subject: [Bro] Bro scripts In-Reply-To: <20101029005936.GJ16825@icsi.berkeley.edu> References: <1FCB4D90-DD0A-4528-8045-13C06B4CA385@icir.org> <20101029005936.GJ16825@icsi.berkeley.edu> Message-ID:

That's pretty cool! I do have one suggestion, though: Instead of tracking by IP, how about one cookie per user agent? That will help catch the sidejacking when used under a NAT.

On Thursday, October 28, 2010, Matthias Vallentin wrote: >> I'm doing work on Bro's policy scripts for the next release and I want >> to find policy scripts floating around that can be shared and any >> helpful code snippets. Anything you can contribute would be greatly >> appreciated, thanks! > > The whole buzz about Firesheep caused me to hack up a sidejacking > detector. I haven't tested it because I literally wrote it 5 minutes > ago. > > Matthias > > Here is the code: > > @load http-request > @load http-reply > > module HTTP; > > export > { > redef enum Notice += { CookieReuse }; > > # Number of cookies per client. > const max_cookies = 1 &redef; > > # The time after which we expire entries. > const cookie_expiration = 1 hr &redef; > } > > > # Count the number of cookies per client. > global cookies: table[string] of set[addr] &write_expire = cookie_expiration; > > event http_header(c: connection, is_orig: bool, name: string, value: string) > { > # We are only looking for session IDs in the client cookie header. > if (! (is_orig && name == /[cC][oO][oO][kK][iI][eE]/)) > return; > > local client = c$id$orig_h; > if (value !in cookies) > cookies[value] = set(); > else > add cookies[value][client]; > > if (|cookies[value]| <= max_cookies) > return; > > local s = lookup_http_request_stream(c); > NOTICE([$note=CookieReuse, $src=client, > $msg=fmt("potential sidejacking by %s: cookie used by %d addresses", > client, |cookies[value]|)]); > } > _______________________________________________ > Bro mailing list > bro at bro-ids.org > http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/bro >

From seth at icir.org Thu Oct 28 20:23:03 2010 From: seth at icir.org (Seth Hall) Date: Thu, 28 Oct 2010 23:23:03 -0400 Subject: [Bro] Bro scripts In-Reply-To: References: <1FCB4D90-DD0A-4528-8045-13C06B4CA385@icir.org> <20101029005936.GJ16825@icsi.berkeley.edu> Message-ID: <966EF229-5A1D-4A2B-BE79-F8DAD64B855B@icir.org>

On Oct 28, 2010, at 9:48 PM, Martin Holste wrote: > That's pretty cool! I do have one suggestion, though: Instead of > tracking by IP, how about one cookie per user agent? That will help > catch the sidejacking when used under a NAT.

Good point! Changing the tracking global from...

    global cookies: table[string] of set[addr]

to...

    global cookies: table[string] of set[addr, string]

and then storing the user-agent in the last string would take care of that.

I think your point about NAT gets to a more general point of what techniques could we use to detect NAT? I know that there are a lot of little indicators of addresses that are doing NAT, but I think it could be really worthwhile to organize them all and then write a script to implement all of them so that we can get reliable NAT detection with Bro. I can start with a few thoughts.

* Multiple web browser user-agents at a single address
    - Must match some regex for a "real" browser so that weird applications throwing junk in the user-agent don't trigger this.
    - Must be closely together in time.

Over the past several years I've had a lot of ideas for detecting NATs, but they have all completely escaped me. Anyone else have thoughts to add to this? .Seth

From mcholste at gmail.com Thu Oct 28 21:50:26 2010 From: mcholste at gmail.com (Martin Holste) Date: Thu, 28 Oct 2010 23:50:26 -0500 Subject: [Bro] Bro scripts In-Reply-To: <966EF229-5A1D-4A2B-BE79-F8DAD64B855B@icir.org> References: <1FCB4D90-DD0A-4528-8045-13C06B4CA385@icir.org> <20101029005936.GJ16825@icsi.berkeley.edu> <966EF229-5A1D-4A2B-BE79-F8DAD64B855B@icir.org> Message-ID:

I think that will definitely work for detecting NATs if you stick to regexing the variants on the major browsers. As we've all seen, most browser plugins have their own UA, so you're bound to get many UAs out of a single computer naturally, but they should not all be for Internet Explorer, for example. I think scoping to IE, FF, and Webkit engines would be good enough to be effective. One other point: once a NAT is detected, would it be possible to exclude that IP from future detection to save resources? I'm a bit concerned with memory utilization for all of these state tables.

On Thu, Oct 28, 2010 at 10:23 PM, Seth Hall wrote: > > On Oct 28, 2010, at 9:48 PM, Martin Holste wrote: > >> That's pretty cool! I do have one suggestion, though: Instead of >> tracking by IP, how about one cookie per user agent? That will help >> catch the sidejacking when used under a NAT. > > Good point! Changing the tracking global from... > > global cookies: table[string] of set[addr] > to... > global cookies: table[string] of set[addr, string] > > and then storing the user-agent in the last string would take care of that. > > I think your point about NAT gets to a more general point of what techniques could we use to detect NAT?
I know that there are a lot of little indicators of addresses that are doing NAT, but I think it could be really worthwhile to organize them all and then write a script to implement all of them so that we can get reliable NAT detection with Bro. I can start with a few thoughts. > > * Multiple web browser user-agents at a single address > - Must match some regex for a "real" browser so that weird applications throwing junk in the user-agent don't trigger this. > - Must be closely together in time. > > Over the past several years I've had a lot of ideas for detecting NATs, but they have all completely escaped me. Anyone else have thoughts to add to this? > > .Seth

From vallentin at icir.org Thu Oct 28 23:56:15 2010 From: vallentin at icir.org (Matthias Vallentin) Date: Thu, 28 Oct 2010 23:56:15 -0700 Subject: [Bro] Bro scripts In-Reply-To: <966EF229-5A1D-4A2B-BE79-F8DAD64B855B@icir.org> References: <1FCB4D90-DD0A-4528-8045-13C06B4CA385@icir.org> <20101029005936.GJ16825@icsi.berkeley.edu> <966EF229-5A1D-4A2B-BE79-F8DAD64B855B@icir.org> Message-ID: <20101029065615.GK16825@icsi.berkeley.edu>

> On Oct 28, 2010, at 9:48 PM, Martin Holste wrote: > > Instead of tracking by IP, how about one cookie per user agent? > > Good point!

Indeed.

> global cookies: table[string] of set[addr] > to... > global cookies: table[string] of set[addr, string]

That will almost do it, except that I now need to write a handler for http_all_headers instead of http_header to obviate the need for some global glue code. Furthermore, the Cookie header often bundles a bunch of cookie key-value pairs of which only a few define the actual user session. The others can vary and thus cause false negatives. Firesheep fortunately ships with a bunch of handlers for major sites which I will use as a baseline to define user sessions for specific sites, i.e.,

    # Distills relevant cookies that define a user session.
    type user_session: record {
        url: pattern;       # URL
        cookies: pattern;   # Cookie keys that define the user session.
    };

    const session_info: table[string] of user_session = {
        ["Amazon"] = [$url=/amazon.com/, $cookies=/x-main/],
        ["Dropbox"] = [$url=/dropbox.com/, $cookies=/lid/],
        ["Facebook"] = [$url=/facebook.com/, $cookies=/xs|c_user|sid/],
        ["Flickr"] = [$url=/flickr.com/, $cookies=/cookie_session/],
        ["Google"] = [$url=/google.com/, $cookies=/NID|SID|HSID|PREF/],
        ["NY Times"] = [$url=/nytimes.com/, $cookies=/NYT-s|nyt-d/],
        ["Twitter"] = [$url=/twitter.com/, $cookies=/_twitter_sess/],
        ["Yelp"] = [$url=/yelp.com/, $cookies=/__utma/],
        ["Windows Live"] = [$url=/live.com/, $cookies=/MSP(Prof|Auth)|RPSTAuth|NAP/],
        ["Wordpress"] = [$url=/wordpress.com/, $cookies=/wordpress_[0-9a-fA-F]+/]
    } &redef;

What remains to do is to split the Cookie string into its key-value pairs and then match the keys against user_session$cookies. Instead of regular expressions, I'd preferably have a set[string], but this cannot be statically defined in a record, i.e.,

    ["Facebook"] = [$url=/facebook.com/, $cookies={"xs", "c_user", "sid"}],
                                                  ^^^^^^^^^^^^^^^^^^^^^^^

appears not to be correct Bro syntax, because I think variable-size types inside records cannot be initialized statically. Is that correct? If so, I'd probably change to a simple table[string] of set[string] to represent user sessions. In any case, the downside is that this would only detect sidejacking for known sites. I think it would make sense to do the following. If a profile for a user_session for a particular site (as defined above) exists, use it, and otherwise use the entire cookie value.
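A sketch of that splitting step in Python, under the assumption that we already know which keys define the session (the key set below is Facebook's from the table above, purely as an example; sorting makes the identifier independent of the key order in the header):

    SESSION_KEYS = {"xs", "c_user", "sid"}  # hypothetical example key set

    def session_id(cookie_header):
        parts = []
        for pair in cookie_header.split("; "):
            key, _, value = pair.partition("=")
            if key in SESSION_KEYS:
                parts.append(key + "=" + value)
        # Sort so that key order in the header cannot change the identifier.
        return "&".join(sorted(parts))

    print(session_id("c_user=1234; locale=en_US; xs=deadbeef"))
    # prints: c_user=1234&xs=deadbeef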
> I think your point about NAT gets to a more general point of what > techniques could we use to detect NAT?

This is truly an important issue to tackle. I wonder if it is possible to have better abstractions in Bro to support user-based analysis. For example, it would be neat to augment several events with a "user" argument which is essentially a record filled by many other events. In HTTP for example, some code would parse the User-Agent and fill this record, so that the script writer could simply refer to user$os or user$browser. Matthias

From JAzoff at uamail.albany.edu Fri Oct 29 06:12:40 2010 From: JAzoff at uamail.albany.edu (Justin Azoff) Date: Fri, 29 Oct 2010 09:12:40 -0400 Subject: [Bro] Bro scripts In-Reply-To: <966EF229-5A1D-4A2B-BE79-F8DAD64B855B@icir.org> References: <1FCB4D90-DD0A-4528-8045-13C06B4CA385@icir.org> <20101029005936.GJ16825@icsi.berkeley.edu> <966EF229-5A1D-4A2B-BE79-F8DAD64B855B@icir.org> Message-ID: <20101029131240.GD6560@datacomm.albany.edu>

On Thu, Oct 28, 2010 at 11:23:03PM -0400, Seth Hall wrote: > I think your point about NAT gets to a more general point of what > techniques could we use to detect NAT?

Using user-agents for this is tricky. I've written some code to analyze the output of your http-user-agents.log in splunk, and found that the best thing to look at is the architecture and os, and ignore the browser itself.

The script I use is here: http://github.com/JustinAzoff/splunk-scripts/blob/master/ua2os.py

it's for use in splunk, but it's 90% regexes, stuff like this:

    os_mapping = (
        ('Windows .. 5.1', 'Windows XP'),
        ('Windows .. 5.2', 'Windows XP'),
        ('Windows NT 6.0', 'Windows Vista'),
        ('Windows 6.0', 'Windows Server 2008'),
        ('Windows NT 6.1', 'Windows 7'),
        ('OS X 10.5', 'MAC OS X 10.5.x'),
        ('Darwin', 'MAC OS X other'),
        ...
        ('Android', 'Android'),
        ('Linux ', 'Linux'),
        ('Windows', 'Windows - Other'),
        ('iPad', 'ipad'),
        ('iPod', 'ipod'),
        ('iPhone', 'iphone'),
    )

    arch_mapping = (
        ('Windows .. 5.2', 'x64'),
        ('x64', 'x64'),
        ...
        ('iPad', 'ipad'),
        ('iPod', 'ipod'),
        ('iPhone', 'iphone'),
        ('Intel', 'Intel'),
    )

It is not uncommon to have one machine using multiple browsers, but rare for it to identify as both Vista and Windows 7, or both i386 and x64, or Windows XP and Mac OS X 10.5.

Also, some user-agents can immediately identify NAT: iOS and android devices do not have ethernet interfaces, so if one of these devices is found on a non-wireless subnet it indicates the presence of a rogue access point. -- -- Justin Azoff -- Network Security & Performance Analyst

From mcholste at gmail.com Fri Oct 29 06:53:03 2010 From: mcholste at gmail.com (Martin Holste) Date: Fri, 29 Oct 2010 08:53:03 -0500 Subject: [Bro] Bro scripts In-Reply-To: <20101029131240.GD6560@datacomm.albany.edu> References: <1FCB4D90-DD0A-4528-8045-13C06B4CA385@icir.org> <20101029005936.GJ16825@icsi.berkeley.edu> <966EF229-5A1D-4A2B-BE79-F8DAD64B855B@icir.org> <20101029131240.GD6560@datacomm.albany.edu> Message-ID:

Thanks for sharing that. Obviously in a corporate environment (or any in which desktops are managed) most user agents will appear the same because they are all running the same browser version. However, I have seen that for guest wireless and other public access points, the number of plugins, .NET versions, etc. makes the UAs fairly unique, so off the bat your mileage will vary depending on the client class.
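A rough sketch of that user-agent heuristic (the mapping excerpt and the alert threshold below are illustrative, not the actual ua2os.py logic): map each user agent to an OS family and flag an address once it has shown more than one:

    import re
    from collections import defaultdict

    # Small excerpt of an OS mapping, in the spirit of ua2os.py.
    OS_MAPPING = [
        (re.compile(r"Windows .. 5\.1"), "Windows XP"),
        (re.compile(r"Windows NT 6\.1"), "Windows 7"),
        (re.compile(r"OS X 10\.5"), "Mac OS X 10.5.x"),
        (re.compile(r"iPhone"), "iPhone"),
        (re.compile(r"Linux "), "Linux"),
    ]

    seen_os = defaultdict(set)  # client IP -> observed OS families

    def observe(ip, user_agent):
        for pattern, os_name in OS_MAPPING:
            if pattern.search(user_agent):
                seen_os[ip].add(os_name)
                break
        if len(seen_os[ip]) > 1:
            print("possible NAT at %s: %s" % (ip, sorted(seen_os[ip])))

    observe("10.1.2.3", "Mozilla/5.0 (Windows NT 6.1; WOW64) ...")
    observe("10.1.2.3", "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1) ...")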
Using the detected OS would certainly be more accurate, but the chances of an attacker having the same OS as the victim are pretty good, so you'll obviously have to deal with a lot of false negatives. Maybe concatenating the p0f signature with the user agent is the best way to get a pseudo machine ID.

On Fri, Oct 29, 2010 at 8:12 AM, Justin Azoff wrote: > On Thu, Oct 28, 2010 at 11:23:03PM -0400, Seth Hall wrote: >> I think your point about NAT gets to a more general point of what >> techniques could we use to detect NAT? > > Using user-agents for this is tricky. I've written some code to analyze > the output of your http-user-agents.log in splunk, and found that the > best thing to look at is the architecture and os, and ignore the > browser itself. > > The script I use is here: > > http://github.com/JustinAzoff/splunk-scripts/blob/master/ua2os.py > > it's for use in splunk, but it's 90% regexes, stuff like this: > > os_mapping = ( > ('Windows .. 5.1', 'Windows XP'), > ('Windows .. 5.2', 'Windows XP'), > ('Windows NT 6.0', 'Windows Vista'), > ('Windows 6.0', 'Windows Server 2008'), > ('Windows NT 6.1', 'Windows 7'), > ('OS X 10.5', 'MAC OS X 10.5.x'), > ('Darwin', 'MAC OS X other'), > ... > ('Android', 'Android'), > ('Linux ', 'Linux'), > ('Windows', 'Windows - Other'), > ('iPad', 'ipad'), > ('iPod', 'ipod'), > ('iPhone', 'iphone'), > ) > > arch_mapping = ( > ('Windows .. 5.2', 'x64'), > ('x64', 'x64'), > ... > ('iPad', 'ipad'), > ('iPod', 'ipod'), > ('iPhone', 'iphone'), > ('Intel', 'Intel'), > ) > > It is not uncommon to have one machine using multiple browsers, but rare > for it to identify as both Vista and Windows 7, or both i386 and x64, or > Windows XP and Mac OS X 10.5. > > Also, some user-agents can immediately identify NAT: iOS and android > devices do not have ethernet interfaces, so if one of these devices is > found on a non-wireless subnet it indicates the presence of a rogue access > point. > > -- > -- Justin Azoff > -- Network Security & Performance Analyst >

From vallentin at icir.org Fri Oct 29 16:35:14 2010 From: vallentin at icir.org (Matthias Vallentin) Date: Fri, 29 Oct 2010 16:35:14 -0700 Subject: [Bro] Bro scripts In-Reply-To: <20101029065615.GK16825@icsi.berkeley.edu> References: <1FCB4D90-DD0A-4528-8045-13C06B4CA385@icir.org> <20101029005936.GJ16825@icsi.berkeley.edu> <966EF229-5A1D-4A2B-BE79-F8DAD64B855B@icir.org> <20101029065615.GK16825@icsi.berkeley.edu> Message-ID: <20101029233514.GN2349@icsi.berkeley.edu>

> I think it would make sense to do the following. If a profile for a > user_session for a particular site (as defined above) exists, use it, > and otherwise use the entire cookie value.

Attached is the full version of the sidejacking detector that includes all the Firesheep handlers. I tested it for Twitter, Amazon, and Google. The script successfully reports alarms when I hijack my own connections with Firesheep. Matthias

-------------- next part --------------

# A simple sidejacking detector.
#
# The script raises an alarm whenever more than one client makes use of the
# same cookie, where a user is defined as an (IP, user agent) pair.

@load notice
@load http-request
@load http-reply

module HTTP;

export {
    redef enum Notice += { Sidejacking };

    # The time after which entries expire. This allows users to roam and later
    # reconnect from a different address without triggering a false positive.
    const cookie_expiration = 1 hr &redef;

    type cookie_info: record {
        url: pattern;   # URL pattern matched against Host header.
        pat: pattern;   # Cookie keys that define the user session.
    };

    # List of cookie information per service (taken from Firesheep handlers).
    const cookie_list: table[string] of cookie_info = {
        ["Amazon"] = [$url=/amazon.com/, $pat=/x-main/],
        ["Basecamp"] = [$url=/basecamphq.com/, $pat=/_basecamp_session|session_token/],
        ["bit.ly"] = [$url=/bit.ly/, $pat=/user/],
        ["Cisco"] = [$url=/cisco.com/, $pat=/SMIDENTITY/],
        ["CNET"] = [$url=/cnet.com/, $pat=/urs_sessionId/],
        ["Dropbox"] = [$url=/dropbox.com/, $pat=/lid/],
        ["Enom"] = [$url=/enom.com/, $pat=/OatmealCookie|EmailAddress/],
        ["Evernote"] = [$url=/evernote.com/, $pat=/auth/],
        ["Facebook"] = [$url=/facebook.com/, $pat=/xs|c_user|sid/],
        ["Flickr"] = [$url=/flickr.com/, $pat=/cookie_session/],
        ["Foursquare"] = [$url=/foursquare.com/, $pat=/ext_id|XSESSIONID/],
        ["GitHub"] = [$url=/github.com/, $pat=/_github_ses/],
        ["Google"] = [$url=/google.com/, $pat=/NID|SID|HSID|PREF/],
        ["Gowalla"] = [$url=/gowalla.com/, $pat=/__utma/],
        ["Hacker News"] = [$url=/news.ycombinator.com/, $pat=/user/],
        ["Harvest"] = [$url=/harvestapp.com/, $pat=/_enc_sess/],
        ["NY Times"] = [$url=/nytimes.com/, $pat=/NYT-s|nyt-d/],
        ["Pivotal Tracker"] = [$url=/pivotaltracker.com/, $pat=/_myapp_session/],
        ["Slicehost"] = [$url=/manage.slicehost.com/, $pat=/_coach_session_id/],
        ["tumblr"] = [$url=/tumblr.com/, $pat=/pfp/],
        ["Twitter"] = [$url=/twitter.com/, $pat=/_twitter_sess/],
        ["Yahoo"] = [$url=/yahoo.com/, $pat=/T|Y/],
        ["Yelp"] = [$url=/yelp.com/, $pat=/__utma/],
        ["Windows Live"] = [$url=/live.com/, $pat=/MSP(Prof|Auth)|RPSTAuth|NAP/],
        ["Wordpress"] = [$url=/wordpress.com/, $pat=/wordpress_[0-9a-fA-F]+/]
    } &redef;
}

# Map cookies to users, who are defined as an (address, user-agent) pair.
global cookies: table[string] of set[addr,string] &write_expire = cookie_expiration;

# Create a unique user session identifier based on a pattern of cookie keys.
function sessionize(cookie: string, keys: pattern) : string
{
    local id = "";
    local fields = split(cookie, /; /);
    for (i in fields)
    {
        local s = split1(fields[i], /=/);
        if (keys in s[1])
            id += s[2];
    }

    return id;
}

event http_all_headers(c: connection, is_orig: bool, hlist: mime_header_list)
{
    if (! is_orig)
        return;

    local cookie = "";
    local ua = "";
    local host = "";
    for (i in hlist)
    {
        local hdr = hlist[i]$name;
        local value = hlist[i]$value;
        if (hdr == "COOKIE")
            cookie = value;
        else if (hdr == "USER-AGENT")
            ua = value;
        else if (hdr == "HOST")
            host = to_lower(value);
    }

    if (cookie == "")
        return;

    # Restrict ourselves to a subset of cookie keys that define a user session.
    local id = "";
    local desc = "";
    if (host != "")
        for (k in cookie_list)
        {
            local info = cookie_list[k];
            if (info$url in host)
            {
                id = sessionize(cookie, info$pat);
                desc = k;
                break;
            }
        }

    if (id == "")
        id = cookie;

    if (id !in cookies)
        cookies[id] = set() &mergeable;

    local client = c$id$orig_h;
    add cookies[id][client, ua];
    if (|cookies[id]| <= 1)
        return;

    local s = lookup_http_request_stream(c);
    desc = (desc == "" ? "" : fmt("%s ", desc));
    NOTICE([$note=Sidejacking, $src=client,
            $msg=fmt("%ssession hijacked by %s (%d users/cookie)",
            desc, client, |cookies[id]|)]);
}

From vern at icir.org Sat Oct 30 13:52:00 2010 From: vern at icir.org (Vern Paxson) Date: Sat, 30 Oct 2010 13:52:00 -0700 Subject: [Bro] set initializers (Re: Bro scripts) In-Reply-To: <20101029065615.GK16825@icsi.berkeley.edu> (Thu, 28 Oct 2010 23:56:15 PDT).
Message-ID: <20101030205200.CE9D736A4F2@taffy.ICSI.Berkeley.EDU>

> expression, I'd preferably have a set[string], but this cannot be > statically defined in a record, i.e., > > ["Facebook"] = [$url=/facebook.com/, $cookies={"xs", "c_user", "sid"}], > ^^^^^^^^^^^^^^^^^^^^^^^ > appears not to be correct Bro syntax, because I think variable-size > types inside records cannot be initialized statically. Is that correct?

You can construct sets using .... $cookies=set("xs", "c_user", "sid") for example. Vern

From vern at icir.org Sat Oct 30 13:52:03 2010 From: vern at icir.org (Vern Paxson) Date: Sat, 30 Oct 2010 13:52:03 -0700 Subject: [Bro] time machine filesize issue In-Reply-To: (Thu, 28 Oct 2010 17:40:06 CDT). Message-ID: <20101030205203.A6FAC36A4F2@taffy.ICSI.Berkeley.EDU>

> With filesize set at exactly 280g (279g does not produce the problem) > tm will create one disk fifo file in the workdir for each > evicted packet with a disk setting of 1000g. I am only using one > default class for "all."

That sounds like something is wrapping and going negative at the 2^38 barrier. Vern

From gregor at icir.org Sun Oct 31 15:38:46 2010 From: gregor at icir.org (Gregor Maier) Date: Sun, 31 Oct 2010 15:38:46 -0700 Subject: [Bro] NAT detection (was: Re: Bro scripts) In-Reply-To: References: <1FCB4D90-DD0A-4528-8045-13C06B4CA385@icir.org> <20101029005936.GJ16825@icsi.berkeley.edu> Message-ID: <4CCDEFF6.7030002@icir.org>

Hi, I've played around with NAT detection based on user-agent strings and IP TTL. See http://www.icir.org/gregor/papers/gregor-phd.pdf, Chapter 4 cu gregor

On 10/28/10 18:48 , Martin Holste wrote: > That's pretty cool! I do have one suggestion, though: Instead of > tracking by IP, how about one cookie per user agent? That will help > catch the sidejacking when used under a NAT. > > On Thursday, October 28, 2010, Matthias Vallentin wrote: >>> I'm doing work on Bro's policy scripts for the next release and I want >>> to find policy scripts floating around that can be shared and any >>> helpful code snippets. Anything you can contribute would be greatly >>> appreciated, thanks! >> >> The whole buzz about Firesheep caused me to hack up a sidejacking >> detector. I haven't tested it because I literally wrote it 5 minutes >> ago. >> >> Matthias >> >> Here is the code: >> >> @load http-request >> @load http-reply >> >> module HTTP; >> >> export >> { >> redef enum Notice += { CookieReuse }; >> >> # Number of cookies per client. >> const max_cookies = 1 &redef; >> >> # The time after which we expire entries. >> const cookie_expiration = 1 hr &redef; >> } >> >> >> # Count the number of cookies per client. >> global cookies: table[string] of set[addr] &write_expire = cookie_expiration; >> >> event http_header(c: connection, is_orig: bool, name: string, value: string) >> { >> # We are only looking for session IDs in the client cookie header. >> if (!
(is_orig && name == /[cC][oO][oO][kK][iI][eE]/)) >> return; >> >> local client = c$id$orig_h; >> if (value !in cookies) >> cookies[value] = set(); >> else >> add cookies[value][client]; >> >> if (|cookies[value]| <= max_cookies) >> return; >> >> local s = lookup_http_request_stream(c); >> NOTICE([$note=CookieReuse, $src=client, >> $msg=fmt("potential sidejacking by %s: cookie used by %d addresses", >> client, |cookies[value]|)]); >> } >> _______________________________________________ >> Bro mailing list >> bro at bro-ids.org >> http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/bro >> > > _______________________________________________ > Bro mailing list > bro at bro-ids.org > http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/bro -- Gregor Maier gregor at icir.org Int. Computer Science Institute (ICSI) gregor at icsi.berkeley.edu 1947 Center St., Ste. 600 http://www.icir.org/gregor/ Berkeley, CA 94704 USA
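The TTL side of the approach Gregor mentions can be sketched as follows (a hedged illustration, not the algorithm from the thesis: the initial-TTL list holds common OS defaults, and the threshold is made up). Hosts send packets with a characteristic initial TTL, and each router hop decrements it by one; several distinct (initial TTL, hop count) combinations arriving from a single source address suggest several stacks behind a NAT:

    COMMON_INITIAL_TTLS = (32, 64, 128, 255)

    def initial_ttl_and_hops(observed_ttl):
        # Distance to the next-larger common initial TTL.
        for init in COMMON_INITIAL_TTLS:
            if observed_ttl <= init:
                return init, init - observed_ttl
        return 255, 0

    def looks_like_nat(observed_ttls):
        # More than one distinct (initial TTL, hops) combination from a
        # single source address hints at several hosts behind it.
        return len({initial_ttl_and_hops(t) for t in observed_ttls}) > 1

    print(looks_like_nat({63, 127}))  # True: e.g. a Linux and a Windows stack
    print(looks_like_nat({64}))       # False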