From sstattla at gmail.com Wed Oct 6 11:37:58 2010 From: sstattla at gmail.com (Sunjeet Singh) Date: Wed, 06 Oct 2010 11:37:58 -0700 Subject: [Bro] Understanding the event generation and handling Message-ID: <4CACC206.2050505@gmail.com> Hi, I've been looking at the Bro documentation and source code recently. I need to get into lower-level details and looking at Source code is not helping me. Specifically, I need to get to the logic of- 1. Event generation: How does Bro know which all events to raise by looking at a particular packet? I have a basic understanding of the class hierarchy, but I don't know where to look for the code that decides which specific Application layer analyzer object to create by looking at the Application Layer header/signature of the incoming packet. 2. Event handling: It seems that an event's information is stored in an object and all events are queued in an Event Manager as they are created. After every packet is processed, this queue of events is drained (thus following a single-threaded model) and the events are sent to a Serializer. I found the serialization code hard to understand so I don't know the logic of how an event-handler (interpreter?) decides which event belongs to it. I'd really like to know the mechanism in here. Can someone please suggest which debugger to use and how, so that I can step-by-step understand the event-engine? Thank you, Sunjeet Singh From vern at icir.org Wed Oct 6 17:00:08 2010 From: vern at icir.org (Vern Paxson) Date: Wed, 06 Oct 2010 17:00:08 -0700 Subject: [Bro] Understanding the event generation and handling In-Reply-To: <4CACC206.2050505@gmail.com> (Wed, 06 Oct 2010 11:37:58 PDT). Message-ID: <20101007000008.63B3336A422@taffy.ICSI.Berkeley.EDU> > Specifically, I need to get to the logic of- > 1. Event generation: How does Bro know which all events to raise by > looking at a particular packet? There is a tree of analyzers that's traversed (perhaps taking multiple branches at any given point). > I have a basic understanding of the > class hierarchy, but I don't know where to look for the code that > decides which specific Application layer analyzer object to create by > looking at the Application Layer header/signature of the incoming packet. The architecture here is described in the paper: http://www.icir.org/robin/papers/usenix06.pdf If you are looking for specific details regarding names of classes/methods, etc., then you'll probably have to wait until Robin comes back from vacation in a couple of weeks. > 2. Event handling: It seems that an event's information is stored in an > object and all events are queued in an Event Manager as they are > created. Correct. > After every packet is processed, this queue of events is > drained (thus following a single-threaded model) and the events are sent > to a Serializer. I found the serialization code hard to understand so I Ignore the serializer. It's there for things like communication between multiple Bro processes. > Can someone please suggest which debugger to use and how, so that I can > step-by-step understand the event-engine? Well, I use gdb, and if I must, I start with invocations of NetSessions::NextPacket . If you want to sketch your particular goal, that might help with giving you more focussed advice. 
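[To make that suggestion concrete, a minimal gdb session might look like the following; the trace file and loaded policy are placeholders, and a Bro built with debug symbols (e.g., configured with --enable-debug) is assumed:

% gdb --args bro -r sample.trace http
(gdb) break NetSessions::NextPacket
(gdb) run
(gdb) backtrace
(gdb) step

backtrace shows how the packet arrived from the packet source; stepping onward from NextPacket leads into the per-connection lookup and from there into the analyzer tree.]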
Vern From sstattla at gmail.com Wed Oct 6 17:37:39 2010 From: sstattla at gmail.com (Sunjeet Singh) Date: Wed, 06 Oct 2010 17:37:39 -0700 Subject: [Bro] Understanding the event generation and handling In-Reply-To: <20101007000008.63B3336A422@taffy.ICSI.Berkeley.EDU> References: <20101007000008.63B3336A422@taffy.ICSI.Berkeley.EDU> Message-ID: <4CAD1653.4010902@gmail.com> Hi Vern, > The architecture here is described in the paper: > > http://www.icir.org/robin/papers/usenix06.pdf > Thanks! I'll take a look. > Well, I use gdb, and if I must, I start with invocations of > NetSessions::NextPacket . > This is helpful. > If you want to sketch your particular goal, that might help with giving > you more focussed advice. > I'm interested in Bro in general, but right now I'd be interested to know details about how event handling was implemented in Bro. So for every event from the event queue, how many handlers is it matched against for the right handlers to be invoked? All?(Probably not) Could you please shed some light on the details here? Do you think there could be scope for optimization? Thank you, Sunjeet Singh From vern at icir.org Wed Oct 6 17:42:15 2010 From: vern at icir.org (Vern Paxson) Date: Wed, 06 Oct 2010 17:42:15 -0700 Subject: [Bro] Understanding the event generation and handling In-Reply-To: <4CAD1653.4010902@gmail.com> (Wed, 06 Oct 2010 17:37:39 PDT). Message-ID: <20101007004215.8E31F36A422@taffy.ICSI.Berkeley.EDU> > So for every event from the event queue, how many handlers is it matched > against for the right handlers to be invoked? There's no matching at all. Rather, when policy scripts define new event handlers, they're directly associated with the name of the event. So when the event engine generates event_XXX, there's already (scripting) code associated with a global variable named event_XXX, and that's executed directly. > Do you think there > could be scope for optimization? No. Where optimization would prove fruitful (but hard) is for the script interpreter. Vern From sstattla at gmail.com Thu Oct 7 10:47:06 2010 From: sstattla at gmail.com (Sunjeet Singh) Date: Thu, 07 Oct 2010 10:47:06 -0700 Subject: [Bro] Filtering based on port-number Message-ID: <4CAE079A.5070505@gmail.com> Hi, The Bro Analyzers operate on the principle that port number is not a good indicator of protocol. But the filtering step does exactly the opposite. For example, the filter applied when the default brolite.bro policy file is used is- ((((((((((port telnet or tcp port 513) or (tcp[13] & 7 != 0)) or (tcp dst port 80 or tcp dst port 8080 or tcp dst port 8000)) or (tcp src port 80 or tcp src port 8080 or tcp src port 8000)) or (port 111)) or ((ip[6:2] & 0x3fff != 0) and tcp)) or (udp port 69)) or (port 6666)) or (tcp port smtp or tcp port 587)) or (port ftp)) or (port 6667) Thanks to the filtering step, 1. Bro will analyze some traffic that didn't belong to any of the 'relevant' protocols until it realizes that it can safely be discarded, and 2. Bro will not analyze traffic that belonged to one of the relevant protocols because it was filtered out for not being used on the standard port. Is this true? And if so, is this an okay side-effect to have of the filtering step? 
Thank you, Sunjeet Singh

From redlamb19 at gmail.com Thu Oct 7 12:41:00 2010 From: redlamb19 at gmail.com (Peter Erickson) Date: Thu, 7 Oct 2010 14:41:00 -0500 Subject: [Bro] Filtering based on port-number In-Reply-To: <4CAE079A.5070505@gmail.com> References: <4CAE079A.5070505@gmail.com> Message-ID: <20101007194059.GB4798@does.not.exist> I thought the same thing when I first started looking at Bro and its dynamic protocol detection (dpd) about 2 months ago. Take a look at the dpd wiki page which gives a good description of how it works. It also states: when loading dpd you may need to change the filter to include all packets, e.g. on the command line: bro -f "tcp or udp or icmp" ...

** Sunjeet Singh [2010-10-07 10:47:06 -0700] ** > Hi, > > The Bro Analyzers operate on the principle that port number is not a > good indicator of protocol. But the filtering step does exactly the > opposite. > > For example, the filter applied when the default brolite.bro policy file > is used is- > ((((((((((port telnet or tcp port 513) or (tcp[13] & 7 != 0)) or (tcp > dst port 80 or tcp dst port 8080 or tcp dst port 8000)) or (tcp src port > 80 or tcp src port 8080 or tcp src port 8000)) or (port 111)) or > ((ip[6:2] & 0x3fff != 0) and tcp)) or (udp port 69)) or (port 6666)) or > (tcp port smtp or tcp port 587)) or (port ftp)) or (port 6667) > > Thanks to the filtering step, > 1. Bro will analyze some traffic that didn't belong to any of the > 'relevant' protocols until it realizes that it can safely be discarded, and > 2. Bro will not analyze traffic that belonged to one of the relevant > protocols because it was filtered out for not being used on the standard > port. > > Is this true? And if so, is this an okay side-effect to have of the > filtering step? >

From seth at icir.org Thu Oct 7 12:48:12 2010 From: seth at icir.org (Seth Hall) Date: Thu, 7 Oct 2010 15:48:12 -0400 Subject: [Bro] Filtering based on port-number In-Reply-To: <20101007194059.GB4798@does.not.exist> References: <4CAE079A.5070505@gmail.com> <20101007194059.GB4798@does.not.exist> Message-ID: On Oct 7, 2010, at 3:41 PM, Peter Erickson wrote: > when loading dpd you may need to change the filter to include all > packets, e.g. on the command line: > bro -f "tcp or udp or icmp" ... You can also change the filter at the script level like this:

redef capture_filters += { ["all-ip-traffic"] = "ip" };

.Seth

From sstattla at gmail.com Thu Oct 7 14:33:53 2010 From: sstattla at gmail.com (Sunjeet Singh) Date: Thu, 07 Oct 2010 14:33:53 -0700 Subject: [Bro] Filtering based on port-number In-Reply-To: <20101007194059.GB4798@does.not.exist> References: <4CAE079A.5070505@gmail.com> <20101007194059.GB4798@does.not.exist> Message-ID: <4CAE3CC1.5060007@gmail.com> > when loading dpd you may need to change the filter to include all > packets, e.g. on the command line: > bro -f "tcp or udp or icmp" ... > Okay, so it makes sense to use capture_filter as-it-is when you are not using DPD; and to disable capture_filter (using "bro -f") if you are using DPD. In the latter case, you end up analyzing all packets which causes an extra performance cost of about 13.8% [with given parameters, Section 6.1, USENIX'06 paper]. The same section of the paper also says that the runtime of the Bro system exceeds the duration of the trace, indicating that we require "multiple NIDS instances in live operation". "Multiple NIDS instances in live operation"- has this been discussed anywhere else? With the filter disabled, this would be very useful.
Is it as simple as splitting up your policy file among different machines running Bro or is there more to it? Thank you, Peter. Sunjeet Singh From redlamb19 at gmail.com Fri Oct 8 07:59:38 2010 From: redlamb19 at gmail.com (Peter Erickson) Date: Fri, 8 Oct 2010 09:59:38 -0500 Subject: [Bro] Filtering based on port-number In-Reply-To: <4CAE3CC1.5060007@gmail.com> References: <4CAE079A.5070505@gmail.com> <20101007194059.GB4798@does.not.exist> <4CAE3CC1.5060007@gmail.com> Message-ID: <20101008095938.vh3h4k3xwcwok8s8@imp.redlamb.net> >> when loading dpd you may need to change the filter to include all >> packets, e.g. on the command line: >> bro -f "tcp or udp or icmp" ... >> > Okay, so it makes sense to use capture_filter as-it-is when you are not > using DPD; and to disable capture_filter (using "bro -f") if you are > using DPD. In the latter case, you end up analyzing all packets which > causes an extra performance cost of about 13.8% [with given parameters, > Section 6.1, USENIX'06 paper]. > > The same section of the paper also says that the runtime of the Bro > system exceeds the duration of the trace, indicating that we require > "multiple NIDS instances in live operation". > > "Multiple NIDS instances in live operation"- has this been discussed > anywhere else? With the filter disabled, this would be very useful. Is > it as simple as splitting up your policy file among different machines > running Bro or is there more to it? Someone else can correct me if I'm wrong, but I think that you are needing to setup a clustered environment with managers, proxies, and workers. The user manual briefly mentions something about this in the installation section, but my limited understanding of how it works comes from reading the scripts located at $BROHOME/share/broctl. My use of bro is strictly for offline processing so I have yet to really pay attention to it other than starting bro in standalone mode. From sstattla at gmail.com Fri Oct 8 08:08:24 2010 From: sstattla at gmail.com (Sunjeet Singh) Date: Fri, 08 Oct 2010 08:08:24 -0700 Subject: [Bro] Filtering based on port-number In-Reply-To: <20101008095938.vh3h4k3xwcwok8s8@imp.redlamb.net> References: <4CAE079A.5070505@gmail.com> <20101007194059.GB4798@does.not.exist> <4CAE3CC1.5060007@gmail.com> <20101008095938.vh3h4k3xwcwok8s8@imp.redlamb.net> Message-ID: <4CAF33E8.2040405@gmail.com> I'm looking into it. Thanks for your help Peter. Sunjeet Singh On 10-10-08 07:59 AM, Peter Erickson wrote: > >>> when loading dpd you may need to change the filter to include all >>> packets, e.g. on the command line: >>> bro -f "tcp or udp or icmp" ... >>> >> Okay, so it makes sense to use capture_filter as-it-is when you are not >> using DPD; and to disable capture_filter (using "bro -f") if you are >> using DPD. In the latter case, you end up analyzing all packets which >> causes an extra performance cost of about 13.8% [with given parameters, >> Section 6.1, USENIX'06 paper]. >> >> The same section of the paper also says that the runtime of the Bro >> system exceeds the duration of the trace, indicating that we require >> "multiple NIDS instances in live operation". >> >> "Multiple NIDS instances in live operation"- has this been discussed >> anywhere else? With the filter disabled, this would be very useful. Is >> it as simple as splitting up your policy file among different machines >> running Bro or is there more to it? 
> > Someone else can correct me if I'm wrong, but I think that you are > needing to setup a clustered environment with managers, proxies, and > workers. The user manual briefly mentions something about this in the > installation section, but my limited understanding of how it works > comes from reading the scripts located at $BROHOME/share/broctl. My > use of bro is strictly for offline processing so I have yet to really > pay attention to it other than starting bro in standalone mode. From seth at icir.org Fri Oct 8 08:33:40 2010 From: seth at icir.org (Seth Hall) Date: Fri, 8 Oct 2010 11:33:40 -0400 Subject: [Bro] Filtering based on port-number In-Reply-To: <4CAF33E8.2040405@gmail.com> References: <4CAE079A.5070505@gmail.com> <20101007194059.GB4798@does.not.exist> <4CAE3CC1.5060007@gmail.com> <20101008095938.vh3h4k3xwcwok8s8@imp.redlamb.net> <4CAF33E8.2040405@gmail.com> Message-ID: <0B6A27B4-26BA-4180-BF07-050855AF0553@icir.org> The best documentation for this can currently be found here: http://www.icir.org/robin/bro-cluster/ .Seth On Oct 8, 2010, at 11:08 AM, Sunjeet Singh wrote: > I'm looking into it. Thanks for your help Peter. > > Sunjeet Singh > > > On 10-10-08 07:59 AM, Peter Erickson wrote: >> >>>> when loading dpd you may need to change the filter to include all >>>> packets, e.g. on the command line: >>>> bro -f "tcp or udp or icmp" ... >>>> >>> Okay, so it makes sense to use capture_filter as-it-is when you are not >>> using DPD; and to disable capture_filter (using "bro -f") if you are >>> using DPD. In the latter case, you end up analyzing all packets which >>> causes an extra performance cost of about 13.8% [with given parameters, >>> Section 6.1, USENIX'06 paper]. >>> >>> The same section of the paper also says that the runtime of the Bro >>> system exceeds the duration of the trace, indicating that we require >>> "multiple NIDS instances in live operation". >>> >>> "Multiple NIDS instances in live operation"- has this been discussed >>> anywhere else? With the filter disabled, this would be very useful. Is >>> it as simple as splitting up your policy file among different machines >>> running Bro or is there more to it? >> >> Someone else can correct me if I'm wrong, but I think that you are >> needing to setup a clustered environment with managers, proxies, and >> workers. The user manual briefly mentions something about this in the >> installation section, but my limited understanding of how it works >> comes from reading the scripts located at $BROHOME/share/broctl. My >> use of bro is strictly for offline processing so I have yet to really >> pay attention to it other than starting bro in standalone mode. > > _______________________________________________ > Bro mailing list > bro at bro-ids.org > http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/bro From sstattla at gmail.com Fri Oct 8 08:37:15 2010 From: sstattla at gmail.com (Sunjeet Singh) Date: Fri, 08 Oct 2010 08:37:15 -0700 Subject: [Bro] Filtering based on port-number In-Reply-To: <0B6A27B4-26BA-4180-BF07-050855AF0553@icir.org> References: <4CAE079A.5070505@gmail.com> <20101007194059.GB4798@does.not.exist> <4CAE3CC1.5060007@gmail.com> <20101008095938.vh3h4k3xwcwok8s8@imp.redlamb.net> <4CAF33E8.2040405@gmail.com> <0B6A27B4-26BA-4180-BF07-050855AF0553@icir.org> Message-ID: <4CAF3AAB.9070501@gmail.com> Got it! 
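[For reference, a clustered deployment of this kind is driven by BroControl's node configuration; a minimal sketch of an etc/node.cfg follows, with made-up hostnames and interface names:

[manager]
type=manager
host=10.0.0.1

[proxy-1]
type=proxy
host=10.0.0.1

[worker-1]
type=worker
host=10.0.0.2
interface=eth0

[worker-2]
type=worker
host=10.0.0.3
interface=eth0

Each worker analyzes its share of the traffic, the proxy relays state between workers, and the manager collects logs and notices; see the cluster documentation linked above for the authoritative details.]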
Thanks Seth, Sunjeet Singh On 10-10-08 08:33 AM, Seth Hall wrote: > The best documentation for this can currently be found here: > > http://www.icir.org/robin/bro-cluster/ > > .Seth > > On Oct 8, 2010, at 11:08 AM, Sunjeet Singh wrote: > >> I'm looking into it. Thanks for your help Peter. >> >> Sunjeet Singh >> >> >> On 10-10-08 07:59 AM, Peter Erickson wrote: >>>>> when loading dpd you may need to change the filter to include all >>>>> packets, e.g. on the command line: >>>>> bro -f "tcp or udp or icmp" ... >>>>> >>>> Okay, so it makes sense to use capture_filter as-it-is when you are not >>>> using DPD; and to disable capture_filter (using "bro -f") if you are >>>> using DPD. In the latter case, you end up analyzing all packets which >>>> causes an extra performance cost of about 13.8% [with given parameters, >>>> Section 6.1, USENIX'06 paper]. >>>> >>>> The same section of the paper also says that the runtime of the Bro >>>> system exceeds the duration of the trace, indicating that we require >>>> "multiple NIDS instances in live operation". >>>> >>>> "Multiple NIDS instances in live operation"- has this been discussed >>>> anywhere else? With the filter disabled, this would be very useful. Is >>>> it as simple as splitting up your policy file among different machines >>>> running Bro or is there more to it? >>> Someone else can correct me if I'm wrong, but I think that you are >>> needing to setup a clustered environment with managers, proxies, and >>> workers. The user manual briefly mentions something about this in the >>> installation section, but my limited understanding of how it works >>> comes from reading the scripts located at $BROHOME/share/broctl. My >>> use of bro is strictly for offline processing so I have yet to really >>> pay attention to it other than starting bro in standalone mode. >> _______________________________________________ >> Bro mailing list >> bro at bro-ids.org >> http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/bro From sstattla at gmail.com Fri Oct 8 16:39:22 2010 From: sstattla at gmail.com (Sunjeet Singh) Date: Fri, 08 Oct 2010 16:39:22 -0700 Subject: [Bro] Multi-threading Message-ID: <4CAFABAA.30105@gmail.com> Hi, Can someone please comment on the current status of multi-threading in Bro? I would be interested in doing some work here. I've been reading a bit about it at- http://blog.securitymonks.com/2010/08/26/three-little-idsips-engines-build-their-open-source-solutions/ and http://www.google.ca/url?sa=t&source=web&cd=1&ved=0CBQQFjAA&url=http%3A%2F%2Fwww.bro-ids.org%2Fbro-workshop-2009%2Fslides%2FFutureWork.pdf&rct=j&q=bro%20ids%20multithreading&ei=gquvTK6KIIa-sAPT8viQDA&usg=AFQjCNGhsZ76_FKTpe3P-v40RgT1Ye36KA&sig2=y8oAyNcZ602kjuT1Ei2ytw&cad=rja Thank you, Sunjeet Singh From vern at icir.org Mon Oct 11 20:43:00 2010 From: vern at icir.org (Vern Paxson) Date: Mon, 11 Oct 2010 20:43:00 -0700 Subject: [Bro] Multi-threading In-Reply-To: <4CAFABAA.30105@gmail.com> (Fri, 08 Oct 2010 16:39:22 PDT). Message-ID: <20101012034300.B232436A421@taffy.ICSI.Berkeley.EDU> > Can someone please comment on the current status of multi-threading in > Bro? That will need to be Robin, as he's the one who's done all the work on this. However, he's on vacation for another week, and will no doubt face a major email backlog when he returns. 
Vern

From zsmountain27 at gmail.com Tue Oct 12 10:55:32 2010 From: zsmountain27 at gmail.com (SONG ZHAO) Date: Tue, 12 Oct 2010 13:55:32 -0400 Subject: [Bro] Modify mac address Message-ID: Hi, I want to modify the MAC address of network packets before or after the packets are handled by Bro. Could you tell me how to modify the MAC address using Bro? Do I need to revise the source code? Thanks, Song

From christian at icir.org Tue Oct 12 12:43:41 2010 From: christian at icir.org (Christian Kreibich) Date: Tue, 12 Oct 2010 12:43:41 -0700 Subject: [Bro] Modify mac address In-Reply-To: References: Message-ID: <1286912621.1919.158.camel@strangepork> On Tue, 2010-10-12 at 13:55 -0400, SONG ZHAO wrote: > Hi, > I want to modify the MAC address of network packets before or after > the packets are handled by Bro. > Could you tell me how to modify the MAC address using Bro? Do I need > to revise the source code? You would likely have to revise source code, but without more context it's unclear whether Bro is a good choice for what you want to do. If all you want is Ethernet address rewriting, there are other tools that likely already do what you want. tcprewrite provides basic Ethernet address rewriting. For more flexibility, you could write a little Scapy script as shown below. As a last resort you could write a Netdude plugin that does what you need.

map = {'00:50:da:53:8a:01': '11:22:33:44:55:66',
       '00:12:7f:eb:3b:cf': '77:88:99:aa:bb:cc'}

# read the trace, rewrite matching source/destination MACs, write it back out
pkts = rdpcap('in.trace')
for pkt in pkts:
    for key, val in map.items():
        if pkt.src == key: pkt.src = val
        if pkt.dst == key: pkt.dst = val
wrpcap('out.trace', pkts)

-- Cheers, Christian

From redlamb19 at gmail.com Tue Oct 12 16:38:18 2010 From: redlamb19 at gmail.com (Peter Erickson) Date: Tue, 12 Oct 2010 18:38:18 -0500 Subject: [Bro] http analyzer and de-obfuscating the payload Message-ID: <20101012233818.GA1484@does.not.exist> While writing a few policies to track an extremely basic malware "protocol" that sits on top of HTTP, I ran into a few questions that I haven't been able to find answers for. 1. Are binpac analyzers preferred over the hand-written ones? From what I can tell, which may be wrong, the http binpac analyzer does not send an http_entity_data event so using http-extract-items is not possible. Is it possible to extract http items using the binpac analyzer or am I better off sticking with the hand-written one? 2. When processing events, e.g. http_message_done, is it possible to access the entire assembled stream without writing it to disk first? I have some malware traffic that I would like to analyze with bro, but the data is obfuscated within the http data section using layers of xor, compression, and encryption techniques. Ideally, I would use bro to de-obfuscate the streams and provide additional info in the log files instead of using python scripts after running bro. I have no problems writing the bifs (I've already created an xor one), but want to make sure the info is available if I do write them. 3. Along the same lines as #2, is the assembled stream available for connections that are not http? Any help is appreciated. Thanks in advance.
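[Question 2 revolves around the http_entity_data and http_message_done events. A minimal sketch of reassembling an entity body at the script level might look like the following; the table name is made up, pipelined requests are ignored, and the memory caveat raised in the replies below applies:

global bodies: table[conn_id, bool] of string &default="";

event http_entity_data(c: connection, is_orig: bool, length: count, data: string)
    {
    # append each chunk in the order the analyzer delivers it
    bodies[c$id, is_orig] = string_cat(bodies[c$id, is_orig], data);
    }

event http_message_done(c: connection, is_orig: bool, stat: http_message_stat)
    {
    # the full body for this direction is now assembled;
    # de-obfuscate/inspect it here, then release the memory
    delete bodies[c$id, is_orig];
    }

Keeping whole bodies in memory is only reasonable for offline traces.]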
From seth at icir.org Tue Oct 12 19:17:36 2010 From: seth at icir.org (Seth Hall) Date: Tue, 12 Oct 2010 22:17:36 -0400 Subject: [Bro] http analyzer and de-obfuscating the payload In-Reply-To: <20101012233818.GA1484@does.not.exist> References: <20101012233818.GA1484@does.not.exist> Message-ID: <89668F79-C244-4546-A0A6-51AC8E7139AA@icir.org> On Oct 12, 2010, at 7:38 PM, Peter Erickson wrote: > Is it possible to extract http items using the binpac analyzer or am I > better off sticking with the hand-written one? Binpac analyzers are preferred when writing new analyzers, but some of the binpac analyzers are not at feature parity with their handwritten counterparts (HTTP is the primary problem in this regard). For now, I recommend not using the --enable-binpac flag when doing HTTP analysis. > 2. When processing events, i.e. http_message_done, is it possible to > access the entire assembled stream without writing it to disk first? No. Generally when doing stream analysis with Bro you have two options. The best, if your analysis method allows it is to do the analysis in a streaming fashion with chunks of data as they become available. If your analysis method needs random access to the data, then you are probably best off writing to disk and kicking off an external process (from within Bro) once the stream is completed and the file is closed. The output of that analysis could then feed back into Bro using Broccoli. You typically don't want to try storing large streams in memory because it would be far too easy to use all available memory and crash Bro. Of course, if you are running Bro on tracefiles instead of live network interfaces that may not be a concern. > 3. Along the same lines as #2, is the assembled stream available for > connections that are not http? It depends on the protocol and the analyzer. If you search through the event.bif.bro file for "_data", that will point out analyzer events which likely are sending a stream of data. The analyzers which currently have _data events are: HTTP, SMTP, POP3, and MIME. Unfortunately some of the other obvious ones like SMB and NFS don't currently have _data events. We accept patches though if you'd like to add support for that. :) Is there a protocol or set of protocols in particular that you'd like to see supported with _data events? .Seth From redlamb19 at gmail.com Tue Oct 12 20:30:06 2010 From: redlamb19 at gmail.com (Peter Erickson) Date: Tue, 12 Oct 2010 22:30:06 -0500 Subject: [Bro] http analyzer and de-obfuscating the payload In-Reply-To: <89668F79-C244-4546-A0A6-51AC8E7139AA@icir.org> References: <20101012233818.GA1484@does.not.exist> <89668F79-C244-4546-A0A6-51AC8E7139AA@icir.org> Message-ID: <20101012223006.1qberfceecow4osc@imp.redlamb.net> On Tue Oct 12 21:17:36 2010, Seth Hall wrote: >> 2. When processing events, i.e. http_message_done, is it possible to >> access the entire assembled stream without writing it to disk first? > > No. Generally when doing stream analysis with Bro you have two > options. The best, if your analysis method allows it is to do the > analysis in a streaming fashion with chunks of data as they become > available. If your analysis method needs random access to the data, > then you are probably best off writing to disk and kicking off an > external process (from within Bro) once the stream is completed and > the file is closed. The output of that analysis could then feed > back into Bro using Broccoli. I didn't think of using broccoli to feed it back into the system. 
I'll have to reconsider my current setup to see if that makes sense. It works now without it, but there is definitely a benefit of having additional information within bro's log files. > You typically don't want to try storing large streams in memory > because it would be far too easy to use all available memory and > crash Bro. Of course, if you are running Bro on tracefiles instead > of live network interfaces that may not be a concern. All the analysis that I have been (and will be) doing is with tracefiles on a machine that is not connected to a network. I figured that there were chances that I could run out of memory, but was hoping that the memory would be released once the connection was terminated. I did not think about using a table of strings to keep the data... guess I was thinking too deep. >> 3. Along the same lines as #2, is the assembled stream available for >> connections that are not http? > > It depends on the protocol and the analyzer. If you search through > the event.bif.bro file for "_data", that will point out analyzer > events which likely are sending a stream of data. The analyzers > which currently have _data events are: HTTP, SMTP, POP3, and MIME. > Unfortunately some of the other obvious ones like SMB and NFS don't > currently have _data events. We accept patches though if you'd like > to add support for that. :) I figured that you would accept patches. It has been awhile since I've used C++, but hoping it will come back to me. I have spent a lot of time looking at the source code to better understand how bro works. I would love to see RDP and SSL decryption, but I know that those aren't easy tasks... doesn't mean I won't try eventually. > Is there a protocol or set of protocols in particular that you'd > like to see supported with _data events? I haven't seen anything yet, but I'm sure that I'll come across something eventually. Thanks for all the help.

From seth at icir.org Wed Oct 13 07:06:32 2010 From: seth at icir.org (Seth Hall) Date: Wed, 13 Oct 2010 10:06:32 -0400 Subject: [Bro] http analyzer and de-obfuscating the payload In-Reply-To: <20101012223006.1qberfceecow4osc@imp.redlamb.net> References: <20101012233818.GA1484@does.not.exist> <89668F79-C244-4546-A0A6-51AC8E7139AA@icir.org> <20101012223006.1qberfceecow4osc@imp.redlamb.net> Message-ID: <84A19F9A-6723-49FD-B345-148E5E9CFB34@icir.org> On Oct 12, 2010, at 11:30 PM, Peter Erickson wrote: > I didn't think of using broccoli to feed it back into the system. I'll have to reconsider my current setup to see if that makes sense. It works now without it, but there is definitely a benefit of having additional information within bro's log files. It's especially useful when you're using Bro on a live network because the information gained from the external analysis could feed back into Bro to change its behavior if the same thing is seen again. As a personal exercise, I'm going to start including concrete examples when I talk about techniques in Bro. :) So, here's my concrete example... Bro identifies a Windows executable being downloaded over HTTP so it begins calculating an MD5 sum of the bytes being transferred. It could also save the file to disk. When the file is done being transferred, the on-disk filename could be sent off to an external process which grabs the file and does something like run it through VirusTotal and returns the result of that scan to Bro. If the file is determined to be malicious an alarm could be raised about the initial transfer and the MD5 sum could be added to a set of malicious MD5 sums.
The URL of the file could also be added to a set of URLs. In the future, if any host downloads a file with that MD5 sum or from the same URL then an alarm would automatically be raised without waiting for the external analysis to take place. This full scenario is not currently implemented in Bro, but things are lining up to make this sort of analysis possible. If you have ideas for analysis scenarios that you'd like to see implemented, I'd really like to hear them! > All the analysis that I have been (and will be) doing is with tracefiles on a machine that is not connected to a network. I figured that there were chances that I could run out of memory, but was hoping that the memory would be released once the connection was terminated. I did not think about using a table of strings to keep the data... guess I was thinking too deep. You could either keep a table of strings or concatenate the strings together as new data comes in. I'll include some examples here. Using these inputs...

global a = "first string";
global b = "second string";
global output = "";

You can do this...

global stuff: string_array = table();
stuff[|stuff|+1] = a;
stuff[|stuff|+1] = b;
output = cat_string_array(stuff);

Or this...

output = string_cat(a, b);

> I figured that you would accept patches. It has been awhile since I've used C++, but hoping it will come back to me. I have spent a lot of time looking at the source code to better understand how bro works. I would love to see RDP and SSL decryption, but I know that those aren't easy tasks... doesn't mean I won't try eventually. Bro currently doesn't have any support for RDP but I think that a lot of the support for SSL decryption is already in place. I haven't ever done it myself though, so I don't know if it is completely there and working. .Seth

From vern at icir.org Wed Oct 13 12:59:50 2010 From: vern at icir.org (Vern Paxson) Date: Wed, 13 Oct 2010 12:59:50 -0700 Subject: [Bro] http analyzer and de-obfuscating the payload In-Reply-To: <89668F79-C244-4546-A0A6-51AC8E7139AA@icir.org> (Tue, 12 Oct 2010 22:17:36 EDT). Message-ID: <20101013195950.10ACC36A413@taffy.ICSI.Berkeley.EDU> > > 3. Along the same lines as #2, is the assembled stream available for > > connections that are not http? > > It depends on the protocol and the analyzer. Note, there are also generic tcp_contents() and udp_contents() events. They likewise return the stream piecemeal. Vern

From vern at icir.org Wed Oct 13 13:01:20 2010 From: vern at icir.org (Vern Paxson) Date: Wed, 13 Oct 2010 13:01:20 -0700 Subject: [Bro] http analyzer and de-obfuscating the payload In-Reply-To: <84A19F9A-6723-49FD-B345-148E5E9CFB34@icir.org> (Wed, 13 Oct 2010 10:06:32 EDT). Message-ID: <20101013200120.339D236A429@taffy.ICSI.Berkeley.EDU> > Or this... > output = string_cat(a, b); One caveat is that the string_cat approach is essentially O(N^2) in the size of the reassembled stream, because it winds up repeatedly copying the entire string. Ideally we'd fix this under the hood, one fine day ... Vern

From sstattla at gmail.com Mon Oct 18 10:31:13 2010 From: sstattla at gmail.com (Sunjeet Singh) Date: Mon, 18 Oct 2010 10:31:13 -0700 Subject: [Bro] Use of GPUs for signature matching? Message-ID: <4CBC8461.8010008@gmail.com> Bro currently follows a single-threaded model in which every incoming packet is first filtered, analyzed for protocol based on its signature (and not simply port-number) and then handled according to a user-defined policy for that protocol.
While Bro provides mechanisms to distribute the processing of the handled policy events, the protocol analysis poses a performance bottleneck in that it might not be able to keep up with the speed of incoming packets. In Bro's signature matching engine, connections sometimes trigger more than one signature and so can not be immediately associated with a protocol. But as more connection packets arrive, a better decision about the protocol involved can be made. During this process, different protocol analyzers may be spawned and killed until finally the right protocol is arrived at. Regular expression matching is done here to match signatures. I believe that GPUs can be used here to perform parallel signature matching by different protocol analyzers, thus speeding up the protocol analysis phase. With this, Bro would be able to operate at a higher packet rate than it does now. If this is true, I would like to do this. I will appreciate if you could share your thoughts. Snort's packet processing throughput increased by 60% with the use of GPUs ( http://www.springerlink.com/content/b3m7662014272t8m/ ) and Suricata has plans to introduce GPUs ( http://blog.securitymonks.com/2010/08/26/three-little-idsips-engines-build-their-open-source-solutions/ ). Thank you, Sunjeet Singh

From vallentin at icir.org Mon Oct 18 11:05:09 2010 From: vallentin at icir.org (Matthias Vallentin) Date: Mon, 18 Oct 2010 11:05:09 -0700 Subject: [Bro] Log rotation and /dev/null with broctl Message-ID: <20101018180509.GE403@icsi.berkeley.edu> I receive some unexplainable errors using broctl:

19 Oct 04:42:55 [output] /usr/local/share/broctl/scripts/archive-log: line 49: /home2/bro-logs/2010-10-16//dev/null.07:52:18-00:00:00.gz: No such file or directory
19 Oct 04:42:55 [output] 1287253800.000380 run-time error: rotate_file: can't move /dev/null to /dev/null.3123.1287253800.000380.tmp: File exists
19 Oct 04:42:55 [output] /usr/local/share/broctl/scripts/archive-log: line 49: /home2/bro-logs/2010-10-16//dev/null.07:52:18-00:00:00.gz: No such file or directory
19 Oct 04:42:55 [output] /usr/local/share/broctl/scripts/archive-log: line 49: /home2/bro-logs/2010-10-17//dev/null.00:00:00-00:00:00.gz: No such file or directory
19 Oct 04:42:55 [output] 1287340200.000090 run-time error: rotate_file: can't move /dev/null to /dev/null.3123.1287340200.000090.tmp: File exists
19 Oct 04:42:55 [output] /usr/local/share/broctl/scripts/archive-log: line 49: /home2/bro-logs/2010-10-16//dev/null.07:52:18-00:00:00.gz: No such file or directory

My broctl.cfg is pretty standard, with the only big difference being the change of the log directory:

LogDir = /home2/bro-logs

This is also weird:

% file /dev/null
/dev/null: ASCII text
% more /dev/null
title

It almost seems that broctl overwrote /dev/null. Does that make any sense? Matthias

From jmellander at lbl.gov Mon Oct 18 11:15:06 2010 From: jmellander at lbl.gov (Jim Mellander) Date: Mon, 18 Oct 2010 11:15:06 -0700 Subject: [Bro] Log rotation and /dev/null with broctl In-Reply-To: <20101018180509.GE403@icsi.berkeley.edu> References: <20101018180509.GE403@icsi.berkeley.edu> Message-ID: On Mon, Oct 18, 2010 at 11:05 AM, Matthias Vallentin wrote:

> This is also weird:
>
> % file /dev/null
> /dev/null: ASCII text
> % more /dev/null
> title
>
> It almost seems that broctl overwrote /dev/null. Does that make any sense?
Seen this happen when redirection goes bad: instead of

rm my_file >/dev/null

the redirection is accidentally missed:

rm my_file /dev/null

(obviously only works with privs in /dev); then the next process redirecting to /dev/null creates a text file.

From JAzoff at uamail.albany.edu Mon Oct 18 11:25:40 2010 From: JAzoff at uamail.albany.edu (Justin Azoff) Date: Mon, 18 Oct 2010 14:25:40 -0400 Subject: [Bro] Log rotation and /dev/null with broctl In-Reply-To: <20101018180509.GE403@icsi.berkeley.edu> References: <20101018180509.GE403@icsi.berkeley.edu> Message-ID: <20101018182540.GG4105@datacomm.albany.edu> On Mon, Oct 18, 2010 at 02:05:09PM -0400, Matthias Vallentin wrote: > I receive some unexplainable errors using broctl: > > 19 Oct 04:42:55 [output] /usr/local/share/broctl/scripts/archive-log: line 49: /home2/bro-logs/2010-10-16//dev/null.07:52:18-00:00:00.gz: No such file or directory Do you have open_log_file("/dev/null") somewhere in one of your policy scripts? I don't think that sort of thing works, instead you need to immediately close a file after opening it... -- -- Justin Azoff -- Network Security & Performance Analyst

From vallentin at icir.org Mon Oct 18 11:40:29 2010 From: vallentin at icir.org (Matthias Vallentin) Date: Mon, 18 Oct 2010 11:40:29 -0700 Subject: [Bro] Log rotation and /dev/null with broctl In-Reply-To: <20101018182540.GG4105@datacomm.albany.edu> References: <20101018180509.GE403@icsi.berkeley.edu> <20101018182540.GG4105@datacomm.albany.edu> Message-ID: <20101018184029.GG403@icsi.berkeley.edu> > Do you have open_log_file("/dev/null") somewhere in one of your policy > scripts? Indeed, I could find the following

# Save us some disk I/O.
redef notice_file = open("/dev/null");
redef bro_alarm_file = open("/dev/null");
redef Weird::weird_file = open("/dev/null");

which I replaced with

event bro_init()
    {
    close(notice_file);
    close(bro_alarm_file);
    close(Weird::weird_file);
    }

to get rid of the error. Thanks for the hint. Matthias

From robin at icir.org Mon Oct 18 12:12:47 2010 From: robin at icir.org (Robin Sommer) Date: Mon, 18 Oct 2010 12:12:47 -0700 Subject: [Bro] Log rotation and /dev/null with broctl In-Reply-To: <20101018184029.GG403@icsi.berkeley.edu> References: <20101018180509.GE403@icsi.berkeley.edu> <20101018182540.GG4105@datacomm.albany.edu> <20101018184029.GG403@icsi.berkeley.edu> Message-ID: <20101018191247.GT55971@icir.org> On Mon, Oct 18, 2010 at 11:40 -0700, Matthias Vallentin wrote: > to get rid of the error. Thanks for the hint. We should check for that. Can you file a ticket to remember it? Thanks, Robin -- Robin Sommer * Phone +1 (510) 722-6541 * robin at icir.org ICSI/LBNL * Fax +1 (510) 666-2956 * www.icir.org

From vallentin at icir.org Mon Oct 18 12:25:09 2010 From: vallentin at icir.org (Matthias Vallentin) Date: Mon, 18 Oct 2010 12:25:09 -0700 Subject: [Bro] Log rotation and /dev/null with broctl In-Reply-To: <20101018191247.GT55971@icir.org> References: <20101018180509.GE403@icsi.berkeley.edu> <20101018182540.GG4105@datacomm.albany.edu> <20101018184029.GG403@icsi.berkeley.edu> <20101018191247.GT55971@icir.org> Message-ID: <20101018192509.GH403@icsi.berkeley.edu> > We should check for that. Can you file a ticket to remember it? Done.
Matthias From seth at icir.org Mon Oct 18 12:26:23 2010 From: seth at icir.org (Seth Hall) Date: Mon, 18 Oct 2010 15:26:23 -0400 Subject: [Bro] Log rotation and /dev/null with broctl In-Reply-To: <20101018191247.GT55971@icir.org> References: <20101018180509.GE403@icsi.berkeley.edu> <20101018182540.GG4105@datacomm.albany.edu> <20101018184029.GG403@icsi.berkeley.edu> <20101018191247.GT55971@icir.org> Message-ID: <204463F0-F51C-4AFF-8B6F-5A19A9BD8FD8@icir.org> On Oct 18, 2010, at 3:12 PM, Robin Sommer wrote: > On Mon, Oct 18, 2010 at 11:40 -0700, Matthias Vallentin wrote: > >> to get rid of the error. Thanks for the hint. > > We should check for that. Can you file a ticket to remember it? It would be good to have some good clarification on how *not* to print to log files. I've been doing the close() trick in my logging framework for a long time but you and Vern both agreed that using close() probably isn't the right way to do it. It works really well in this situation though because it does prevent remote printing as well as local printing. .Seth From robin at icir.org Mon Oct 18 12:33:06 2010 From: robin at icir.org (Robin Sommer) Date: Mon, 18 Oct 2010 12:33:06 -0700 Subject: [Bro] Endace support in use? Message-ID: <20101018193306.GA71746@icir.org> Bro currently comes with native support for Endace cards (i.e., using the Endace API directly, not via their libpcap-compatible interface). The support is enabled by configuring with --with-dag. As we're cleaning up the Bro distribution, we were wondering if anybody is using this functionality and would object seeing it removed? Robin -- Robin Sommer * Phone +1 (510) 722-6541 * robin at icir.org ICSI/LBNL * Fax +1 (510) 666-2956 * www.icir.org From robin at icir.org Wed Oct 20 14:12:11 2010 From: robin at icir.org (Robin Sommer) Date: Wed, 20 Oct 2010 14:12:11 -0700 Subject: [Bro] Multi-threading In-Reply-To: <4CAFABAA.30105@gmail.com> References: <4CAFABAA.30105@gmail.com> Message-ID: <20101020211211.GC68831@icir.org> Sorry for the delay. On Fri, Oct 08, 2010 at 16:39 -0700, Sunjeet Singh wrote: > Can someone please comment on the current status of multi-threading in > Bro? I would be interested in doing some work here. We have a proof-of-concept implementation of a multi-threaded Bro. Even though still an early prototype, it already improves Bro's performance quite a bit on multi-core systems and demonstrates that the approach works quite well. However, this prototype still has a number of limitations and is not yet usable from an operational perspective. There are also a number of different routes we could go from here, which aren't fully clear yet in their specifics. For more background, the most current description of the prototype is here: http://www.icir.org/robin/papers/cc-multi-core-icast.pdf Section V. describes the parallelization approach, and Section VI. presents some preliminary measurements. (Section I-IV are on a more conceptual level; not all of that is directly reflected in Bro). A limiting factor for moving this forward right now is available time, so help and contributions would certainly be welcome. Is there anything specific you're thinking about? (I saw your mail about GPUs, will reply to that in a bit). Robin -- Robin Sommer * Phone +1 (510) 722-6541 * robin at icir.org ICSI/LBNL * Fax +1 (510) 666-2956 * www.icir.org From robin at icir.org Wed Oct 20 14:20:45 2010 From: robin at icir.org (Robin Sommer) Date: Wed, 20 Oct 2010 14:20:45 -0700 Subject: [Bro] Use of GPUs for signature matching? 
In-Reply-To: <4CBC8461.8010008@gmail.com> References: <4CBC8461.8010008@gmail.com> Message-ID: <20101020212045.GD68831@icir.org> On Mon, Oct 18, 2010 at 10:31 -0700, Sunjeet Singh wrote: > I believe that GPUs can be used here to perform parallel signature > matching by different protocol analyzers, thus speeding up the protocol > analysis phase. That's generally right and, as the Snort work demonstrates, parallelizing signature matching across GPUs can indeed improve performance quite a bit. For Bro, however, improving signature performance is actually not that crucial as its main performance bottlenecks are elsewhere (the single most important bottleneck today is the script interpreter). Thus, while generally improving the performance of Bro's signature engine would certainly still be nice (and I appreciate your interest in helping with this!), I'm not sure it's actually worth spending the time that a solid GPU-based implementation would require. I'd be happy to provide you with some further thoughts on directions you could work on for improving Bro's performance. Write me a mail off-list if you're interested. Robin -- Robin Sommer * Phone +1 (510) 722-6541 * robin at icir.org ICSI/LBNL * Fax +1 (510) 666-2956 * www.icir.org

From mcflyingdp at gmail.com Fri Oct 22 10:11:00 2010 From: mcflyingdp at gmail.com (Elx 星) Date: Sat, 23 Oct 2010 01:11:00 +0800 Subject: [Bro] Who has put BRO turn parallel system? Message-ID: <4CC1C5A4.8050108@gmail.com> Has anyone turned Bro into a parallel (concurrent) system? I'm a second-year university student from China, participating in a project to develop an NIDS. My teacher suggested finding out whether someone has turned Bro into a parallel (concurrent) system. Thanks ... I'm very interested in NIDS and Bro, but it's hard to find much documentation about Bro (almost none), so I must learn from the mailing list... If you have spare time, please take a moment to answer this question.

From mcholste at gmail.com Fri Oct 22 10:46:38 2010 From: mcholste at gmail.com (Martin Holste) Date: Fri, 22 Oct 2010 12:46:38 -0500 Subject: [Bro] Use of GPUs for signature matching? In-Reply-To: <20101020212045.GD68831@icir.org> References: <4CBC8461.8010008@gmail.com> <20101020212045.GD68831@icir.org> Message-ID: >For Bro, however, improving signature > performance is actually not that crucial as its main performance > bottlenecks are elsewhere (the single most important bottleneck > today is the script interpreter). Robin, can you elaborate on this a bit? I'm very surprised that pattern matching would not be the first bottleneck. With that, I've watched the debate fly back and forth between Marty Roesch (in Snort) and Victor Julien (in Suricata) on the pros and cons of multithreading and I'd like to hear your take. Marty's point was that multithreading leads to CPU cache inefficiency which incurs a penalty greater than the boost to the pattern matching in parallel and therefore suggests flow-pinned load-balancing for scaling. Do you have an opinion on the matter? Thanks, Martin

From sstattla at gmail.com Fri Oct 22 11:56:46 2010 From: sstattla at gmail.com (Sunjeet Singh) Date: Fri, 22 Oct 2010 11:56:46 -0700 Subject: [Bro] Multi-threading In-Reply-To: <20101020211211.GC68831@icir.org> References: <4CAFABAA.30105@gmail.com> <20101020211211.GC68831@icir.org> Message-ID: <4CC1DE6E.6030202@gmail.com> Thanks for sharing the link to the paper, it made an interesting read.
This paper does a great job of explaining the concepts involved, even for someone like myself who doesn't have a background in parallel computing. Clearly, an IDS architecture that separates protocol analysis and event handling can employ this technique to improve performance. And so this can be used for Bro. But, you'd need a working ANI. I don't know how recently this paper was written, but when we're talking about today, where does ANI fit in, in hardware, and if not implemented as custom hardware then as a small program running in a core " if a multicore fabric includes embedded network resources" (like UltraSPARC T2)? I couldn't figure out how recently this paper was written (2007-08?), and so while reading this paper I couldn't help but think about this very basic question- Today, if I'm using Bro as the Host-based IDS on my machine, and if I find that Bro is not being able to keep up with the incoming packet rate, what are some steps that I should take? Thank you, Sunjeet Singh On 10-10-20 2:12 PM, Robin Sommer wrote: > Sorry for the delay. > > On Fri, Oct 08, 2010 at 16:39 -0700, Sunjeet Singh wrote: > >> Can someone please comment on the current status of multi-threading in >> Bro? I would be interested in doing some work here. > We have a proof-of-concept implementation of a multi-threaded Bro. > Even though still an early prototype, it already improves Bro's > performance quite a bit on multi-core systems and demonstrates that > the approach works quite well. However, this prototype still has a > number of limitations and is not yet usable from an operational > perspective. There are also a number of different routes we could go > from here, which aren't fully clear yet in their specifics. > > For more background, the most current description of the prototype > is here: > > http://www.icir.org/robin/papers/cc-multi-core-icast.pdf > > Section V. describes the parallelization approach, and Section VI. > presents some preliminary measurements. (Section I-IV are on a more > conceptual level; not all of that is directly reflected in Bro). > > A limiting factor for moving this forward right now is available > time, so help and contributions would certainly be welcome. Is there > anything specific you're thinking about? (I saw your mail about > GPUs, will reply to that in a bit). > > Robin > From seth at icir.org Fri Oct 22 12:39:50 2010 From: seth at icir.org (Seth Hall) Date: Fri, 22 Oct 2010 15:39:50 -0400 Subject: [Bro] Multi-threading In-Reply-To: <4CC1DE6E.6030202@gmail.com> References: <4CAFABAA.30105@gmail.com> <20101020211211.GC68831@icir.org> <4CC1DE6E.6030202@gmail.com> Message-ID: <23648188-F68E-408F-9C4E-73BA60CAD492@icir.org> On Oct 22, 2010, at 2:56 PM, Sunjeet Singh wrote: > Today, if I'm using Bro as the Host-based IDS on my machine, and if I > find that Bro is not being able to keep up with the incoming packet > rate, what are some steps that I should take? I'm guessing you meant network based IDS (as opposed to Host-based)? Currently, if you are trying to scale Bro as a network IDS the most viable method is to use the cluster deployment using the BroControl utility. It's currently being used in production at a number of locations. For more documentation about BroControl and the cluster deployment you can refer to the following link. 
http://www.icir.org/robin/bro-cluster/README.html .Seth

From gmhoward at gmail.com Mon Oct 25 20:15:11 2010 From: gmhoward at gmail.com (Gaspar Modelo-Howard) Date: Mon, 25 Oct 2010 23:15:11 -0400 Subject: [Bro] Remote reconfiguration of a Bro sensor In-Reply-To: <20101018193306.GA71746@icir.org> References: <20101018193306.GA71746@icir.org> Message-ID: <1288062911.11213.6.camel@kareem> Hello, Can someone please point to some info on how Bro currently supports remotely reconfiguring a sensor? Any example would also be appreciated. I want to configure Bro to allow remote reconfiguration of sensors without shutting down the sensor. One particular case I am interested in is telling a Bro sensor to include/exclude a .bro script while running. For example, a sensor starts with 'bro http' and then later is reconfigured to 'bro http ssh'. I briefly talked to Robin and Seth in regards to this, so sorry to bring it up again. But it seems like I missed some important pointers and can't find where/how to proceed with this. I have been successful sharing state between remote sensors, like the bro-to-bro comm from the 2009 workshop, but not doing remote reconfiguration. Many thanks, Gaspar

From robin at icir.org Mon Oct 25 21:47:44 2010 From: robin at icir.org (Robin Sommer) Date: Mon, 25 Oct 2010 21:47:44 -0700 Subject: [Bro] Use of GPUs for signature matching? In-Reply-To: References: <4CBC8461.8010008@gmail.com> <20101020212045.GD68831@icir.org> Message-ID: <20101026044744.GB37556@icir.org> On Fri, Oct 22, 2010 at 12:46 -0500, you wrote: > Robin, can you elaborate on this a bit? I'm very surprised that > pattern matching would not be the first bottleneck. The answer is quite simple actually: Bro just doesn't do that much pattern matching. While it has a pattern engine similar to what Snort/Suricata are relying on, a typical Bro setup doesn't use it very much at all: typically there are just a few signatures configured, often just for doing dynamic protocol detection. Bro is doing a lot of other things instead, in particular deep stateful protocol analysis and execution of its analysis scripts. In particular the latter is getting more and more expensive compared to Bro's other components: scripts are becoming larger and more complex, they track more state, and they have to deal with more traffic to analyze. The script interpreter is a piece we haven't spent much time on optimizing yet (it's indeed still an *interpreter* ...), and it actually accounts for a large share of Bro's CPU (and also memory) footprint these days. Note that executing scripts written in Bro's language is much different from doing pattern matching; improving regexp performance is not going to help much at all with the scripts. That's quite different from Snort/Suricata obviously, which don't do much else than pattern matching. > Marty's point was that multithreading leads to CPU cache > inefficiency which incurs a penalty greater than the boost to the > pattern matching in parallel and therefore suggests flow-pinned > load-balancing for scaling. Do you have an opinion on the matter? It's hard to answer that in a few sentences, but generally I agree that a flow-based load-balancing scheme is a reasonable approach for the lowest layer of the system. Many NIDS (including Snort and Bro) do much of their work on a per-flow basis, so parallelizing at that granularity certainly makes a lot of sense and avoids communication overhead (and hence also cache issues).
Generally, such a flow-based scheme can then be implemented either at the system/process level (i.e., running more than one instance of the NIDS, with a suitable frontend load-balancer splitting up the work, either externally or internally); or at the thread-level (multiple threads fed by a master thread). Conceptually, that doesn't make a lot of a difference, and the former is what we're doing with the Bro Cluster. Now, Snort has the "advantage" that such a simple flow-based scheme is pretty much all it needs to do for parallelizing. Because there's not much happening after the pattern matching step, there's also no need for further coordination between the instances/threads. For Bro, however, this is where things actually start to get interesting: since much of its CPU cycles are spent for the scripts, Amdahl's Law tells us that we need to parallelize the interpreter if we want to scale. Unfortunately, parallelizing the execution of a free-form Turing-complete language isn't exactly trivial ... Robin -- Robin Sommer * Phone +1 (510) 722-6541 * robin at icir.org ICSI/LBNL * Fax +1 (510) 666-2956 * www.icir.org

From robin at icir.org Mon Oct 25 21:52:36 2010 From: robin at icir.org (Robin Sommer) Date: Mon, 25 Oct 2010 21:52:36 -0700 Subject: [Bro] Multi-threading In-Reply-To: <4CC1DE6E.6030202@gmail.com> References: <4CAFABAA.30105@gmail.com> <20101020211211.GC68831@icir.org> <4CC1DE6E.6030202@gmail.com> Message-ID: <20101026045236.GC37556@icir.org> On Fri, Oct 22, 2010 at 11:56 -0700, you wrote: > Clearly, an IDS architecture that separates protocol analysis and event > handling can employ this technique to improve performance. And so this > can be used for Bro. But, you'd need a working ANI. That's right, but note that the ANI in the paper is a more powerful component than what we need for "just" parallelizing a passive NIDS (such as Bro). The latter primarily needs a load-balancer that distributes packets across threads in a predictable manner. In the most simple implementation (and in the current prototype) that's just another thread copying packets around, which is obviously not that great. A number of things come to mind to improve on that (as you already mention as well): an external load-balancer like what we use for the Bro Cluster; some dedicated network processors can already do this internally; and, probably the best option of all, some of the new commodity NICs actually have the necessary functionality on board and can steer traffic directly to their target threads. Generally, I expect much of what we need here to become pretty much standard functionality in the near future. > I don't know how recently this paper was written, The paper has been growing over a while. :) The later parts were finished about a year ago, the earlier ones in 2007/8 already iirc. Robin -- Robin Sommer * Phone +1 (510) 722-6541 * robin at icir.org ICSI/LBNL * Fax +1 (510) 666-2956 * www.icir.org

From mcholste at gmail.com Tue Oct 26 06:54:18 2010 From: mcholste at gmail.com (Martin Holste) Date: Tue, 26 Oct 2010 08:54:18 -0500 Subject: [Bro] Use of GPUs for signature matching? In-Reply-To: <20101026044744.GB37556@icir.org> References: <4CBC8461.8010008@gmail.com> <20101020212045.GD68831@icir.org> <20101026044744.GB37556@icir.org> Message-ID: Ok, this makes a lot of sense now. So you're saying that for the few true pattern matching activities Bro has to do, there's plenty of CPU to spare, but for script execution such as going to time-machine, extracting files from pcap, etc., you're running out of CPU.
So if you're running into a performance challenge with the scripting language, would you consider switching from the native Bro scripting language to an embedded interpreter from something like Perl, Python, or Lua? That in and of itself probably would hurt performance, but my guess is that it would take a lot less time to embed something and then multi-thread it than to roll your own from scratch. With the number of CPU cores climbing exponentially, a small performance hit would probably be acceptable if it can be offset by running on multiple cores. I think a well-known script language would also be a lot less scary for newcomers to Bro and really increase its user base.

On Mon, Oct 25, 2010 at 11:47 PM, Robin Sommer wrote: > > On Fri, Oct 22, 2010 at 12:46 -0500, you wrote: > >> Robin, can you elaborate on this a bit? I'm very surprised that >> pattern matching would not be the first bottleneck. > > The answer is quite simple actually: Bro just doesn't do that much > pattern matching. While it has a pattern engine similar to what > Snort/Suricata are relying on, a typical Bro setup doesn't use it > very much at all: typically there are just a few signatures > configured, often just for doing dynamic protocol detection. > > Bro is doing a lot of other things instead, in particular deep > stateful protocol analysis and execution of its analysis scripts. In > particular the latter is getting more and more expensive compared to > Bro's other components: scripts are becoming larger and more > complex, they track more state, and they have to deal with more > traffic to analyze. The script interpreter is a piece we haven't > spent much time on optimizing yet (it's indeed still an > *interpreter* ...), and it actually accounts for a large share of > Bro's CPU (and also memory) footprint these days. > > Note that executing scripts written in Bro's language is much > different from doing pattern matching; improving regexp performance > is not going to help much at all with the scripts. That's quite > different from Snort/Suricata obviously, which don't do much else > than pattern matching. > >> Marty's point was that multithreading leads to CPU cache >> inefficiency which incurs a penalty greater than the boost to the >> pattern matching in parallel and therefore suggests flow-pinned >> load-balancing for scaling. Do you have an opinion on the matter? > > It's hard to answer that in a few sentences, but generally I agree > that a flow-based load-balancing scheme is a reasonable approach for > the lowest layer of the system. Many NIDS (including Snort and Bro) > do much of their work on a per-flow basis, so parallelizing at that > granularity certainly makes a lot of sense and avoids communication > overhead (and hence also cache issues). Generally, such a flow-based > scheme can then be implemented either at the system/process level > (i.e., running more than one instance of the NIDS, with a suitable > frontend load-balancer splitting up the work, either externally or > internally); or at the thread-level (multiple threads fed by a > master thread). Conceptually, that doesn't make much of a > difference, and the former is what we're doing with the Bro Cluster. > > Now, Snort has the "advantage" that such a simple flow-based scheme > is pretty much all it needs to do for parallelizing. Because there's > not much happening after the pattern matching step, there's also no > need for further coordination between the instances/threads.
For > Bro, however, this is where things actually start to get > interesting: since much of its CPU cycles are spent for the scripts, > Amdahl's Law tells us that we need to parallelize the interpreter if > we want to scale. Unfortunately, parallelizing the execution of a > free-form Turing-complete language isn't exactly trivial ... > > Robin > > -- > Robin Sommer * Phone +1 (510) 722-6541 * robin at icir.org > ICSI/LBNL * Fax +1 (510) 666-2956 * www.icir.org >

From seth at icir.org Tue Oct 26 07:33:57 2010 From: seth at icir.org (Seth Hall) Date: Tue, 26 Oct 2010 10:33:57 -0400 Subject: [Bro] Use of GPUs for signature matching? In-Reply-To: References: <4CBC8461.8010008@gmail.com> <20101020212045.GD68831@icir.org> <20101026044744.GB37556@icir.org> Message-ID:

Hi Martin, On Oct 26, 2010, at 9:54 AM, Martin Holste wrote: > So if you're running into a performance challenge with the scripting > language, would you consider switching from the native Bro scripting > language to an embedded interpreter from something like Perl, Python, > or Lua? That in and of itself probably would hurt performance, but my > guess is that it would take a lot less time to embed something and > then multi-thread it than to roll your own from scratch.

That's likely not true. The performance hit would probably be quite large with many of the dynamic languages. I don't know about Lua, but with Perl and Python being untyped they do a lot of acrobatics whenever variables are created, accessed, and modified, which doesn't work very well with the soft realtime constraints that Bro needs to function within.

> I think a well-known script language would > also be a lot less scary for newcomers to Bro and really increase its > user base.

I think that everyone who starts working with Bro has a point where they get frustrated with having to learn a new language (I know I did), but then after some time they start to recognize the reason that Bro has its own language. The Bro policy script language is a large part of what makes Bro, Bro. :) It's a domain specific language for doing event analysis, and Bro's core has been made to turn network traffic into a stream of events so that it would be possible to analyze it in this style. General purpose scripting languages would likely have to use strange syntaxes to get some of the features and functionality of the Bro language. What will likely increase Bro's user base in a big way is for Bro to do a lot of interesting detections out of the box. There's likely only ever going to be a fairly small proportion of users who would learn or heavily use the scripting language even if it were Python or Perl. More documentation is going to help too. :) .Seth

From vern at icir.org Tue Oct 26 08:19:42 2010 From: vern at icir.org (Vern Paxson) Date: Tue, 26 Oct 2010 08:19:42 -0700 Subject: [Bro] Use of GPUs for signature matching? In-Reply-To: (Tue, 26 Oct 2010 08:54:18 CDT). Message-ID: <20101026151942.8F2C73137F9@taffy.ICSI.Berkeley.EDU>

> for the few > true pattern matching activities Bro has to do, there's plenty of CPU > to spare

Right.

> but for script execution such as going to time-machine, > extracting files from pcap, etc., you're running out of CPU.

Yes in general for script execution, though that usually doesn't involve the Time Machine or pcap files.

> So if you're running into a performance challenge with the scripting > language, would you consider switching from the native Bro scripting > language to an embedded interpreter from something like Perl, Python, > or Lua?
No, because we view Bro's domain-specific language as a big plus.

> With the > number of CPU cores climbing exponentially, a small > performance hit would probably be acceptable if it can be offset by > running on multiple cores.

Note, we have a major project on multicore network security analysis, which focuses on Bro. So this is definitely on our radar. Here, having a domain-specific language can be a significant win, since we can leverage particular semantics for optimization that we couldn't if we used a general interpreter.

> I think a well-known script language would > also be a lot less scary for newcomers to Bro and really increase its > user base.

I wonder if it's the particulars of the language. Bro's scripting language isn't itself that peculiar or hard to pick up. What gets harder is (1) the large set of predefined events, (2) language quirks in support of things like state management (but we'd need those anyway), (3) the lack of adequate "here's the overall model" and "here's the paradigm for XYZ" documentation - which we're definitely aiming to fix. Vern

From mcholste at gmail.com Tue Oct 26 11:01:24 2010 From: mcholste at gmail.com (Martin Holste) Date: Tue, 26 Oct 2010 13:01:24 -0500 Subject: [Bro] Time Machine RAM usage question Message-ID:

I've got a question on tm's RAM usage, and I was hoping someone could point me in the right direction: I'm trying to get as much duration as possible out of tm so that I can go back many hours or even days for packets. I have a lot of disk to throw at it, and a fair amount of RAM. The problem I'm running into is that when I move the conn_timeout up to 86400 but keep the mem settings low, tm still consumes a massive amount of RAM. I am concluding that the RAM usage must be the connection tables, and not the mem setting for the traffic class. Is there a way to allow tm to maximize for longevity? My understanding is that if I move the conn_timeout down, those packets will not be available for query. Thanks, Martin

From JAzoff at uamail.albany.edu Tue Oct 26 11:14:05 2010 From: JAzoff at uamail.albany.edu (Justin Azoff) Date: Tue, 26 Oct 2010 14:14:05 -0400 Subject: [Bro] Time Machine RAM usage question In-Reply-To: References: Message-ID: <20101026181405.GG5900@datacomm.albany.edu>

On Tue, Oct 26, 2010 at 02:01:24PM -0400, Martin Holste wrote: > I've got a question on tm's RAM usage, and I was hoping someone could > point me in the right direction: I'm trying to get as much duration > as possible out of tm so that I can go back many hours or even days > for packets. I have a lot of disk to throw at it, and a fair amount > of RAM. The problem I'm running into is that when I move the > conn_timeout up to 86400 but keep the mem settings low, tm still > consumes a massive amount of RAM.

I don't think you need conn_timeout set that high. I use:

    conn_timeout 180;

and then

    cutoff 5k;
    disk 4g;
    filesize 128m;
    mem 512m;

works fine for the most part. -- -- Justin Azoff -- Network Security & Performance Analyst

From vern at icir.org Tue Oct 26 12:43:42 2010 From: vern at icir.org (Vern Paxson) Date: Tue, 26 Oct 2010 12:43:42 -0700 Subject: [Bro] Time Machine RAM usage question In-Reply-To: <20101026181405.GG5900@datacomm.albany.edu> (Tue, 26 Oct 2010 14:14:05 EDT). Message-ID: <20101026194342.7A1AC3137F0@taffy.ICSI.Berkeley.EDU>

> I don't think you need conn_timeout set that high.

Right. conn_timeout is how long to keep internal state when a connection is inactive; *not* how long to keep recorded connections lying around.
Vern

From mcholste at gmail.com Tue Oct 26 14:04:46 2010 From: mcholste at gmail.com (Martin Holste) Date: Tue, 26 Oct 2010 16:04:46 -0500 Subject: [Bro] Time Machine RAM usage question In-Reply-To: <20101026194342.7A1AC3137F0@taffy.ICSI.Berkeley.EDU> References: <20101026181405.GG5900@datacomm.albany.edu> <20101026194342.7A1AC3137F0@taffy.ICSI.Berkeley.EDU> Message-ID:

That's what I originally thought. What was throwing me was when I would try to find packets any older than the cutoff, the queries would come up empty, the log showing something like "query not found in connection table." So I ran "show conn sample" to see the connections table, and the oldest connections were always at the cutoff. When I looked through the source code, it appeared that connections older than the cutoff were evicted from the connections table, but the query depended on the connections table to find the packets on disk/ram.

On Tue, Oct 26, 2010 at 2:43 PM, Vern Paxson wrote: >> I don't think you need conn_timeout set that high. > > Right. conn_timeout is how long to keep internal state when a connection > is inactive; *not* how long to keep recorded connections lying around. > > Vern >

From gregor at icir.org Tue Oct 26 17:13:32 2010 From: gregor at icir.org (Gregor Maier) Date: Tue, 26 Oct 2010 17:13:32 -0700 Subject: [Bro] Time Machine RAM usage question In-Reply-To: References: <20101026181405.GG5900@datacomm.albany.edu> <20101026194342.7A1AC3137F0@taffy.ICSI.Berkeley.EDU> Message-ID: <4CC76EAC.7070404@icir.org>

Hi, That sounds weird. I'm going to look into that. Which kind of query did you use? Can you maybe copy-paste a sample query plus the error message into an e-mail? cu Gregor

On 10/26/10 14:04 , Martin Holste wrote: > That's what I originally thought. What was throwing me was when I > would try to find packets any older than the cutoff, the queries would > come up empty, the log showing something like "query not found in > connection table." So I ran "show conn sample" to see the connections > table, and the oldest connections were always at the cutoff. When I > looked through the source code, it appeared that connections older > than the cutoff were evicted from the connections table, but the query > depended on the connections table to find the packets on disk/ram. > > On Tue, Oct 26, 2010 at 2:43 PM, Vern Paxson wrote: >>> I don't think you need conn_timeout set that high. >> >> Right. conn_timeout is how long to keep internal state when a connection >> is inactive; *not* how long to keep recorded connections lying around. >> >> Vern >> > > _______________________________________________ > Bro mailing list > bro at bro-ids.org > http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/bro -- Gregor Maier gregor at icir.org Int. Computer Science Institute (ICSI) gregor at icsi.berkeley.edu 1947 Center St., Ste. 600 http://www.icir.org/gregor/ Berkeley, CA 94704 USA

From gregor at icir.org Tue Oct 26 17:45:35 2010 From: gregor at icir.org (Gregor Maier) Date: Tue, 26 Oct 2010 17:45:35 -0700 Subject: [Bro] Time Machine RAM usage question In-Reply-To: References: <20101026181405.GG5900@datacomm.albany.edu> <20101026194342.7A1AC3137F0@taffy.ICSI.Berkeley.EDU> Message-ID: <4CC7762F.7030809@icir.org>

Hi, looking at the source code, it seems that the message 'not found in connection table' is related to subscriptions (i.e., the query request that all future packets for this connection should be included in the query results.
And subscriptions only work for connections that are currently active). So this message is ok. (Actually it is commented out in the current svn-snapshot. Did you uncomment it, or was it in your version of the TM source code?)

As others already pointed out, the conn_timeout is indeed the idle time until a connection is expired from the connection table (we only use the timeout to expire connections). Setting this to a high value is counter-productive: the memory consumption increases significantly. Furthermore, long timeouts will reduce visibility. No new packets will be recorded for connections (actually 5-tuples) that aren't expired but have exceeded the cutoff. So long timeouts can be problematic in the case of 5-tuple reuse.

To check your current retention times, you can check the classes.tm.log file. mem_dt and disk_dt will tell you how many seconds of packet data are currently retained in memory and on disk. Can you check whether the packets you want to retrieve fall into this time-frame? cu Gregor

On 10/26/10 14:04 , Martin Holste wrote: > That's what I originally thought. What was throwing me was when I > would try to find packets any older than the cutoff, the queries would > come up empty, the log showing something like "query not found in > connection table." So I ran "show conn sample" to see the connections > table, and the oldest connections were always at the cutoff. When I > looked through the source code, it appeared that connections older > than the cutoff were evicted from the connections table, but the query > depended on the connections table to find the packets on disk/ram. > > On Tue, Oct 26, 2010 at 2:43 PM, Vern Paxson wrote: >>> I don't think you need conn_timeout set that high. >> >> Right. conn_timeout is how long to keep internal state when a connection >> is inactive; *not* how long to keep recorded connections lying around. >> >> Vern >> > > _______________________________________________ > Bro mailing list > bro at bro-ids.org > http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/bro -- Gregor Maier gregor at icir.org Int. Computer Science Institute (ICSI) gregor at icsi.berkeley.edu 1947 Center St., Ste. 600 http://www.icir.org/gregor/ Berkeley, CA 94704 USA

From mcholste at gmail.com Tue Oct 26 18:46:22 2010 From: mcholste at gmail.com (Martin Holste) Date: Tue, 26 Oct 2010 20:46:22 -0500 Subject: [Bro] Time Machine RAM usage question In-Reply-To: <4CC7762F.7030809@icir.org> References: <20101026181405.GG5900@datacomm.albany.edu> <20101026194342.7A1AC3137F0@taffy.ICSI.Berkeley.EDU> <4CC7762F.7030809@icir.org> Message-ID:

Thanks for looking into it. I blew away all of my buffers and indexes and restarted with more sane settings and queries seem to be behaving. I'm certainly not ruling out user error. In any case, thanks for your help, it's very appreciated.

On Tue, Oct 26, 2010 at 7:45 PM, Gregor Maier wrote: > Hi, > > looking at the source code, it seems that the message 'not found in > connection table' is related to subscriptions (i.e., the query request > that all future packets for this connection should be included in the > query results. And subscriptions only work for connections that are > currently active). So this message is ok. (Actually it is commented out > in the current svn-snapshot. Did you uncomment it, or was it in your > version of the TM source code?) > > As others already pointed out, the conn_timeout is indeed the idle time > until a connection is expired from the connection table (we only use the > timeout to expire connections).
Setting this to a high value is > counter-productive: the memory consumption increases significantly. > Furthermore, long timeouts will reduce visibility. No new packets will > be recorded for connections (actually 5-tuples) that aren't expired but > have exceeded the cutoff. So long timeouts can be problematic in the > case of 5-tuple reuse. > > To check your current retention times, you can check the classes.tm.log > file. mem_dt and disk_dt will tell you how many seconds of packet data > are currently retained in memory and on disk. Can you check whether the > packets you want to retrieve fall into this time-frame? > > > cu > Gregor > > > > > > On 10/26/10 14:04 , Martin Holste wrote: >> That's what I originally thought. What was throwing me was when I >> would try to find packets any older than the cutoff, the queries would >> come up empty, the log showing something like "query not found in >> connection table." So I ran "show conn sample" to see the connections >> table, and the oldest connections were always at the cutoff. When I >> looked through the source code, it appeared that connections older >> than the cutoff were evicted from the connections table, but the query >> depended on the connections table to find the packets on disk/ram. >> >> On Tue, Oct 26, 2010 at 2:43 PM, Vern Paxson wrote: >>>> I don't think you need conn_timeout set that high. >>> >>> Right. conn_timeout is how long to keep internal state when a connection >>> is inactive; *not* how long to keep recorded connections lying around. >>> >>> Vern >>> >> >> _______________________________________________ >> Bro mailing list >> bro at bro-ids.org >> http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/bro > > > -- > Gregor Maier gregor at icir.org > Int. Computer Science Institute (ICSI) gregor at icsi.berkeley.edu > 1947 Center St., Ste. 600 http://www.icir.org/gregor/ > Berkeley, CA 94704 > USA >

From gregor at icir.org Tue Oct 26 19:52:51 2010 From: gregor at icir.org (Gregor Maier) Date: Tue, 26 Oct 2010 19:52:51 -0700 Subject: [Bro] Time Machine RAM usage question In-Reply-To: References: <20101026181405.GG5900@datacomm.albany.edu> <20101026194342.7A1AC3137F0@taffy.ICSI.Berkeley.EDU> <4CC7762F.7030809@icir.org> Message-ID: <4CC79403.5090108@icir.org>

no worries. Just let me know if the error pops up again. cu gregor

On 10/26/10 18:46 , Martin Holste wrote: > Thanks for looking into it. I blew away all of my buffers and indexes > and restarted with more sane settings and queries seem to be behaving. > I'm certainly not ruling out user error. In any case, thanks for > your help, it's very appreciated. > > On Tue, Oct 26, 2010 at 7:45 PM, Gregor Maier wrote: >> Hi, >> >> looking at the source code, it seems that the message 'not found in >> connection table' is related to subscriptions (i.e., the query request >> that all future packets for this connection should be included in the >> query results. And subscriptions only work for connections that are >> currently active). So this message is ok. (Actually it is commented out >> in the current svn-snapshot. Did you uncomment it, or was it in your >> version of the TM source code?) >> >> As others already pointed out, the conn_timeout is indeed the idle time >> until a connection is expired from the connection table (we only use the >> timeout to expire connections). Setting this to a high value is >> counter-productive: the memory consumption increases significantly.
>> Furthermore, long timeouts will reduce visibility. No new packets will >> be recorded for connections (actually 5-tuples) that aren't expired but >> have exceeded the cutoff. So long timeouts can be problematic in the >> case of 5-tuple reuse. >> >> To check your current retention times, you can check the classes.tm.log >> file. mem_dt and disk_dt will tell you how many seconds of packet data >> are currently retained in memory and on disk. Can you check whether the >> packets you want to retrieve fall into this time-frame? >> >> >> cu >> Gregor >> >> >> >> >> >> On 10/26/10 14:04 , Martin Holste wrote: >>> That's what I originally thought. What was throwing me was when I >>> would try to find packets any older than the cutoff, the queries would >>> come up empty, the log showing something like "query not found in >>> connection table." So I ran "show conn sample" to see the connections >>> table, and the oldest connections were always at the cutoff. When I >>> looked through the source code, it appeared that connections older >>> than the cutoff were evicted from the connections table, but the query >>> depended on the connections table to find the packets on disk/ram. >>> >>> On Tue, Oct 26, 2010 at 2:43 PM, Vern Paxson wrote: >>>>> I don't think you need conn_timeout set that high. >>>> >>>> Right. conn_timeout is how long to keep internal state when a connection >>>> is inactive; *not* how long to keep recorded connections lying around. >>>> >>>> Vern >>>> >>> >>> _______________________________________________ >>> Bro mailing list >>> bro at bro-ids.org >>> http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/bro >> >> >> -- >> Gregor Maier gregor at icir.org >> Int. Computer Science Institute (ICSI) gregor at icsi.berkeley.edu >> 1947 Center St., Ste. 600 http://www.icir.org/gregor/ >> Berkeley, CA 94704 >> USA >> > -- Gregor Maier gregor at icir.org Int. Computer Science Institute (ICSI) gregor at icsi.berkeley.edu 1947 Center St., Ste. 600 http://www.icir.org/gregor/ Berkeley, CA 94704 USA

From seth at icir.org Thu Oct 28 08:56:29 2010 From: seth at icir.org (Seth Hall) Date: Thu, 28 Oct 2010 11:56:29 -0400 Subject: [Bro] Bro scripts Message-ID: <1FCB4D90-DD0A-4528-8045-13C06B4CA385@icir.org>

Hi all, I'm doing work on Bro's policy scripts for the next release and I want to find policy scripts floating around that can be shared and any helpful code snippets. Anything you can contribute would be greatly appreciated, thanks! .Seth

From mcholste at gmail.com Thu Oct 28 14:13:00 2010 From: mcholste at gmail.com (Martin Holste) Date: Thu, 28 Oct 2010 16:13:00 -0500 Subject: [Bro] time machine filesize issue Message-ID:

I wanted to make my disk-bound queries faster, so I wanted the fewest files to search through for tm because it appears that every separate file makes the interval searches in pcapnav slower if you're requesting many packets. I found that when setting filesize > 289g, tm creates a file per connection and trashes its working directory. So two questions: am I right in thinking it is faster to search through as few files as possible when using pcapnav? And secondly, does anyone know why tm breaks when trying to create files larger than 289g?
Thanks, Martin

From gregor at icir.org Thu Oct 28 14:42:05 2010 From: gregor at icir.org (Gregor Maier) Date: Thu, 28 Oct 2010 14:42:05 -0700 Subject: [Bro] time machine filesize issue In-Reply-To: References: Message-ID: <4CC9EE2D.8020703@icir.org>

On 10/28/10 14:13 , Martin Holste wrote: > I wanted to make my disk-bound queries faster, so I wanted the fewest > files to search through for tm because it appears that every separate > file makes the interval searches in pcapnav slower if you're > requesting many packets. I found that when setting filesize > 289g, > tm creates a file per connection and trashes its working directory. > So two questions: am I right in thinking it is faster to search > through as few files as possible when using pcapnav? And secondly, > does anyone know why tm breaks when trying to create files larger than > 289g?

I don't think that pcapnav speed is significantly influenced by filesize. AFAIK pcapnav jumps to a random file offset, then tries to sequentially read until it finds something that looks like a pcap header. Then it checks the timestamp and reads sequentially or jumps somewhere else until it finds the requested timestamp. If you have multiple files, then this is repeated for each file. However, the TM knows which files cover which time periods, so it will only access the files that it knows are candidates. So I would assume that the lookup speed should be similar. I think that the specifics of the query-result influence speed much more (e.g., is it only a single, narrow time interval to search, or multiple small ones, or a few large ones that cover almost the whole dataset). Long story short: the number of files to search should not influence the speed much. If the number of files is huge, then the only thing I could imagine is weird filesystem stuff going on when there are 1000s of files in one directory and.....

OTOH, if the filesize is too large wrt the configured diskspace, the TM will run into trouble. It will delete old files if writing more data (or creating a new data file; I can't recall which of the two). So if the data files are huge, this will introduce quite some variance in diskspace usage.

That said: the TM definitely should not trash its working directory..... Do I understand you correctly that you get a myriad of files in the working directory? Do the files contain only a single packet (or a handful), possibly from different connections? How many packets per file? Also, how does your filesize relate to the configured disk-space?

cu Gregor -- Gregor Maier gregor at icir.org Int. Computer Science Institute (ICSI) gregor at icsi.berkeley.edu 1947 Center St., Ste. 600 http://www.icir.org/gregor/ Berkeley, CA 94704 USA

From mcholste at gmail.com Thu Oct 28 15:40:06 2010 From: mcholste at gmail.com (Martin Holste) Date: Thu, 28 Oct 2010 17:40:06 -0500 Subject: [Bro] time machine filesize issue In-Reply-To: <4CC9EE2D.8020703@icir.org> References: <4CC9EE2D.8020703@icir.org> Message-ID:

My performance issues were noticed when making a query over a large time range with many packets involved. Since there is no way to specify a limit on the number of packets returned, the query takes forever. I was looking to improve that performance. I will continue to play around with this to see if there is any improvement worth the large hit for file rollover. With filesize set at exactly 280g (279g does not produce the problem) tm will create one disk fifo file in the workdir for each evicted packet with a disk setting of 1000g.
I am only using one default class for "all."

On Thu, Oct 28, 2010 at 4:42 PM, Gregor Maier wrote: > On 10/28/10 14:13 , Martin Holste wrote: >> I wanted to make my disk-bound queries faster, so I wanted the fewest >> files to search through for tm because it appears that every separate >> file makes the interval searches in pcapnav slower if you're >> requesting many packets. I found that when setting filesize > 289g, >> tm creates a file per connection and trashes its working directory. >> So two questions: am I right in thinking it is faster to search >> through as few files as possible when using pcapnav? And secondly, >> does anyone know why tm breaks when trying to create files larger than >> 289g? > > I don't think that pcapnav speed is significantly influenced by > filesize. AFAIK pcapnav jumps to a random file offset, then tries to > sequentially read until it finds something that looks like a pcap > header. Then it checks the timestamp and reads sequentially or jumps > somewhere else until it finds the requested timestamp. > If you have multiple files, then this is repeated for each file. > However, the TM knows which files cover which time periods, so it will > only access the files that it knows are candidates. So I would assume > that the lookup speed should be similar. I think that the specifics of > the query-result influence speed much more (e.g., is it only a single, > narrow time interval to search, or multiple small ones, or a few large > ones that cover almost the whole dataset). > Long story short: the number of files to search should not influence the > speed much. > If the number of files is huge, then the only thing I could imagine is > weird filesystem stuff going on when there are 1000s of files in one > directory and..... > > OTOH, if the filesize is too large wrt the configured diskspace, the TM > will run into trouble. It will delete old files if writing more data (or > creating a new data file; I can't recall which of the two). So if the data > files are huge, this will introduce quite some variance in diskspace usage. > > That said: the TM definitely should not trash its working directory..... > Do I understand you correctly that you get a myriad of files in the > working directory? Do the files contain only a single packet (or a handful), > possibly from different connections? How many packets per file? > Also, how does your filesize relate to the configured disk-space? > > > cu > Gregor > -- > Gregor Maier gregor at icir.org > Int. Computer Science Institute (ICSI) gregor at icsi.berkeley.edu > 1947 Center St., Ste. 600 http://www.icir.org/gregor/ > Berkeley, CA 94704 > USA >

From vallentin at icir.org Thu Oct 28 17:59:36 2010 From: vallentin at icir.org (Matthias Vallentin) Date: Thu, 28 Oct 2010 17:59:36 -0700 Subject: [Bro] Bro scripts In-Reply-To: <1FCB4D90-DD0A-4528-8045-13C06B4CA385@icir.org> References: <1FCB4D90-DD0A-4528-8045-13C06B4CA385@icir.org> Message-ID: <20101029005936.GJ16825@icsi.berkeley.edu>

> I'm doing work on Bro's policy scripts for the next release and I want > to find policy scripts floating around that can be shared and any > helpful code snippets. Anything you can contribute would be greatly > appreciated, thanks!

The whole buzz about Firesheep caused me to hack up a sidejacking detector. I haven't tested it because I literally wrote it 5 minutes ago.
Matthias

Here is the code:

    @load http-request
    @load http-reply

    module HTTP;

    export
    {
        redef enum Notice += { CookieReuse };

        # Number of cookies per client.
        const max_cookies = 1 &redef;

        # The time after which we expire entries.
        const cookie_expiration = 1 hr &redef;
    }

    # Count the number of cookies per client.
    global cookies: table[string] of set[addr] &write_expire = cookie_expiration;

    event http_header(c: connection, is_orig: bool, name: string, value: string)
    {
        # We are only looking for session IDs in the client cookie header.
        if (! (is_orig && name == /[cC][oO][oO][kK][iI][eE]/))
            return;

        local client = c$id$orig_h;
        if (value !in cookies)
            cookies[value] = set();
        else
            add cookies[value][client];

        if (|cookies[value]| <= max_cookies)
            return;

        local s = lookup_http_request_stream(c);
        NOTICE([$note=CookieReuse, $src=client,
                $msg=fmt("potential sidejacking by %s: cookie used by %d addresses",
                client, |cookies[value]|)]);
    }

From mcholste at gmail.com Thu Oct 28 18:48:30 2010 From: mcholste at gmail.com (Martin Holste) Date: Thu, 28 Oct 2010 20:48:30 -0500 Subject: [Bro] Bro scripts In-Reply-To: <20101029005936.GJ16825@icsi.berkeley.edu> References: <1FCB4D90-DD0A-4528-8045-13C06B4CA385@icir.org> <20101029005936.GJ16825@icsi.berkeley.edu> Message-ID:

That's pretty cool! I do have one suggestion, though: Instead of tracking by IP, how about one cookie per user agent? That will help catch the sidejacking when used under a NAT.

On Thursday, October 28, 2010, Matthias Vallentin wrote: >> I'm doing work on Bro's policy scripts for the next release and I want >> to find policy scripts floating around that can be shared and any >> helpful code snippets. Anything you can contribute would be greatly >> appreciated, thanks! > > The whole buzz about Firesheep caused me to hack up a sidejacking > detector. I haven't tested it because I literally wrote it 5 minutes > ago. > > Matthias > > Here is the code: > > @load http-request > @load http-reply > > module HTTP; > > export > { > redef enum Notice += { CookieReuse }; > > # Number of cookies per client. > const max_cookies = 1 &redef; > > # The time after which we expire entries. > const cookie_expiration = 1 hr &redef; > } > > > # Count the number of cookies per client. > global cookies: table[string] of set[addr] &write_expire = cookie_expiration; > > event http_header(c: connection, is_orig: bool, name: string, value: string) > { > # We are only looking for session IDs in the client cookie header. > if (! (is_orig && name == /[cC][oO][oO][kK][iI][eE]/)) > return; > > local client = c$id$orig_h; > if (value !in cookies) > cookies[value] = set(); > else > add cookies[value][client]; > > if (|cookies[value]| <= max_cookies) > return; > > local s = lookup_http_request_stream(c); > NOTICE([$note=CookieReuse, $src=client, > $msg=fmt("potential sidejacking by %s: cookie used by %d addresses", > client, |cookies[value]|)]); > } > _______________________________________________ > Bro mailing list > bro at bro-ids.org > http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/bro >

From seth at icir.org Thu Oct 28 20:23:03 2010 From: seth at icir.org (Seth Hall) Date: Thu, 28 Oct 2010 23:23:03 -0400 Subject: [Bro] Bro scripts In-Reply-To: References: <1FCB4D90-DD0A-4528-8045-13C06B4CA385@icir.org> <20101029005936.GJ16825@icsi.berkeley.edu> Message-ID: <966EF229-5A1D-4A2B-BE79-F8DAD64B855B@icir.org>

On Oct 28, 2010, at 9:48 PM, Martin Holste wrote: > That's pretty cool! I do have one suggestion, though: Instead of > tracking by IP, how about one cookie per user agent? That will help > catch the sidejacking when used under a NAT.

Good point! Changing the tracking global from...

    global cookies: table[string] of set[addr]

to...

    global cookies: table[string] of set[addr, string]

and then storing the user-agent in the last string would take care of that.

I think your point about NAT gets to a more general point of what techniques could we use to detect NAT? I know that there are a lot of little indicators of addresses that are doing NAT, but I think it could be really worthwhile to organize them all and then write a script to implement all of them so that we can get reliable NAT detection with Bro. I can start with a few thoughts.

* Multiple web browser user-agents at a single address
    - Must match some regex for a "real" browser so that weird applications throwing junk in the user-agent don't trigger this.
    - Must be closely together in time.

Over the past several years I've had a lot of ideas for detecting NATs, but they have all completely escaped me. Anyone else have thoughts to add to this? .Seth

From mcholste at gmail.com Thu Oct 28 21:50:26 2010 From: mcholste at gmail.com (Martin Holste) Date: Thu, 28 Oct 2010 23:50:26 -0500 Subject: [Bro] Bro scripts In-Reply-To: <966EF229-5A1D-4A2B-BE79-F8DAD64B855B@icir.org> References: <1FCB4D90-DD0A-4528-8045-13C06B4CA385@icir.org> <20101029005936.GJ16825@icsi.berkeley.edu> <966EF229-5A1D-4A2B-BE79-F8DAD64B855B@icir.org> Message-ID:

I think that will definitely work for detecting NATs if you stick to regexing the variants on the major browsers. As we've all seen, most browser plugins have their own UA, so you're bound to get many UAs out of a single computer naturally, but they should not all be for Internet Explorer, for example. I think scoping to IE, FF, and Webkit engines would be good enough to be effective. One other point: once a NAT is detected, would it be possible to exclude that IP from future detection to save resources? I'm a bit concerned with memory utilization for all of these state tables.

On Thu, Oct 28, 2010 at 10:23 PM, Seth Hall wrote: > > On Oct 28, 2010, at 9:48 PM, Martin Holste wrote: > >> That's pretty cool! I do have one suggestion, though: Instead of >> tracking by IP, how about one cookie per user agent? That will help >> catch the sidejacking when used under a NAT. > > Good point! Changing the tracking global from... > > global cookies: table[string] of set[addr] > to... > global cookies: table[string] of set[addr, string] > > and then storing the user-agent in the last string would take care of that. > > I think your point about NAT gets to a more general point of what techniques could we use to detect NAT?
I know that there are a lot of little indicators of addresses that are doing NAT, but I think it could be really worthwhile to organize them all and then write a script to implement all of them so that we can get reliable NAT detection with Bro. I can start with a few thoughts. > > * Multiple web browser user-agents at a single address > - Must match some regex for a "real" browser so that weird applications throwing junk in the user-agent don't trigger this. > - Must be closely together in time. > > Over the past several years I've had a lot of ideas for detecting NATs, but they have all completely escaped me. Anyone else have thoughts to add to this? > > .Seth

From vallentin at icir.org Thu Oct 28 23:56:15 2010 From: vallentin at icir.org (Matthias Vallentin) Date: Thu, 28 Oct 2010 23:56:15 -0700 Subject: [Bro] Bro scripts In-Reply-To: <966EF229-5A1D-4A2B-BE79-F8DAD64B855B@icir.org> References: <1FCB4D90-DD0A-4528-8045-13C06B4CA385@icir.org> <20101029005936.GJ16825@icsi.berkeley.edu> <966EF229-5A1D-4A2B-BE79-F8DAD64B855B@icir.org> Message-ID: <20101029065615.GK16825@icsi.berkeley.edu>

> On Oct 28, 2010, at 9:48 PM, Martin Holste wrote: > > Instead of tracking by IP, how about one cookie per user agent? > > Good point!

Indeed.

> global cookies: table[string] of set[addr] > to... > global cookies: table[string] of set[addr, string]

That will almost do it, except that I now need to write a handler for http_all_headers instead of http_header to obviate the need for some global glue code. Furthermore, the Cookie header often bundles a bunch of cookie key-value pairs of which only a few define the actual user session. The others can vary and thus cause false negatives. Firesheep fortunately ships with a bunch of handlers for major sites which I will use as a baseline to define user sessions for specific sites, i.e.,

    # Distills relevant cookies that define a user session.
    type user_session: record {
        url: pattern;       # URL
        cookies: pattern;   # Cookie keys that define the user session.
    };

    const session_info: table[string] of user_session = {
        ["Amazon"] = [$url=/amazon.com/, $cookies=/x-main/],
        ["Dropbox"] = [$url=/dropbox.com/, $cookies=/lid/],
        ["Facebook"] = [$url=/facebook.com/, $cookies=/xs|c_user|sid/],
        ["Flickr"] = [$url=/flickr.com/, $cookies=/cookie_session/],
        ["Google"] = [$url=/google.com/, $cookies=/NID|SID|HSID|PREF/],
        ["NY Times"] = [$url=/nytimes.com/, $cookies=/NYT-s|nyt-d/],
        ["Twitter"] = [$url=/twitter.com/, $cookies=/_twitter_sess/],
        ["Yelp"] = [$url=/yelp.com/, $cookies=/__utma/],
        ["Windows Live"] = [$url=/live.com/, $cookies=/MSP(Prof|Auth)|RPSTAuth|NAP/],
        ["Wordpress"] = [$url=/wordpress.com/, $cookies=/wordpress_[0-9a-fA-F]+/]
    } &redef;

What remains to do is to split the Cookie string into its key-value pairs and then match the keys against user_session$cookies. Instead of regular expressions, I'd preferably have a set[string], but this cannot be statically defined in a record, i.e.,

    ["Facebook"] = [$url=/facebook.com/, $cookies={"xs", "c_user", "sid"}],
                                                  ^^^^^^^^^^^^^^^^^^^^^^^

appears not to be correct Bro syntax, because I think variable-size types inside records cannot be initialized statically. Is that correct? If so, I'd probably change to a simple table[string] of set[string] to represent user sessions. In any case, the downside is that this would only detect sidejacking for known sites. I think it would make sense to do the following. If a profile for a user_session for a particular site (as defined above) exists, use it, and otherwise use the entire cookie value.
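A sketch of that splitting step in Python, under the assumption that we already know which keys define the session (the key set below is Facebook's from the table above, purely as an example; sorting makes the identifier independent of the key order in the header):

    SESSION_KEYS = {"xs", "c_user", "sid"}  # hypothetical example key set

    def session_id(cookie_header):
        parts = []
        for pair in cookie_header.split("; "):
            key, _, value = pair.partition("=")
            if key in SESSION_KEYS:
                parts.append(key + "=" + value)
        # Sort so that key order in the header cannot change the identifier.
        return "&".join(sorted(parts))

    print(session_id("c_user=1234; locale=en_US; xs=deadbeef"))
    # prints: c_user=1234&xs=deadbeef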
> I think your point about NAT gets to a more general point of what > techniques could we use to detect NAT?

This is truly an important issue to tackle. I wonder if it is possible to have better abstractions in Bro to support user-based analysis. For example, it would be neat to augment several events with a "user" argument which is essentially a record filled by many other events. In HTTP for example, some code would parse the User-Agent and fill this record, so that the script writer could simply refer to user$os or user$browser. Matthias

From JAzoff at uamail.albany.edu Fri Oct 29 06:12:40 2010 From: JAzoff at uamail.albany.edu (Justin Azoff) Date: Fri, 29 Oct 2010 09:12:40 -0400 Subject: [Bro] Bro scripts In-Reply-To: <966EF229-5A1D-4A2B-BE79-F8DAD64B855B@icir.org> References: <1FCB4D90-DD0A-4528-8045-13C06B4CA385@icir.org> <20101029005936.GJ16825@icsi.berkeley.edu> <966EF229-5A1D-4A2B-BE79-F8DAD64B855B@icir.org> Message-ID: <20101029131240.GD6560@datacomm.albany.edu>

On Thu, Oct 28, 2010 at 11:23:03PM -0400, Seth Hall wrote: > I think your point about NAT gets to a more general point of what > techniques could we use to detect NAT?

Using user-agents for this is tricky. I've written some code to analyze the output of your http-user-agents.log in splunk, and found that the best thing to look at is the architecture and os, and ignore the browser itself.

The script I use is here: http://github.com/JustinAzoff/splunk-scripts/blob/master/ua2os.py

it's for use in splunk, but it's 90% regexes, stuff like this:

    os_mapping = (
        ('Windows .. 5.1', 'Windows XP'),
        ('Windows .. 5.2', 'Windows XP'),
        ('Windows NT 6.0', 'Windows Vista'),
        ('Windows 6.0', 'Windows Server 2008'),
        ('Windows NT 6.1', 'Windows 7'),
        ('OS X 10.5', 'MAC OS X 10.5.x'),
        ('Darwin', 'MAC OS X other'),
        ...
        ('Android', 'Android'),
        ('Linux ', 'Linux'),
        ('Windows', 'Windows - Other'),
        ('iPad', 'ipad'),
        ('iPod', 'ipod'),
        ('iPhone', 'iphone'),
    )

    arch_mapping = (
        ('Windows .. 5.2', 'x64'),
        ('x64', 'x64'),
        ...
        ('iPad', 'ipad'),
        ('iPod', 'ipod'),
        ('iPhone', 'iphone'),
        ('Intel', 'Intel'),
    )

It is not uncommon to have one machine using multiple browsers, but rare for it to identify as both Vista and Windows 7, or both i386 and x64, or Windows XP and Mac OS X 10.5.

Also, some user-agents can immediately identify NAT: iOS and android devices do not have ethernet interfaces, so if one of these devices is found on a non-wireless subnet it indicates the presence of a rogue access point. -- -- Justin Azoff -- Network Security & Performance Analyst

From mcholste at gmail.com Fri Oct 29 06:53:03 2010 From: mcholste at gmail.com (Martin Holste) Date: Fri, 29 Oct 2010 08:53:03 -0500 Subject: [Bro] Bro scripts In-Reply-To: <20101029131240.GD6560@datacomm.albany.edu> References: <1FCB4D90-DD0A-4528-8045-13C06B4CA385@icir.org> <20101029005936.GJ16825@icsi.berkeley.edu> <966EF229-5A1D-4A2B-BE79-F8DAD64B855B@icir.org> <20101029131240.GD6560@datacomm.albany.edu> Message-ID:

Thanks for sharing that. Obviously in a corporate environment (or any in which desktops are managed) most user agents will appear the same because they are all running the same browser version. However, I have seen that for guest wireless and other public access points, the number of plugins, .NET versions, etc. makes the UAs fairly unique, so off the bat your mileage will vary depending on the client class.
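A rough sketch of that user-agent heuristic (the mapping excerpt and the alert threshold below are illustrative, not the actual ua2os.py logic): map each user agent to an OS family and flag an address once it has shown more than one:

    import re
    from collections import defaultdict

    # Small excerpt of an OS mapping, in the spirit of ua2os.py.
    OS_MAPPING = [
        (re.compile(r"Windows .. 5\.1"), "Windows XP"),
        (re.compile(r"Windows NT 6\.1"), "Windows 7"),
        (re.compile(r"OS X 10\.5"), "Mac OS X 10.5.x"),
        (re.compile(r"iPhone"), "iPhone"),
        (re.compile(r"Linux "), "Linux"),
    ]

    seen_os = defaultdict(set)  # client IP -> observed OS families

    def observe(ip, user_agent):
        for pattern, os_name in OS_MAPPING:
            if pattern.search(user_agent):
                seen_os[ip].add(os_name)
                break
        if len(seen_os[ip]) > 1:
            print("possible NAT at %s: %s" % (ip, sorted(seen_os[ip])))

    observe("10.1.2.3", "Mozilla/5.0 (Windows NT 6.1; WOW64) ...")
    observe("10.1.2.3", "Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_1) ...")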
Using the detected OS would certainly be more accurate, but the chances of an attacker having the same OS as the victim are pretty good, so you'll obviously have to deal with a lot of false negatives. Maybe concatenating the p0f signature with the user agent is the best way to get a pseudo machine ID.

On Fri, Oct 29, 2010 at 8:12 AM, Justin Azoff wrote: > On Thu, Oct 28, 2010 at 11:23:03PM -0400, Seth Hall wrote: >> I think your point about NAT gets to a more general point of what >> techniques could we use to detect NAT? > > Using user-agents for this is tricky. I've written some code to analyze > the output of your http-user-agents.log in splunk, and found that the > best thing to look at is the architecture and os, and ignore the > browser itself. > > The script I use is here: > > http://github.com/JustinAzoff/splunk-scripts/blob/master/ua2os.py > > it's for use in splunk, but it's 90% regexes, stuff like this: > > os_mapping = ( > ('Windows .. 5.1', 'Windows XP'), > ('Windows .. 5.2', 'Windows XP'), > ('Windows NT 6.0', 'Windows Vista'), > ('Windows 6.0', 'Windows Server 2008'), > ('Windows NT 6.1', 'Windows 7'), > ('OS X 10.5', 'MAC OS X 10.5.x'), > ('Darwin', 'MAC OS X other'), > ... > ('Android', 'Android'), > ('Linux ', 'Linux'), > ('Windows', 'Windows - Other'), > ('iPad', 'ipad'), > ('iPod', 'ipod'), > ('iPhone', 'iphone'), > ) > > arch_mapping = ( > ('Windows .. 5.2', 'x64'), > ('x64', 'x64'), > ... > ('iPad', 'ipad'), > ('iPod', 'ipod'), > ('iPhone', 'iphone'), > ('Intel', 'Intel'), > ) > > It is not uncommon to have one machine using multiple browsers, but rare > for it to identify as both Vista and Windows 7, or both i386 and x64, or > Windows XP and Mac OS X 10.5. > > Also, some user-agents can immediately identify NAT: iOS and android > devices do not have ethernet interfaces, so if one of these devices is > found on a non-wireless subnet it indicates the presence of a rogue access > point. > > -- > -- Justin Azoff > -- Network Security & Performance Analyst >

From vallentin at icir.org Fri Oct 29 16:35:14 2010 From: vallentin at icir.org (Matthias Vallentin) Date: Fri, 29 Oct 2010 16:35:14 -0700 Subject: [Bro] Bro scripts In-Reply-To: <20101029065615.GK16825@icsi.berkeley.edu> References: <1FCB4D90-DD0A-4528-8045-13C06B4CA385@icir.org> <20101029005936.GJ16825@icsi.berkeley.edu> <966EF229-5A1D-4A2B-BE79-F8DAD64B855B@icir.org> <20101029065615.GK16825@icsi.berkeley.edu> Message-ID: <20101029233514.GN2349@icsi.berkeley.edu>

> I think it would make sense to do the following. If a profile for a > user_session for a particular site (as defined above) exists, use it, > and otherwise use the entire cookie value.

Attached is the full version of the sidejacking detector that includes all the Firesheep handlers. I tested it for Twitter, Amazon, and Google. The script successfully reports alarms when I hijack my own connections with Firesheep. Matthias

-------------- next part --------------

# A simple sidejacking detector.
#
# The script raises an alarm whenever more than one client makes use of the
# same cookie, where a user is defined as an (IP, user agent) pair.

@load notice
@load http-request
@load http-reply

module HTTP;

export {
    redef enum Notice += { Sidejacking };

    # The time after which entries expire. This allows users to roam and later
    # reconnect from a different address without triggering a false positive.
    const cookie_expiration = 1 hr &redef;

    type cookie_info: record {
        url: pattern;   # URL pattern matched against Host header.
        pat: pattern;   # Cookie keys that define the user session.
    };

    # List of cookie information per service (taken from Firesheep handlers).
    const cookie_list: table[string] of cookie_info = {
        ["Amazon"] = [$url=/amazon.com/, $pat=/x-main/],
        ["Basecamp"] = [$url=/basecamphq.com/, $pat=/_basecamp_session|session_token/],
        ["bit.ly"] = [$url=/bit.ly/, $pat=/user/],
        ["Cisco"] = [$url=/cisco.com/, $pat=/SMIDENTITY/],
        ["CNET"] = [$url=/cnet.com/, $pat=/urs_sessionId/],
        ["Dropbox"] = [$url=/dropbox.com/, $pat=/lid/],
        ["Enom"] = [$url=/enom.com/, $pat=/OatmealCookie|EmailAddress/],
        ["Evernote"] = [$url=/evernote.com/, $pat=/auth/],
        ["Facebook"] = [$url=/facebook.com/, $pat=/xs|c_user|sid/],
        ["Flickr"] = [$url=/flickr.com/, $pat=/cookie_session/],
        ["Foursquare"] = [$url=/foursquare.com/, $pat=/ext_id|XSESSIONID/],
        ["GitHub"] = [$url=/github.com/, $pat=/_github_ses/],
        ["Google"] = [$url=/google.com/, $pat=/NID|SID|HSID|PREF/],
        ["Gowalla"] = [$url=/gowalla.com/, $pat=/__utma/],
        ["Hacker News"] = [$url=/news.ycombinator.com/, $pat=/user/],
        ["Harvest"] = [$url=/harvestapp.com/, $pat=/_enc_sess/],
        ["NY Times"] = [$url=/nytimes.com/, $pat=/NYT-s|nyt-d/],
        ["Pivotal Tracker"] = [$url=/pivotaltracker.com/, $pat=/_myapp_session/],
        ["Slicehost"] = [$url=/manage.slicehost.com/, $pat=/_coach_session_id/],
        ["tumblr"] = [$url=/tumblr.com/, $pat=/pfp/],
        ["Twitter"] = [$url=/twitter.com/, $pat=/_twitter_sess/],
        ["Yahoo"] = [$url=/yahoo.com/, $pat=/T|Y/],
        ["Yelp"] = [$url=/yelp.com/, $pat=/__utma/],
        ["Windows Live"] = [$url=/live.com/, $pat=/MSP(Prof|Auth)|RPSTAuth|NAP/],
        ["Wordpress"] = [$url=/wordpress.com/, $pat=/wordpress_[0-9a-fA-F]+/]
    } &redef;
}

# Map cookies to users, who are defined as an (address, user-agent) pair.
global cookies: table[string] of set[addr,string] &write_expire = cookie_expiration;

# Create a unique user session identifier based on a pattern of cookie keys.
function sessionize(cookie: string, keys: pattern) : string
{
    local id = "";
    local fields = split(cookie, /; /);
    for (i in fields)
    {
        local s = split1(fields[i], /=/);
        if (keys in s[1])
            id += s[2];
    }

    return id;
}

event http_all_headers(c: connection, is_orig: bool, hlist: mime_header_list)
{
    if (! is_orig)
        return;

    local cookie = "";
    local ua = "";
    local host = "";
    for (i in hlist)
    {
        local hdr = hlist[i]$name;
        local value = hlist[i]$value;
        if (hdr == "COOKIE")
            cookie = value;
        else if (hdr == "USER-AGENT")
            ua = value;
        else if (hdr == "HOST")
            host = to_lower(value);
    }

    if (cookie == "")
        return;

    # Restrict ourselves to a subset of cookie keys that define a user session.
    local id = "";
    local desc = "";
    if (host != "")
        for (k in cookie_list)
        {
            local info = cookie_list[k];
            if (info$url in host)
            {
                id = sessionize(cookie, info$pat);
                desc = k;
                break;
            }
        }

    if (id == "")
        id = cookie;

    if (id !in cookies)
        cookies[id] = set() &mergeable;

    local client = c$id$orig_h;
    add cookies[id][client, ua];
    if (|cookies[id]| <= 1)
        return;

    local s = lookup_http_request_stream(c);
    desc = (desc == "" ? "" : fmt("%s ", desc));
    NOTICE([$note=Sidejacking, $src=client,
            $msg=fmt("%ssession hijacked by %s (%d users/cookie)",
            desc, client, |cookies[id]|)]);
}

From vern at icir.org Sat Oct 30 13:52:00 2010 From: vern at icir.org (Vern Paxson) Date: Sat, 30 Oct 2010 13:52:00 -0700 Subject: [Bro] set initializers (Re: Bro scripts) In-Reply-To: <20101029065615.GK16825@icsi.berkeley.edu> (Thu, 28 Oct 2010 23:56:15 PDT).
Message-ID: <20101030205200.CE9D736A4F2@taffy.ICSI.Berkeley.EDU>

> expression, I'd preferably have a set[string], but this cannot be > statically defined in a record, i.e., > > ["Facebook"] = [$url=/facebook.com/, $cookies={"xs", "c_user", "sid"}], > ^^^^^^^^^^^^^^^^^^^^^^^ > appears not to be correct Bro syntax, because I think variable-size > types inside records cannot be initialized statically. Is that correct?

You can construct sets using .... $cookies=set("xs", "c_user", "sid") for example. Vern

From vern at icir.org Sat Oct 30 13:52:03 2010 From: vern at icir.org (Vern Paxson) Date: Sat, 30 Oct 2010 13:52:03 -0700 Subject: [Bro] time machine filesize issue In-Reply-To: (Thu, 28 Oct 2010 17:40:06 CDT). Message-ID: <20101030205203.A6FAC36A4F2@taffy.ICSI.Berkeley.EDU>

> With filesize set at exactly 280g (279g does not produce the problem) > tm will create one disk fifo file in the workdir for each > evicted packet with a disk setting of 1000g. I am only using one > default class for "all."

That sounds like something is wrapping and going negative at the 2^38 barrier. Vern

From gregor at icir.org Sun Oct 31 15:38:46 2010 From: gregor at icir.org (Gregor Maier) Date: Sun, 31 Oct 2010 15:38:46 -0700 Subject: [Bro] NAT detection (was: Re: Bro scripts) In-Reply-To: References: <1FCB4D90-DD0A-4528-8045-13C06B4CA385@icir.org> <20101029005936.GJ16825@icsi.berkeley.edu> Message-ID: <4CCDEFF6.7030002@icir.org>

Hi, I've played around with NAT detection based on user-agent strings and IP TTL. See http://www.icir.org/gregor/papers/gregor-phd.pdf, Chapter 4 cu gregor

On 10/28/10 18:48 , Martin Holste wrote: > That's pretty cool! I do have one suggestion, though: Instead of > tracking by IP, how about one cookie per user agent? That will help > catch the sidejacking when used under a NAT. > > On Thursday, October 28, 2010, Matthias Vallentin wrote: >>> I'm doing work on Bro's policy scripts for the next release and I want >>> to find policy scripts floating around that can be shared and any >>> helpful code snippets. Anything you can contribute would be greatly >>> appreciated, thanks! >> >> The whole buzz about Firesheep caused me to hack up a sidejacking >> detector. I haven't tested it because I literally wrote it 5 minutes >> ago. >> >> Matthias >> >> Here is the code: >> >> @load http-request >> @load http-reply >> >> module HTTP; >> >> export >> { >> redef enum Notice += { CookieReuse }; >> >> # Number of cookies per client. >> const max_cookies = 1 &redef; >> >> # The time after which we expire entries. >> const cookie_expiration = 1 hr &redef; >> } >> >> >> # Count the number of cookies per client. >> global cookies: table[string] of set[addr] &write_expire = cookie_expiration; >> >> event http_header(c: connection, is_orig: bool, name: string, value: string) >> { >> # We are only looking for session IDs in the client cookie header. >> if (!
(is_orig && name == /[cC][oO][oO][kK][iI][eE]/)) >> return; >> >> local client = c$id$orig_h; >> if (value !in cookies) >> cookies[value] = set(); >> else >> add cookies[value][client]; >> >> if (|cookies[value]| <= max_cookies) >> return; >> >> local s = lookup_http_request_stream(c); >> NOTICE([$note=CookieReuse, $src=client, >> $msg=fmt("potential sidejacking by %s: cookie used by %d addresses", >> client, |cookies[value]|)]); >> } >> _______________________________________________ >> Bro mailing list >> bro at bro-ids.org >> http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/bro >> > > _______________________________________________ > Bro mailing list > bro at bro-ids.org > http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/bro -- Gregor Maier gregor at icir.org Int. Computer Science Institute (ICSI) gregor at icsi.berkeley.edu 1947 Center St., Ste. 600 http://www.icir.org/gregor/ Berkeley, CA 94704 USA
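The TTL side of the approach Gregor mentions can be sketched as follows (a hedged illustration, not the algorithm from the thesis: the initial-TTL list holds common OS defaults, and the threshold is made up). Hosts send packets with a characteristic initial TTL, and each router hop decrements it by one; several distinct (initial TTL, hop count) combinations arriving from a single source address suggest several stacks behind a NAT:

    COMMON_INITIAL_TTLS = (32, 64, 128, 255)

    def initial_ttl_and_hops(observed_ttl):
        # Distance to the next-larger common initial TTL.
        for init in COMMON_INITIAL_TTLS:
            if observed_ttl <= init:
                return init, init - observed_ttl
        return 255, 0

    def looks_like_nat(observed_ttls):
        # More than one distinct (initial TTL, hops) combination from a
        # single source address hints at several hosts behind it.
        return len({initial_ttl_and_hops(t) for t in observed_ttls}) > 1

    print(looks_like_nat({63, 127}))  # True: e.g. a Linux and a Windows stack
    print(looks_like_nat({64}))       # False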