From jsiwek at corelight.com Tue Jan 2 08:58:15 2018
From: jsiwek at corelight.com (Jon Siwek)
Date: Tue, 2 Jan 2018 10:58:15 -0600
Subject: [Bro-Dev] SMB transaction messages pull request
In-Reply-To: References: Message-ID:

On Fri, Dec 29, 2017 at 2:19 AM, Bencteux Jeffrey wrote:
> I made a pull request a while ago to add/update messages for the SMB
> analyzer and I did not get any feedback. Is there something wrong with
> it? I'd be happy to modify it to fit your requirements if necessary.
>
> You can find it here : https://github.com/bro/bro/pull/119.

Thanks for the PR, likely just no one had time to look yet. I'll do that shortly.

- Jon

From jmellander at lbl.gov Fri Jan 5 12:19:26 2018
From: jmellander at lbl.gov (Jim Mellander)
Date: Fri, 5 Jan 2018 12:19:26 -0800
Subject: [Bro-Dev] 'for loop' variable modification
Message-ID:

Hi all:

I'm working on an interesting Bro policy, and I want to be able to begin a 'for loop' at some point where it previously left off. Pseudo-code:

    for (foo in bar)
        {
        if (foo == "baz")
            break;
        . process bar[foo] .
        }
    .
    . Do other work (not changing bar)
    .
    first_time = T;
    for (foo in bar)
        {
        if (first_time)
            {
            first_time = F;
            foo = "baz";
            }
        . process bar[foo] .
        }
    ....

If the loop variable can be reassigned in the loop, and the loop continued from that point, it would facilitate some of the processing I'm doing. The above synthetic code could be refactored to avoid the issue, but....

My real-world issue is that I have a large table to process, and want to amortize the processing of it on the time domain:

A. Process first N items in table
B. Schedule processing of next N items via an event
C. When event triggers, pick up where we left off, and process next N items, etc.

(There are inefficient ways of processing these that solve some, but not all issues, such as putting the indices in a vector, then going thru that - won't go into the problems with that right now)

I haven't checked whether my desired behavior works, but since it's not documented, I wouldn't want to rely on it in any event.

I would be interested in hearing comments or suggestions on this issue.

Jim

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20180105/b32aa19b/attachment.html

From jsiwek at corelight.com Fri Jan 5 17:28:27 2018
From: jsiwek at corelight.com (Jon Siwek)
Date: Fri, 5 Jan 2018 19:28:27 -0600
Subject: [Bro-Dev] 'for loop' variable modification
In-Reply-To: References: Message-ID:

On Fri, Jan 5, 2018 at 2:19 PM, Jim Mellander wrote:

> I haven't checked whether my desired behavior works, but since it's not
> documented, I wouldn't want to rely on it in any event.

Yeah, I doubt the example you gave currently works -- it would just change the local value in the frame without modifying the internal iterator.

> I would be interested in hearing comments or suggestions on this issue.

What you want, the ability to split the processing of large data tables/sets over time, makes sense. I've probably also run into at least a couple cases where I've been concerned about how long it would take to iterate over a set/table and process all keys in one go. The approach that comes to mind for doing that would be adding coroutines. Robin has some ongoing work with adding better support for async function calls, and I wonder if the way that's done would make it pretty simple to add general coroutine support as well. E.g.
stuff could look like:

    event process_stuff()
        {
        local num_processed = 0;

        for ( local item in foo )
            {
            process_item(item);

            if ( ++num_processed % 1000 == 0 )
                yield; # resume next time events get drained (e.g. next packet)
            }
        }

There could also be other types of yield instructions, like "yield 1 second" or "yield wait_for_my_signal()" which would, respectively, resume after arbitrary amount of time or a custom function says it should.

- Jon

From jmellander at lbl.gov Fri Jan 5 18:04:03 2018
From: jmellander at lbl.gov (Jim Mellander)
Date: Fri, 5 Jan 2018 18:04:03 -0800
Subject: [Bro-Dev] 'for loop' variable modification
In-Reply-To: References: Message-ID:

Thanks, Jon:

I've decided to split the data (a table of IP addresses with statistics captured over a time period) based on a modulo calculation against the IP address (the important characteristic being that it can be done on the fly without an additional pass thru the table), which with an average distribution of traffic gives relatively equal size buckets, each of which can be processed during a single event, as I described.

I like the idea of co-routines - it would help to address issues like these in a more natural manner.

Jim

On Fri, Jan 5, 2018 at 5:28 PM, Jon Siwek wrote:
> On Fri, Jan 5, 2018 at 2:19 PM, Jim Mellander wrote:
>
>> I haven't checked whether my desired behavior works, but since it's not
>> documented, I wouldn't want to rely on it in any event.
>
> Yeah, I doubt the example you gave currently works -- it would just
> change the local value in the frame without modifying the internal
> iterator.
>
>> I would be interested in hearing comments or suggestions on this issue.
>
> What you want, the ability to split the processing of large data
> tables/sets over time, makes sense. I've probably also run into at
> least a couple cases where I've been concerned about how long it would
> take to iterate over a set/table and process all keys in one go. The
> approach that comes to mind for doing that would be adding coroutines.
> Robin has some ongoing work with adding better support for async
> function calls, and I wonder if the way that's done would make it
> pretty simple to add general coroutine support as well. E.g. stuff
> could look like:
>
> event process_stuff()
>     {
>     local num_processed = 0;
>
>     for ( local item in foo )
>         {
>         process_item(item);
>
>         if ( ++num_processed % 1000 == 0 )
>             yield; # resume next time events get drained (e.g. next packet)
>         }
>     }
>
> There could also be other types of yield instructions, like "yield 1
> second" or "yield wait_for_my_signal()" which would, respectively,
> resume after arbitrary amount of time or a custom function says it
> should.
>
> - Jon

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20180105/28039ce5/attachment.html

From seth at corelight.com Sun Jan 7 12:55:41 2018
From: seth at corelight.com (Seth Hall)
Date: Sun, 7 Jan 2018 15:55:41 -0500
Subject: [Bro-Dev] SMB transaction messages pull request
In-Reply-To: References: Message-ID: <8F0C3916-B33A-4633-825B-EB57346708FA@corelight.com>

Thanks Jon! I do apologize Jeffrey, the pull request was my responsibility and I've been meaning to get to it.

.Seth

-- Seth Hall * Corelight, Inc * www.corelight.com

> On Jan 2, 2018, at 11:58 AM, Jon Siwek wrote:
>
> On Fri, Dec 29, 2017 at 2:19 AM, Bencteux Jeffrey
> wrote:
>> I made a pull request a while ago to add/update messages for the SMB
>> analyzer and I did not get any feedback.
Is there something wrong with
>> it? I'd be happy to modify it to fit your requirements if necessary.
>>
>> You can find it here : https://github.com/bro/bro/pull/119.
>
> Thanks for the PR, likely just no one had time to look yet. I'll do
> that shortly.
>
> - Jon
> _______________________________________________
> bro-dev mailing list
> bro-dev at bro.org
> http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

From valerio.click at gmx.com Mon Jan 8 10:25:01 2018
From: valerio.click at gmx.com (Valerio)
Date: Mon, 8 Jan 2018 19:25:01 +0100
Subject: [Bro-Dev] [Bro] Best way to contribute to existing analyzer
In-Reply-To: <20170613232832.GP10430@icir.org> References: <20170613232832.GP10430@icir.org> Message-ID: <39f5d71b-bae2-371a-93da-9b559b88aa00@gmx.com>

Hi,

after a few months I finally managed to pack my contribution proposal as a pull request, available at https://github.com/bro/bro/pull/121

The patch introduces new option types for the DHCP protocol and extends the dhcp events to include new parameters that I believe are useful in network forensics analysis.

The options are the following:

55 Parameters Request List;
58 Renewal time;
59 Rebinding time;
61 Client Identifier;
82 Relay Agent Information.

The following are the extended events:

dhcp_discover exports client identifier and parameters request list;
dhcp_request exports client_identifier and parameters request list;
dhcp_ack exports rebinding time, renewal time and the list of suboption values of the dhcp relay agent information option;
dhcp_inform exports parameters request list.

Looking forward to receiving feedback!

best,
Valerio

On 14/06/2017 01:28, Robin Sommer wrote:
> On Wed, Jun 14, 2017 at 01:04 +0200, Valerio wrote:
>
>> What would be the best procedure (and format) to submit such a patch?
>
> Easiest is to prepare a pull request on GitHub. We have some
> guidelines here:
> https://www.bro.org/development/contribute.html#submitting-patches
>
> Looking forward to your patches!
>
> Robin

From jmellander at lbl.gov Mon Jan 8 22:27:29 2018
From: jmellander at lbl.gov (Jim Mellander)
Date: Mon, 8 Jan 2018 22:27:29 -0800
Subject: [Bro-Dev] 'for loop' variable modification
In-Reply-To: References: Message-ID:

I got the following idea while perusing non-cluster.bro SumStats::process_epoch_result:

    i=1;
    while (i <= 1000 && |bar| > 0)
        {
        for (foo in bar)
            {
            break;
            }
        ... process bar[foo] ...
        optional: baz[foo] = bar[foo]  # If we need to preserve original data
        delete bar[foo];
        ++i;
        }

This will allow iteration thru the table as I originally desired, although destroying the original table. SumStats::process_epoch_result deletes the current item inside the for loop, so is relying on undefined behavior, per the documentation: "Currently, modifying a container's membership while iterating over it may result in undefined behavior, so do not add or remove elements inside the loop." The above example avoids that.

Does anyone use sumstats outside of a cluster context?

On Fri, Jan 5, 2018 at 6:04 PM, Jim Mellander wrote:
> Thanks, Jon:
>
> I've decided to split the data (a table of IP addresses with statistics
> captured over a time period) based on a modulo calculation against the IP
> address (the important characteristic being that it can be done on the fly
> without an additional pass thru the table), which with an average
> distribution of traffic gives relatively equal size buckets, each of which
> can be processed during a single event, as I described.
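[A minimal sketch of the modulo bucketing described above -- all names are hypothetical, and it assumes the addr_to_counts() BIF (1 element for IPv4, 4 for IPv6) to derive a bucket on the fly as data is collected:]

    const n_buckets = 10;

    # One sub-table per bucket, filled as statistics are collected,
    # so no extra pass over one big table is needed later.
    global stats: table[count] of table[addr] of count;

    function bucket_of(a: addr): count
        {
        local v = addr_to_counts(a);    # 1 element for IPv4, 4 for IPv6
        return v[|v| - 1] % n_buckets;  # modulo over the low 32 bits
        }

    function record_stat(a: addr)
        {
        local b = bucket_of(a);

        if ( b !in stats )
            stats[b] = table();

        if ( a !in stats[b] )
            stats[b][a] = 0;

        ++stats[b][a];
        }

    # Amortize the processing: handle one bucket per scheduled event.
    event process_bucket(b: count)
        {
        if ( b in stats )
            {
            for ( a in stats[b] )
                print fmt("%s -> %d", a, stats[b][a]);  # stand-in for real processing

            delete stats[b];
            }

        if ( b + 1 < n_buckets )
            schedule 0.01 secs { process_bucket(b + 1) };
        }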
>
> I like the idea of co-routines - it would help to address issues like
> these in a more natural manner.
>
> Jim
>
> On Fri, Jan 5, 2018 at 5:28 PM, Jon Siwek wrote:
>> On Fri, Jan 5, 2018 at 2:19 PM, Jim Mellander wrote:
>>
>>> I haven't checked whether my desired behavior works, but since it's not
>>> documented, I wouldn't want to rely on it in any event.
>>
>> Yeah, I doubt the example you gave currently works -- it would just
>> change the local value in the frame without modifying the internal
>> iterator.
>>
>>> I would be interested in hearing comments or suggestions on this issue.
>>
>> What you want, the ability to split the processing of large data
>> tables/sets over time, makes sense. I've probably also run into at
>> least a couple cases where I've been concerned about how long it would
>> take to iterate over a set/table and process all keys in one go. The
>> approach that comes to mind for doing that would be adding coroutines.
>> Robin has some ongoing work with adding better support for async
>> function calls, and I wonder if the way that's done would make it
>> pretty simple to add general coroutine support as well. E.g. stuff
>> could look like:
>>
>> event process_stuff()
>>     {
>>     local num_processed = 0;
>>
>>     for ( local item in foo )
>>         {
>>         process_item(item);
>>
>>         if ( ++num_processed % 1000 == 0 )
>>             yield; # resume next time events get drained (e.g. next packet)
>>         }
>>     }
>>
>> There could also be other types of yield instructions, like "yield 1
>> second" or "yield wait_for_my_signal()" which would, respectively,
>> resume after arbitrary amount of time or a custom function says it
>> should.
>>
>> - Jon

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20180108/8b6e99a5/attachment.html

From robin at icir.org Tue Jan 9 11:34:17 2018
From: robin at icir.org (Robin Sommer)
Date: Tue, 9 Jan 2018 11:34:17 -0800
Subject: [Bro-Dev] 'for loop' variable modification
In-Reply-To: References: Message-ID: <20180109193417.GB86826@icir.org>

On Fri, Jan 05, 2018 at 19:28 -0600, you wrote:

> Robin has some ongoing work with adding better support for async
> function calls, and I wonder if the way that's done would make it
> pretty simple to add general coroutine support as well.

Yes, actually it would, pretty sure we could use that infrastructure for a yield keyword.

Robin

-- Robin Sommer * ICSI/LBNL * robin at icir.org * www.icir.org/robin

From jmellander at lbl.gov Wed Jan 17 16:30:10 2018
From: jmellander at lbl.gov (Jim Mellander)
Date: Wed, 17 Jan 2018 16:30:10 -0800
Subject: [Bro-Dev] Sumstats bugs & fixes
Message-ID:

In a previous email, I asked the question: Does anyone use sumstats outside of a cluster context?

In my case, the answer is yes, as I perform development on a laptop, and run bro standalone to test new bro policies.

I found several different bugs in share/bro/base/frameworks/sumstats/non-cluster.bro, specifically SumStats::process_epoch_result:

1. The policy returns sumstats results 50 at a time, and then reschedules itself for the next 50 after 0.01 seconds. Unfortunately, the reschedule is:

    schedule 0.01 secs { process_epoch_result(ss, now, data1) };

instead of:

    schedule 0.01 secs { SumStats::process_epoch_result(ss, now, data1) };

so it silently fails after the first 50 results. Would be nice to have a warning if a script schedules an event that doesn't exist. (A minimal repro sketch follows.)
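[A minimal repro of the unqualified-name pitfall, assuming the behavior described in item 1 -- that "schedule" does not resolve a module-local event name -- with hypothetical module and event names:]

    module Demo;

    export {
        global tick: event(n: count);
    }

    event tick(n: count)
        {
        print fmt("tick %d", n);

        # Never fires again, and no warning is given: "tick" is not
        # resolved to Demo::tick inside the schedule statement.
        schedule 1 sec { tick(n + 1) };

        # This form works:
        # schedule 1 sec { Demo::tick(n + 1) };
        }

    event bro_init()
        {
        event Demo::tick(0);
        }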
2. The serious issue with the policy, though, is that the 'for' loop over the result table is the main loop, with up to 50 items processed and deleted within the loop, the expectation being that the iteration will not thus be disturbed. The attached program (hash_test.bro) demonstrates that this is not the case (should output 1000, 0, but the 2nd value comes back non-zero), in line with the documented caveat: "Currently, modifying a container's membership while iterating over it may result in undefined behavior, so do not add or remove elements inside the loop." I didn't examine bro source code to appreciate the reason, but surmise that table resizing and rehashing would account for the issue.

The consequences of this issue are that, under certain circumstances:

* Not all data will be returned by SumStats at the epoch
* SumStats::finish_epoch may not be run.

To address the issue can be done via a rearrangement of the code, along the lines of the following pseudocode (boundary conditions, etc. ignored)

original (modifies table inside 'for' loop):

    i=50;
    for (foo in bar)
        {
        process bar[foo];
        delete bar[foo];
        --i;
        if (i == 0)
            break;
        }

to (modifies table outside 'for' loop):

    i=50;
    while (i > 0)
        {
        for (foo in bar)
            {
            break;
            }
        process bar[foo];
        delete bar[foo];
        --i;
        }

... there are a few other subtleties in the code (keeping a closure on the result table so that sumstats can clear the original table & proceed to the next epoch, and not running SumStats::finish_epoch if the result table was empty to begin with). A bit of rearrangement fixes the bugs while preserving the original behavior, with the help of a wrapper event that checks for an empty result table, and if not makes an explicit copy for further processing by the actual event doing the work. An additional 'for' loop around the result table could be used to keep it all in one event, but looks too much like black magic (and still, albeit probably in a safe way, depending on undefined behavior) - I prefer clear, understandable code (ha!), rather than "dark corner" features. Six months later, when I look at the code, I won't be able to remember the clever trick I was using :-)

Attached please find hash_test.bro & (patched) non-cluster.bro

Jim Mellander
ESNet

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20180117/75a9fd40/attachment.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: hash_test.bro
Type: application/octet-stream
Size: 182 bytes
Desc: not available
Url : http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20180117/75a9fd40/attachment.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: non-cluster.bro
Type: application/octet-stream
Size: 2204 bytes
Desc: not available
Url : http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20180117/75a9fd40/attachment-0001.obj

From jsiwek at corelight.com Thu Jan 18 09:50:00 2018
From: jsiwek at corelight.com (Jon Siwek)
Date: Thu, 18 Jan 2018 11:50:00 -0600
Subject: [Bro-Dev] Sumstats bugs & fixes
In-Reply-To: References: Message-ID:

On Wed, Jan 17, 2018 at 6:30 PM, Jim Mellander wrote:

> Unfortunately, the reschedule is:
> schedule 0.01 secs { process_epoch_result(ss, now, data1) };
> instead of:
> schedule 0.01 secs { SumStats::process_epoch_result(ss, now, data1) };
> so it silently fails after the first 50 results.

Thanks, you're right about that.
> Would be nice to have a > warning if a script schedules an event that doesn't exist. Right again, it would be nice since it has resulted in bugs like this, though I recall it might not be an easy change to the parser to clear up the differences in namespacing rules for event identifiers. > Attached please find hash_test.bro & (patched) non-cluster.bro Thanks for those. I remember you pointing out the potential problem in the earlier mail and meant to respond to indicate we should fix it and I must have just forgot, so sorry for that. I had a bit of a different idea on how to address the iterator invalidation that might be more understandable: keep a separate list of keys to delete later, outside the loop. I have my version of your proposed fixes at [1]. Can you take a look and let me know if that works for you? - Jon [1] https://github.com/bro/bro/commit/3495b2fa9d84e8105a79e24e4e9a2f9181318f1a From jmellander at lbl.gov Thu Jan 18 10:43:34 2018 From: jmellander at lbl.gov (Jim Mellander) Date: Thu, 18 Jan 2018 10:43:34 -0800 Subject: [Bro-Dev] Sumstats bugs & fixes In-Reply-To: References: Message-ID: Seems like either way would work - I've used the idea of keeping a list of keys to delete after loop exit (not in bro, actually, but in awk, which has a similar paradigm), but since we're peeling off keys, and discarding them after usage, avoiding the additional loop seemed more natural/efficient - although I suppose my way of looping over the keys, just to gather the first item, and breaking out may not seem too natural or be particularly efficient. I'll try your changes to see if they accomplish the same things. Based on Justin's post, I'm now more concerned about running my code on a cluster, but will tackle that next. On Thu, Jan 18, 2018 at 9:50 AM, Jon Siwek wrote: > On Wed, Jan 17, 2018 at 6:30 PM, Jim Mellander wrote: > > > Unfortunately, the reschedule > > is: > > schedule 0.01 secs { process_epoch_result(ss, now, data1) }; > > instead of: > > schedule 0.01 secs { SumStats::process_epoch_result(ss, now, data1) }; > > so it silently fails after the first 50 results. > > Thanks, you're right about that. > > > Would be nice to have a > > warning if a script schedules an event that doesn't exist. > > Right again, it would be nice since it has resulted in bugs like this, > though I recall it might not be an easy change to the parser to clear > up the differences in namespacing rules for event identifiers. > > > Attached please find hash_test.bro & (patched) non-cluster.bro > > Thanks for those. I remember you pointing out the potential problem > in the earlier mail and meant to respond to indicate we should fix it > and I must have just forgot, so sorry for that. I had a bit of a > different idea on how to address the iterator invalidation that might > be more understandable: keep a separate list of keys to delete later, > outside the loop. I have my version of your proposed fixes at [1]. > Can you take a look and let me know if that works for you? > > - Jon > > [1] https://github.com/bro/bro/commit/3495b2fa9d84e8105a79e24e4e9a2f > 9181318f1a > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20180118/b85d9fcf/attachment-0001.html

From jan.grashoefer at gmail.com Mon Jan 22 10:38:46 2018
From: jan.grashoefer at gmail.com (=?UTF-8?Q?Jan_Grash=c3=b6fer?=)
Date: Mon, 22 Jan 2018 19:38:46 +0100
Subject: [Bro-Dev] Bro package granularity
Message-ID:

Hi,

packaging some POC-seen scripts for the intel framework I was wondering what would be the preferred granularity of Bro packages. In case of seen scripts, it feels extreme to generate a package for every script. So one approach would be to group them into a single package and let the user load the single scripts on demand. But, some of the scripts might depend on other packages. These packages would be suggested during install. Assuming a minimal install this could lead to a couple of scripts that spit errors if loaded. So if someone decides to load the scripts later, he or she might forget about the dependencies. In that case it would be nice if one could check either for the availability of certain identifiers (lookup_ID didn't work for me due to type clash in comparison) or a package. What would be the preferred way?

Thanks,
Jan

From jsiwek at corelight.com Mon Jan 22 15:31:52 2018
From: jsiwek at corelight.com (Jon Siwek)
Date: Mon, 22 Jan 2018 17:31:52 -0600
Subject: [Bro-Dev] Bro package granularity
In-Reply-To: References: Message-ID:

On Mon, Jan 22, 2018 at 12:38 PM, Jan Grashöfer wrote:

> packaging some POC-seen scripts for the intel framework I was wondering
> what would be the preferred granularity of Bro packages. In case of seen
> scripts, it feels extreme to generate a package for every script.

Might depend largely on the judgement of the packager, though I think the train of thought will end up being similar enough to provide guidelines like the following:

If you think other packagers would depend on your scripts: it probably is better to provide separate packages where each offers the minimal feature set that one would expect to depend upon.

If you don't think other packagers would depend on your scripts, then it may make sense to start with a single monolithic package, depending on what you think would be the best thing for your users. A deciding factor/question here could be "does the package have a performance impact by default". If no, a single package for multiple different APIs/features could work alright. If there are performance hits just by installing/loading the package, then it may be easiest for you to separate out those areas that require extra cpu utilization into distinct packages, or else you'd need to provide a user with a set of options (&redef's) that they can use to toggle different features of the monolithic package.

> case it would be nice if one could check either for the availability of
> certain identifiers (lookup_ID didn't work for me due to type clash in
> comparison) or a package.

Would the @if, @ifdef, @ifndef directives (e.g. for preprocessing away whole chunks of code) work for what you were trying to do? There's also maybe the `bro_script_loaded` event you could use to set a global flag and then branch on that in later code paths.

- Jon

From scampbell at lbl.gov Mon Jan 22 21:31:28 2018
From: scampbell at lbl.gov (Scott Campbell)
Date: Mon, 22 Jan 2018 21:31:28 -0800
Subject: [Bro-Dev] input framework and tuning options
Message-ID:

I have been using the input framework with great success as a tool to read and parse structured text logs.
Unfortunately I have reached a performance impasse and was looking for a little advice.

The data source is a log file that grows at ~7-9k records/sec and consists of small text lines of < 512 bytes, newline delimited.

The primary symptom here is a steadily growing memory footprint even though the back end analyzer seems to be processing the events in near real time - i.e. there is obviously some buffering going on but the data is being consumed. The footprint for script side variables is not to blame as it is always << 1% of the total.

I tried modifying Raw::block_size to better fit the line size, but that made it worse. Increasing it to 16k seemed to be the sweet spot, but the problem is still there.

Any thoughts on what might help here (besides lower data rates)?

thanks!
scott

From jmellander at lbl.gov Tue Jan 23 12:08:43 2018
From: jmellander at lbl.gov (Jim Mellander)
Date: Tue, 23 Jan 2018 12:08:43 -0800
Subject: [Bro-Dev] Buggy bro sort() function
Message-ID:

Hi all:

The attached brogram demonstrates that the bro sort() function does not sort correctly under certain circumstances (tested on OS/X & Linux). The behavior also occurs when using the common function idiom of:

    sort(myvec, function(a: int, b: int): int { return a - b; });

I haven't examined bro source code, but since some of the test values are larger than 32 bits, I surmise that there is a casting from 64 to 32 bits that could change the sign of the comparison, thus causing this problem.

Mitigation is to use a function that returns the sign of subtraction results, rather than the actual subtraction results, something like:

    sort(myvec, function(a: int, b: int): int { return a < b ? -1 : (a > b ? 1 : 0); });

Cheers,

Jim Mellander
ESNet

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20180123/bebb28d9/attachment.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: badsort.bro
Type: application/octet-stream
Size: 538 bytes
Desc: not available
Url : http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20180123/bebb28d9/attachment.obj

From jan.grashoefer at gmail.com Wed Jan 24 06:38:49 2018
From: jan.grashoefer at gmail.com (=?UTF-8?Q?Jan_Grash=c3=b6fer?=)
Date: Wed, 24 Jan 2018 15:38:49 +0100
Subject: [Bro-Dev] Bro package granularity
In-Reply-To: References: Message-ID: <95d5908c-93d1-37cd-74cc-cb8fc5459634@gmail.com>

On 23/01/18 00:31, Jon Siwek wrote:
> Might depend largely on the judgement of the packager, though I think
> the train of thought will end up being similar enough to provide
> guidelines like the following:
>
> If you think other packagers would depend on your scripts: it probably
> is better to provide separate packages where each offers the minimal
> feature set that one would expect to depend upon.

That's a good rule of thumb I think!

> Would the @if, @ifdef, @ifndef directives (e.g. for preprocessing away
> whole chunks of code) work for what you were trying to do?

I finally decided to use a @ifdef construction in __load__.bro, which seems to be the best fit for now.

Thanks a lot for your feedback!
Jan

From jmellander at lbl.gov Wed Jan 24 10:04:09 2018
From: jmellander at lbl.gov (Jim Mellander)
Date: Wed, 24 Jan 2018 10:04:09 -0800
Subject: [Bro-Dev] Buggy bro sort() function
In-Reply-To: References: Message-ID:

Turns out that a number of other BIFs are not 64-bit safe, rand(), order(), to_int() are examples. Filing a bug report.
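[For reference, a repro sketch in the spirit of the attached badsort.bro -- the attachment itself is not reproduced here; the values are chosen so the comparator's difference exceeds 32 bits and, if truncated internally, flips sign:]

    event bro_init()
        {
        local v: vector of int;
        v[0] = 2147483648;   # 2^31: (v[0] - v[1]) truncated to 32 bits goes negative
        v[1] = 0;
        v[2] = 1;

        # Subtraction-based comparator: may come back unsorted on affected versions.
        sort(v, function(a: int, b: int): int { return a - b; });
        print v;

        # Sign-based comparator stays within {-1, 0, 1} and is safe.
        sort(v, function(a: int, b: int): int { return a < b ? -1 : (a > b ? 1 : 0); });
        print v;
        }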
On Tue, Jan 23, 2018 at 12:08 PM, Jim Mellander wrote:
> Hi all:
>
> The attached brogram demonstrates that the bro sort() function does not
> sort correctly under certain circumstances (tested on OS/X & Linux). The
> behavior also occurs when using the common function idiom of:
>
> sort(myvec, function(a: int, b: int): int { return a - b; });
>
> I haven't examined bro source code, but since some of the test values are
> larger than 32 bits, I surmise that there is a casting from 64 to 32 bits
> that could change the sign of the comparison, thus causing this problem.
>
> Mitigation is to use a function that returns the sign of subtraction
> results, rather than the actual subtraction results, something like:
>
> sort(myvec, function(a: int, b: int): int { return a < b ? -1 : (a > b ? 1 : 0); });
>
> Cheers,
>
> Jim Mellander
> ESNet

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20180124/0286dfb2/attachment.html

From mfernandez at mitre.org Thu Jan 25 09:28:23 2018
From: mfernandez at mitre.org (Fernandez, Mark I)
Date: Thu, 25 Jan 2018 17:28:23 +0000
Subject: [Bro-Dev] Bro DCE-RPC Analyzer Questions
Message-ID:

Bro-Dev Group,

I am doing a little research into using Bro to log and analyze specific Microsoft DCE-RPC interfaces and methods. I notice that the Bro events for 'dce_rpc_request' and 'dce_rpc_response' provide the length of the RPC data stub (aka 'stub_len'). I found reference that these events previously provided a byte string containing the stub data itself, but at some point it was reduced to just the stub_len instead. I have a few questions that I hope you could answer:

1. What was the reason you decided to remove the stub data from the events and pass only the stub length?

2. On github, I see a BinPAC file for the RPC 'At' service (bro/src/analyzer/protocol/dce-rpc/endpoint-atsvc.pac), but there are no events generated by it. I think this would be very useful for my project. What is the reason that you have the analyzer, but no events for scriptland?

3. I have a use case, for a very few, limited number of RPC interfaces/methods, where I need to receive the stub data in scriptland for logging and analysis. How do you recommend I approach this scenario? I see a couple options:

a. I could customize the DCE-RPC analyzer to pass the stub data for *ALL* 'dce_rpc_request' and 'dce_rpc_response' events; or
b. I could customize the DCE-RPC analyzer to create new events specifically for the interfaces/methods (aka UUIDs/OpNums) that I care about.
c. Other ideas?

I think both (a) and (b) will achieve the desired result; but there are trade-offs, pros and cons. I wonder which option would have a more negative impact on Bro performance? I imagine the reason you stopped passing stub data for all events was due to the performance hit, so I want to approach this in the best way possible. I appreciate your feedback.

Cheers!
Mark

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20180125/d05eb044/attachment.html
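[A short sketch of what script-land can already do with stub_len alone, assuming the current dce_rpc_request signature from the analyzer; the watched opnum values are hypothetical placeholders:]

    const watched_opnums: set[count] = { 0, 10 } &redef;

    event dce_rpc_request(c: connection, fid: count, opnum: count, stub_len: count)
        {
        if ( opnum !in watched_opnums )
            return;

        # Only the stub length is available here; getting the stub bytes
        # themselves needs one of the analyzer changes discussed above.
        print fmt("%s: DCE-RPC request opnum=%d, %d stub bytes",
                  c$id$orig_h, opnum, stub_len);
        }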
From robin at icir.org Fri Jan 26 11:40:03 2018
From: robin at icir.org (Robin Sommer)
Date: Fri, 26 Jan 2018 11:40:03 -0800
Subject: [Bro-Dev] 'async' update and proposal
Message-ID: <20180126194003.GA1786@icir.org>

A while ago we've been discussing adding a new "async" keyword to run certain functions asynchronously; Bro would proceed with other script code until they complete. Summary is here: https://www.bro.org/development/projects/broker-lang-ext.html#asynchronous-executions-without-when

After my original proof-of-concept version, I now have a 2nd-gen implementation mostly ready that internally unifies the machinery for "when" and "async". That simplifies the code and, more importantly, makes the two work together more smoothly. The branch for that work is topic/robin/sync, I'll clean that up a bit more and would then be interested in seeing somebody test drive it.

In the meantime I want to propose a slight change to the original plan. In earlier discussions, we ended up concluding that introducing a new keyword to trigger the asynchronous behaviour is useful for the script writer, as it signals that semantics are different for these calls. Example using the syntax we arrived at:

    event bro_init()
        {
        local x = async lookup_hostname("www.icir.org");  # A
        print "IP of www.icir.org is", x;                 # B
        }

Once the DNS request is issued in line A, the event handler will be suspended until the answer arrives. That means that other event handlers may execute before line B, i.e., execution order isn't fully deterministic anymore. The use of "async" is pointing out that possibility.

However, look at the following example. Let's say we want to outsource such DNS functionality into a separate framework, like in this toy example:

    # cat test.bro
    @load my-dns

    event bro_init()
        {
        local x = MyCoolDNSFramework::lookup("www.icir.org");  # A
        print "IP of www.icir.org is", x;                      # B
        }

    # cat my-dns.bro
    module MyCoolDNSFramework;

    export {
        global lookup: function(name: string) : set[addr];
    }

    function lookup(name: string) : set[addr]
        {
        local addrs = async lookup_hostname(name);  # C
        return addrs;                               # D
        }

That example behaves exactly as the 1st: execution may suspend between lines A and B because the call to MyCoolDNSFramework::lookup() executes an asynchronous function call internally (it will hold between C and D). But in this 2nd example that behaviour is not apparent at the call site in line A.

We could require using "async" in line A as well but that would be extremely painful: whenever calling some function, one would need to know whether internally the callee may end up using "async" somewhere (potentially buried further deep inside its call stack).

I think we should instead just skip the "async" keyword altogether. Requiring it at some places, but not others, hurts more than it helps in my opinion. The 1st example would then just go back to look like this:

    event bro_init()
        {
        local x = lookup_hostname("www.icir.org");  # A
        print "IP of www.icir.org is", x;           # B
        }

This would still behave the same as before: potential suspension between A and B.

I don't think skipping "async" would be a big deal for anything, as the cases where the new behaviour may actually lead to significant differences should be rare.

Thoughts?

Robin

-- Robin Sommer * ICSI/LBNL * robin at icir.org * www.icir.org/robin

From johanna at icir.org Fri Jan 26 21:25:30 2018
From: johanna at icir.org (Johanna Amann)
Date: Fri, 26 Jan 2018 21:25:30 -0800
Subject: [Bro-Dev] 'async' update and proposal
In-Reply-To: <20180126194003.GA1786@icir.org> References: <20180126194003.GA1786@icir.org> Message-ID: <20180127052530.3l4vnuhjoi6rnvs2@Beezling.local>

> I don't think skipping "async" would be a big deal for anything,
> as the cases where the new behaviour may actually lead to significant
> differences should be rare.
After pondering this for a while I am a bit afraid that skipping async completely might lead to quite hard to debug problems in scripts.

At the moment, Bro scripts are basically predictable - e.g. once an event is executed you know that everything in that event happens before anything else is executed. Having asynchronous functions (obviously) changes that - now other things will execute while you wait for something to happen.

Without an async keyword, users basically have no direct way to determine if a function will run to the end without interruption. This can be especially problematic if you manipulate data structures in an event that have to be present in later events (as it is now commonly done in scripts). Consider e.g. an example like the following:

    event protocol_event_1(...)
        {
        c$proto$la = function_call;
        }

    event protocol_event_end(...)
        {
        Log::write([....c$proto$la...]);
        }

If I understand everything correctly, in this case it is not guaranteed to be present - it still might be waiting for function_call (if it is asynchronous). This might not be a problem for cases in which functions that obviously need DNS lookups are used - but if this is hidden between a few layers of indirection it will get really hard to reason about this.

I actually think that it makes sense to be more explicit there. So - require async in front of all bifs, etc. that are asynchronous. And if a user creates a function that in turn calls an asynchronous function, I think we should require that function to be called using async too. Either a user knows that a function uses an asynchronous function, or the script interpreter will raise an error message telling them that the async keyword is required here because an async function is in the call stack. While this might be a bit painful at times I think this still is better because it makes the script writer aware that things might be interrupted at this point.

So - my argument is basically exactly the reverse of yours - if an async function is somewhere in the call stack I definitely would want to know about this when writing my script - otherwise I can see really nasty bugs happening.

If we want to make it more explicit that a function is potentially asynchronous, we also could consider requiring the function definition to mark this explicitly, e.g. just by adding the async keyword in front:

    async function lookup(name: string) : set[addr]
        {
        local addrs = async lookup_hostname(name);
        return addrs;
        }

...but that is not strictly necessary and does not look that pretty.

Johanna
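[A sketch of a defensive check for the ordering hazard described above -- record and field names are hypothetical, mirroring the example:]

    event protocol_event_end(c: connection)
        {
        # If the asynchronous function_call in protocol_event_1() has not
        # completed yet, the field was never assigned; skip the write
        # rather than log a half-filled record.
        if ( ! c$proto?$la )
            return;

        # ... Log::write() as in the example above ...
        }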
From jazoff at illinois.edu Sun Jan 28 07:13:11 2018
From: jazoff at illinois.edu (Azoff, Justin S)
Date: Sun, 28 Jan 2018 15:13:11 +0000
Subject: [Bro-Dev] 'async' update and proposal
In-Reply-To: <20180127052530.3l4vnuhjoi6rnvs2@Beezling.local> References: <20180126194003.GA1786@icir.org> <20180127052530.3l4vnuhjoi6rnvs2@Beezling.local> Message-ID:

> On Jan 27, 2018, at 12:25 AM, Johanna Amann wrote:
> Consider e.g. an example like the following
>
> event protocol_event_1(...)
>     {
>     c$proto$la = function_call;
>     }
>
> event protocol_event_end(...)
>     {
>     Log::write([....c$proto$la...]);
>     }

I was thinking the same thing :-) The notice framework currently has this problem, and needs to work around it:

    ## Adding a string "token" to this set will cause the notice
    ## framework's built-in emailing functionality to delay sending
    ## the email until either the token has been removed or the
    ## email has been delayed for :bro:id:`Notice::max_email_delay`.
    email_delay_tokens: set[string] &optional;

So that the extend-email hostnames script can do:

    add n$email_delay_tokens["hostnames-src"];

    when ( local src_name = lookup_addr(n$src) )
        {
        output = string_cat("orig/src hostname: ", src_name, "\n");
        tmp_notice_storage[uid]$email_body_sections[|tmp_notice_storage[uid]$email_body_sections|] = output;
        delete tmp_notice_storage[uid]$email_delay_tokens["hostnames-src"];
        }

So 'async' or no 'async' keyword, I think as soon as bro starts doing more things asynchronous a lot of synchronization/ordering issues come into play.

Even stuff like:

    event protocol_event_1(...) &priority=1
        {
        c$proto$la = function_call;
        }

    event protocol_event_1(...)
        {
        ...
        }

Currently the 2nd event handler is guaranteed to be run only after the first finishes running, right? But what if the first handler does an async operation? Does the 2nd event handler wait for the async operation to finish, or does it run as soon as the higher priority event function hits an async operation?

If that works, it would be because there's an explicit dependency on the higher priority event with the lower priority event. For 'protocol_event_1' and 'protocol_event_end' there's no explicit dependency other than that the analyzer raises the events with a known ordering. If an earlier event can trigger an async operation then all of the assumed ordering goes out the window.

-- Justin Azoff

From jan.grashoefer at gmail.com Sun Jan 28 12:45:11 2018
From: jan.grashoefer at gmail.com (=?UTF-8?Q?Jan_Grash=c3=b6fer?=)
Date: Sun, 28 Jan 2018 21:45:11 +0100
Subject: [Bro-Dev] 'async' update and proposal
In-Reply-To: References: <20180126194003.GA1786@icir.org> <20180127052530.3l4vnuhjoi6rnvs2@Beezling.local> Message-ID: <7289248a-4c95-9729-1fd0-59010cdd84ed@gmail.com>

First of all, this async keyword reminds me of asynchronous programming in C#: https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/async/

As I only used it in a student project, I don't have real experience but maybe someone who is used to that paradigm in C# can provide some valuable hints.

>> So - my argument is basically exactly the reverse of yours - if an async
>> function is somewhere in the call stack I definitely would want to know
>> about this when writing my script - otherwise I can see really nasty bugs
>> happening.

I agree on that! For the C# async paradigm, people say that async is like a zombie plague as a single asynchronous function can start "infecting" your code base by propagating async through the call graph. However, I would prefer being explicit, in particular as in case of Bro the plague syntactically stops at the event level.

> So 'async' or no 'async' keyword, I think as soon as bro starts doing more things asynchronous a lot of synchronization/ordering issues come into play.
>
> Even stuff like
>
> event protocol_event_1(...) &priority=1
>     {
>     c$proto$la = function_call;
>     }
>
> event protocol_event_1(...)
>     {
>     ...
>     }
>
> [...]
>
> For 'protocol_event_1' and 'protocol_event_end' there's no explicit dependency other than that the analyzer raises the events with a known ordering.
> If an earlier event can trigger an async operation then all of the assumed ordering goes out the window.

That's a really good example I think. However, with async around, I would argue that the priority can be used to determine when an event is scheduled but not when it is finished. In cases where ordering is important, wouldn't something like the following work?

    event protocol_event_1(...)
        {
        c$steps_left = 2;
        c$proto$la = async function_call;
        c$steps_left--;
        if ( c$steps_left <= 0 )
            schedule 0sec { protocol_event_end(...) };
        }

    event protocol_event_2(...)
        {
        # do stuff
        c$steps_left--;
        if ( c$steps_left <= 0 )
            schedule 0sec { protocol_event_end(...) };
        }

    event protocol_event_end(...)
        {
        # Log write or whatever
        }

As it is explicit as well, this pattern could significantly blow up scripts. Anyway, I think asynchronous control flow is just more complex than sequential, no matter how much syntactic sugar is put on top.

Jan

From robin at icir.org Mon Jan 29 09:00:14 2018
From: robin at icir.org (Robin Sommer)
Date: Mon, 29 Jan 2018 09:00:14 -0800
Subject: [Bro-Dev] 'async' update and proposal
In-Reply-To: <7289248a-4c95-9729-1fd0-59010cdd84ed@gmail.com> References: <20180126194003.GA1786@icir.org> <20180127052530.3l4vnuhjoi6rnvs2@Beezling.local> <7289248a-4c95-9729-1fd0-59010cdd84ed@gmail.com> Message-ID: <20180129170014.GA39249@icir.org>

Jan wrote:
> First of all, this async keyword reminds me of asynchronous programming
> in C#:

Nice, didn't know that.

> For the C# async paradigm, people say that async is like a zombie
> plague as a single asynchronous function can start "infecting" your
> code base by propagating async through the call graph.

That's exactly my point. If we require a keyword to be used for all asynchronous behavior, it will need to be put in place across the whole call stack whenever there's even the slightest chance of "async" being used somewhere far down inside a framework. I hear you all on the advantages of making asynchronous behavior explicit, I fully agree with that. I just don't see it as practical from the script writer's perspective.

Johanna wrote:
> And if a user creates a function that in turn calls an asynchronous
> function, I think we should require that function to be called using
> async too. Either a user knows that a function uses an asynchronous
> function, or the script interpreter will raise an error message
> telling them that the async keyword is required here because an async
> function is in the call stack.

The problem is that the interpreter cannot determine that statically (because control flow isn't static), we'd have to resort to runtime errors -- and that means that code that forgets to use "async" may run fine for a while until it happens to go down that one branch that does, e.g., a DNS lookup.

If we required that all the relevant functions (and function declarations) get declared as "async", like in C#, then I believe we could detect errors statically. But we'd end up having to put that async declaration into a lot of places just on the chance that asynchronous behavior could be used somewhere. Consider for example the plugin functions in NetControl: They'd need to be "async" just so that someone *could* do DNS lookups in there. Same for hooks: by definition we don't know what they'll do, so they'll need to be "async". And that in turn means that NOTICE() for example must become "async" because it's running a hook. Now every time we do a NOTICE, we need to put an "async" in front. And every time we call a function that might generate a NOTICE, we'd write "async" there, too.

The point of dependencies/order becoming harder to understand is valid of course. We already have that challenge with "when" and maybe we need to find different solutions there to express sequentiality requirements somehow.

Justin wrote:
> event protocol_event_1(...) &priority=1
> event protocol_event_1(...)
> Currently the 2nd event handler is guaranteed to be run only after the
> first finishes running, right?

Correct, and that's actually something we could ensure even with "async": we could treat the whole set of all handlers as one block that gets suspended as a whole if an asynchronous function runs.

But as you point out, that wouldn't solve inter-event dependencies. Per Jan's mail, one can work around that with custom code, yet it would be much nicer if we had built-in support for that. Actually, I think one possible solution has been floating around for a while already: event *scopes* that express serialization requirements in terms of shared context. Most common example: serialize all events that are triggered by the same connection. Originally this came up in the context of running event handlers concurrently. I believe it would solve this problem here too: when a function suspends, halt all handlers that depend on the same context (e.g., same connection). More on that idea in this paper: http://www.icir.org/robin/papers/ccs14-concurrency.pdf

Robin

-- Robin Sommer * ICSI/LBNL * robin at icir.org * www.icir.org/robin

From jsiwek at corelight.com Mon Jan 29 11:58:35 2018
From: jsiwek at corelight.com (Jon Siwek)
Date: Mon, 29 Jan 2018 13:58:35 -0600
Subject: [Bro-Dev] 'async' update and proposal
In-Reply-To: <20180129170014.GA39249@icir.org> References: <20180126194003.GA1786@icir.org> <20180127052530.3l4vnuhjoi6rnvs2@Beezling.local> <7289248a-4c95-9729-1fd0-59010cdd84ed@gmail.com> <20180129170014.GA39249@icir.org> Message-ID:

On Mon, Jan 29, 2018 at 11:00 AM, Robin Sommer wrote:

> as you point out, that wouldn't solve inter-event dependencies.
> Per Jan's mail, one can work around that with custom code

The inter-event dependencies + code understandability/readability issue that Johanna points out is maybe something that would also bother me if it weren't addressed. And taking Jan's idea a bit further, if we had more complete coroutine/yield support in scripts you could write:

    event protocol_event_1(...)
        {
        c$proto$la = function_call;
        }

    event protocol_event_end(...)
        {
        yield WaitForProtoFields();
        Log::write([....c$proto$la...]);
        }

There, WaitForProtoFields() would be an arbitrary/custom function you write. E.g. return false unless all fields of c$proto have been set. So here, maybe the code is more readable because the depender is now explicit about their dependencies. Though I think it's still problematic to know when to write that code because you would still have to rely on either memory (or documentation) to tell you that 'function_call' is actually not synchronous. And if 'function_call' starts as a synchronous function and later changes, that's also kind of a problem, so you might see people cautiously implementing the same type of code patterns everywhere even if not required for some cases.

> Actually, I think one
> possible solution has been floating around for a while already: event
> *scopes* that express serialization requirements in terms of shared
> context. Most common example: serialize all events that are triggered
> by the same connection. Originally this came up in the context of
> running event handlers concurrently. I believe it would solve this
> problem here too: when a function suspends, halt all handlers that
> depend on the same context (e.g., same connection). More on that idea
> in this paper: http://www.icir.org/robin/papers/ccs14-concurrency.pdf

Yeah, I think this approach could actually work really well to aid in reasoning about event ordering.
The first example that came to mind was adding an attribute to event handlers that a user wants to be serialized as part of a context:

    event protocol_event_1(c: connection ...) &context = { return c$id; } { ... }

I only skimmed the paper, though seemed like it outlined a similar way of generalizing contexts/scopes?

- Jon

From jmellander at lbl.gov Mon Jan 29 13:15:26 2018
From: jmellander at lbl.gov (Jim Mellander)
Date: Mon, 29 Jan 2018 13:15:26 -0800
Subject: [Bro-Dev] Misleading error message
Message-ID:

Hi all:

I was tinkering with the sumstats code, and inadvertently deleted the final "}" closing out the last function. When running the code, the misleading error message is received:

    error in /Users/melland/traces/bro/share/bro/base/frameworks/tunnels/./main.bro, line 8: syntax error, at or near "module"

presumably due to the function still being open when the next policy script is loaded. Wouldn't it be more reasonable to check at the end of each script when loaded that there are no dangling functions, expressions, etc.?

Jim Mellander
ESNet

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20180129/b18e58b4/attachment.html

From jsiwek at corelight.com Mon Jan 29 14:45:19 2018
From: jsiwek at corelight.com (Jon Siwek)
Date: Mon, 29 Jan 2018 16:45:19 -0600
Subject: [Bro-Dev] Misleading error message
In-Reply-To: References: Message-ID:

On Mon, Jan 29, 2018 at 3:15 PM, Jim Mellander wrote:

> Wouldn't it be more reasonable to check at the end of each
> script when loaded that there are no dangling functions, expressions,
> etc.?

Yes, it's definitely reasonable to want a better error message here and I don't think it would be that difficult to change the parser to track the necessary state to emit a better message. Feel free to file a ticket/bug.

- Jon

From jsiwek at corelight.com Mon Jan 29 15:13:02 2018
From: jsiwek at corelight.com (Jon Siwek)
Date: Mon, 29 Jan 2018 17:13:02 -0600
Subject: [Bro-Dev] input framework and tuning options
In-Reply-To: References: Message-ID:

On Mon, Jan 22, 2018 at 11:31 PM, Scott Campbell wrote:

> The data source is a log file that grows at ~7-9k records/sec and
> consists of small text lines of < 512 bytes, newline delimited.
>
> The primary symptom here is a steadily growing memory footprint even
> though the back end analyzer seems to be processing the events in near
> real time - i.e. there is obviously some buffering going on but the data
> is being consumed. The footprint for script side variables is not to
> blame as it is always << 1% of the total.

The main categories of problems to check for that come to mind:

(a) Rate of production exceeds rate of consumption
(b) Unbounded script state accumulation
(c) Unbounded core state accumulation
(d) Memory leak

It sounds like you've ruled out (a) and (b). For the others, using a heap profiler/checker is going to help. There's a brief guide at [1] on finding memory leaks in Bro that you can try. Else if you can provide a simple test case that reproduces the behavior, filing a bug/ticket with that info would be the best way to get someone to help look into it with you.
- Jon

[1] https://www.bro.org/development/howtos/leaks.html

From seth at corelight.com Tue Jan 30 07:11:58 2018
From: seth at corelight.com (Seth Hall)
Date: Tue, 30 Jan 2018 10:11:58 -0500
Subject: [Bro-Dev] 'async' update and proposal
In-Reply-To: <20180129170014.GA39249@icir.org> References: <20180126194003.GA1786@icir.org> <20180127052530.3l4vnuhjoi6rnvs2@Beezling.local> <7289248a-4c95-9729-1fd0-59010cdd84ed@gmail.com> <20180129170014.GA39249@icir.org> Message-ID: <7C8457A8-5186-4E39-94B8-ACC0C0E2C9B6@corelight.com>

On 29 Jan 2018, at 12:00, Robin Sommer wrote:

> Actually, I think one
> possible solution has been floating around for a while already: event
> *scopes* that express serialization requirements in terms of shared
> context.

I like this idea a lot! Do you foresee that causing trouble if we went that direction though? It seems like it could cause trouble by causing events to back up waiting for some other event to finish executing.

.Seth

-- Seth Hall * Corelight, Inc * www.corelight.com

From robin at icir.org Tue Jan 30 07:38:42 2018
From: robin at icir.org (Robin Sommer)
Date: Tue, 30 Jan 2018 07:38:42 -0800
Subject: [Bro-Dev] 'async' update and proposal
In-Reply-To: References: <20180126194003.GA1786@icir.org> <20180127052530.3l4vnuhjoi6rnvs2@Beezling.local> <7289248a-4c95-9729-1fd0-59010cdd84ed@gmail.com> <20180129170014.GA39249@icir.org> Message-ID: <20180130153842.GB39249@icir.org>

On Mon, Jan 29, 2018 at 13:58 -0600, you wrote:

> And if 'function_call' starts as a synchronous function and later
> changes, that's also kind of a problem, so you might see people
> cautiously implementing the same type of code patterns everywhere
> even if not required for some cases.

That's a good point more generally: if we require "async" at call sites, an internal change to a framework can break existing code.

> event protocol_event_1(c: connection ...) &context = { return c$id; } { ... }
>
> I only skimmed the paper, though seemed like it outlined a similar way
> of generalizing contexts/scopes?

Yeah, that's pretty much the idea there. For concurrency, we'd hash that context value and use that to determine a target thread to schedule execution to, just like in a cluster the process/machine is determined.

An attribute can work if we're confident that the relevant information can always be extracted from the event parameters. In a concurrent prototype many years ago we instead used a hardcoded set of choices based on the underlying connection triggering the event (5-tuple, host pair, src IP, dst IP). So you'd write (iirc):

    event protocol_event_1(c: connection ...) &scope = connection

That detaches the context calculation from event parameters, with the obvious disadvantage that it can't be customized any further. Maybe there's some middle ground where we'd get both.

To clarify terminology: In that paper "scope" is the scheduling granularity, e.g., "by connection". "context" is the current instantiation of that scope (e.g., "1.2.3.4:1234,2.3.4.5:80" for connection scope).
Robin

-- Robin Sommer * ICSI/LBNL * robin at icir.org * www.icir.org/robin

From robin at icir.org Tue Jan 30 07:43:46 2018
From: robin at icir.org (Robin Sommer)
Date: Tue, 30 Jan 2018 07:43:46 -0800
Subject: [Bro-Dev] 'async' update and proposal
In-Reply-To: <7C8457A8-5186-4E39-94B8-ACC0C0E2C9B6@corelight.com> References: <20180126194003.GA1786@icir.org> <20180127052530.3l4vnuhjoi6rnvs2@Beezling.local> <7289248a-4c95-9729-1fd0-59010cdd84ed@gmail.com> <20180129170014.GA39249@icir.org> <7C8457A8-5186-4E39-94B8-ACC0C0E2C9B6@corelight.com> Message-ID: <20180130154346.GC39249@icir.org>

On Tue, Jan 30, 2018 at 10:11 -0500, you wrote:

> I like this idea a lot!

Yeah, I like it, too. Additional benefit: it actually opens the door for parallelization again, too ...

> Do you foresee that causing trouble if we went that direction
> though? It seems like it could cause trouble by causing events to
> back up waiting for some other event to finish executing.

It could. The async operations all time out, so there's a cap to how long things can get stalled, but still: if that happens to many async operations simultaneously, we could end up with lots of stuff in flight. On the other hand, I don't think this can be avoided though: either we want dependencies or we don't. You can't have the cake and eat it too, I guess. :)

Robin

-- Robin Sommer * ICSI/LBNL * robin at icir.org * www.icir.org/robin

From jsiwek at corelight.com Tue Jan 30 08:28:01 2018
From: jsiwek at corelight.com (Jon Siwek)
Date: Tue, 30 Jan 2018 10:28:01 -0600
Subject: [Bro-Dev] 'async' update and proposal
In-Reply-To: <20180130153842.GB39249@icir.org> References: <20180126194003.GA1786@icir.org> <20180127052530.3l4vnuhjoi6rnvs2@Beezling.local> <7289248a-4c95-9729-1fd0-59010cdd84ed@gmail.com> <20180129170014.GA39249@icir.org> <20180130153842.GB39249@icir.org> Message-ID:

On Tue, Jan 30, 2018 at 9:38 AM, Robin Sommer wrote:

> An attribute can work if we're confident that the relevant information
> can always be extracted from the event parameters. In a concurrent
> prototype many years ago we instead used a hardcoded set of choices
> based on the underlying connection triggering the event (5-tuple, host
> pair, src IP, dst IP). So you'd write (iirc):
>
> event protocol_event_1(c: connection ...) &scope = connection
>
> That detaches the context calculation from event parameters, with the
> obvious disadvantage that it can't be customized any further. Maybe
> there's some middle ground where we'd get both.

Yeah, it seems open to having multiple methods available for the user to choose from: dynamic call to script-land, dynamic calculation in core (select from predefined list), or even a static value (not that I can think of a particular place that would actually use that right now).

Was there more benefit of using the predefined choice than saving the overhead of calling out to script-land to do the context calculation?

- Jon
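[Side by side, the two surfaces discussed in this exchange, using the syntax quoted above -- hypothetical: neither attribute exists in Bro today:]

    # 1) Predefined scope, computed in the core:
    event protocol_event_1(c: connection ...) &scope = connection
        { ... }

    # 2) Custom script-land context calculation, e.g. per originating host:
    event protocol_event_1(c: connection ...) &context = { return c$id$orig_h; }
        { ... }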
- Jon

From robin at icir.org  Tue Jan 30 09:08:20 2018
From: robin at icir.org (Robin Sommer)
Date: Tue, 30 Jan 2018 09:08:20 -0800
Subject: [Bro-Dev] 'async' update and proposal
In-Reply-To: 
References: <20180126194003.GA1786@icir.org> <20180127052530.3l4vnuhjoi6rnvs2@Beezling.local> <7289248a-4c95-9729-1fd0-59010cdd84ed@gmail.com> <20180129170014.GA39249@icir.org> <20180130153842.GB39249@icir.org>
Message-ID: <20180130170820.GE39249@icir.org>

On Tue, Jan 30, 2018 at 10:28 -0600, you wrote:

> Was there more benefit to the predefined choices than saving the
> overhead of calling out to script-land to do the context calculation?

No, I don't think so. It mainly came out of an analysis of existing
scripts, and those 5-tuple based subsets were the main use case anyway.
Actually, I'm not even sure anymore if there might have been a custom
execute-my-own-function scope as well; I'll see if I can find the old
code somewhere.

Robin

--
Robin Sommer * ICSI/LBNL * robin at icir.org * www.icir.org/robin

From johanna at icir.org  Tue Jan 30 15:36:11 2018
From: johanna at icir.org (Johanna Amann)
Date: Tue, 30 Jan 2018 15:36:11 -0800
Subject: [Bro-Dev] Merged branches deletion
Message-ID: <20180130233607.a2ehpp7at3yb5373@user237.sys.ICSI.Berkeley.EDU>

Hi,

I am going to delete these (merged) branches Thursday, unless someone
feels especially attached to them:

topic/dnthayer/ticket1821
topic/dnthayer/ticket1836
topic/dnthayer/ticket1863
topic/jazoff/contentline-limit
topic/jazoff/fix-gridftp
topic/jazoff/fix-intel-error
topic/jazoff/speedup-for
topic/robin/broker-logging
topic/robin/event-args
topic/robin/plugin-version-check
topic/seth/add-file-lookup-functions
topic/seth/input-thread-behavior
topic/seth/remove-dns-weird
topic/vladg/bit-1838

Johanna

From seth at corelight.com  Wed Jan 31 07:51:02 2018
From: seth at corelight.com (Seth Hall)
Date: Wed, 31 Jan 2018 10:51:02 -0500
Subject: [Bro-Dev] Merged branches deletion
In-Reply-To: <20180130233607.a2ehpp7at3yb5373@user237.sys.ICSI.Berkeley.EDU>
References: <20180130233607.a2ehpp7at3yb5373@user237.sys.ICSI.Berkeley.EDU>
Message-ID: <2B8B47E7-3925-4A42-8E30-FED7CBC13508@corelight.com>

Thanks for sweeping through all of these branches! I think I have a few
extra branches that haven't been merged that I could get rid of, too;
this is a good reminder that I should get those cleaned up.

.Seth

On 30 Jan 2018, at 18:36, Johanna Amann wrote:

> Hi,
>
> I am going to delete these (merged) branches Thursday, unless someone
> feels especially attached to them:
>
> topic/dnthayer/ticket1821
> topic/dnthayer/ticket1836
> topic/dnthayer/ticket1863
> topic/jazoff/contentline-limit
> topic/jazoff/fix-gridftp
> topic/jazoff/fix-intel-error
> topic/jazoff/speedup-for
> topic/robin/broker-logging
> topic/robin/event-args
> topic/robin/plugin-version-check
> topic/seth/add-file-lookup-functions
> topic/seth/input-thread-behavior
> topic/seth/remove-dns-weird
> topic/vladg/bit-1838
>
> Johanna
> _______________________________________________
> bro-dev mailing list
> bro-dev at bro.org
> http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

--
Seth Hall * Corelight, Inc * www.corelight.com

From seth at corelight.com  Wed Jan 31 08:05:23 2018
From: seth at corelight.com (Seth Hall)
Date: Wed, 31 Jan 2018 11:05:23 -0500
Subject: [Bro-Dev] Bro DCE-RPC Analyzer Questions
In-Reply-To: 
References: 
Message-ID: 

The original idea was to get extensive parsing in place for DCE-RPC
messages by parsing the IDL files for those services.
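To give a sense of what that could yield, an event generated from the
atsvc IDL might look something like this (purely illustrative -- no
such event exists today; the name and parameters are made up):

    # Hypothetical auto-generated event for atsvc::NetrJobAdd, raised
    # when a client schedules a command for execution on a remote host.
    event atsvc_netr_job_add(c: connection, server: string, command: string)
        {
        print fmt("scheduled task on %s: %s", server, command);
        }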
Someone in the community had hoped to take it on, but hasn't had time
yet to complete it. If you'd be interested in discussing that, I think
it could be a huge addition to Bro (hundreds of new events!).

The At service parsing file is still there because I didn't want to
lose track of it, but I think there was some slight architectural
change that needed to happen before I could pass data to it. I don't
think that data is even going to that parser; it's not just that there
aren't events. I'd have to refer back to the code to see what exactly
is wrong, though.

As for an approach to this problem right now, I'd prefer to see the
parsing done in the core. Architecturally, we try to avoid passing
unparsed data to Bro script-land because of performance concerns, and
we generally don't have the intrinsic tools to do parsing well in Bro
scripts.

.Seth

On 25 Jan 2018, at 12:28, Fernandez, Mark I wrote:

> Bro-Dev Group,
>
> I am doing a little research into using Bro to log and analyze
> specific Microsoft DCE-RPC interfaces and methods. I notice that the
> Bro events for 'dce_rpc_request' and 'dce_rpc_response' provide the
> length of the RPC data stub (aka 'stub_len'). I found a reference
> saying that these events previously provided a byte string containing
> the stub data itself, but at some point it was reduced to just the
> stub_len instead. I have a few questions that I hope you could answer:
>
> 1. What was the reason you decided to remove the stub data from the
> events and pass only the stub length?
>
> 2. On github, I see a BinPAC file for the RPC 'At' service
> (bro/src/analyzer/protocol/dce-rpc/endpoint-atsvc.pac), but there are
> no events generated by it. I think this would be very useful for my
> project. What is the reason that you have the analyzer, but no events
> for scriptland?
>
> 3. I have a use case, for a very few, limited number of RPC
> interfaces/methods, where I need to receive the stub data in
> scriptland for logging and analysis. How do you recommend I approach
> this scenario? I see a couple options:
>
>    a. I could customize the DCE-RPC analyzer to pass the stub data for
>       *ALL* 'dce_rpc_request' and 'dce_rpc_response' events; or
>    b. I could customize the DCE-RPC analyzer to create new events
>       specifically for the interfaces/methods (aka UUIDs/OpNums) that
>       I care about.
>    c. Other ideas?
>
> I think both (a) and (b) will achieve the desired result, but there
> are trade-offs, pros and cons. I wonder which option would have a more
> negative impact on Bro performance? I imagine the reason you stopped
> passing stub data for all events was due to the performance hit, so I
> want to approach this in the best way possible. I appreciate your
> feedback.
>
> Cheers!
> Mark
> _______________________________________________
> bro-dev mailing list
> bro-dev at bro.org
> http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

--
Seth Hall * Corelight, Inc * www.corelight.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20180131/d5b67355/attachment.html

From mfernandez at mitre.org  Wed Jan 31 09:14:45 2018
From: mfernandez at mitre.org (Fernandez, Mark I)
Date: Wed, 31 Jan 2018 17:14:45 +0000
Subject: [Bro-Dev] Bro DCE-RPC Analyzer Questions
In-Reply-To: 
References: 
Message-ID: 

Seth,

>> If you'd be interested in discussing that, I think it could be a huge
>> addition to Bro (hundreds of new events!).
Yes, I am interested in parsing IDL files, but I plan to do so very
selectively. For example, for the At service, I don't care about all
four opnums it exposes... I just care about NetrJobAdd (which indeed is
how you designed it, too), and I want to log the command and the remote
host to which the command was sent. Similarly, for the Server Service,
I don't care about all fifty opnums that it exposes... I just care
about a handful of them, and I want to log key pieces of information
associated with the function call. At this point, I have my eye on a
few, maybe up to a dozen, specific RPC UUIDs.

What is it you would be looking for? Would you want parsers for every
opnum in the IDL file? Or just select functions?

>> ...I'd prefer to see the parsing done in the core. Architecturally,
>> we try to avoid passing unparsed data to Bro script-land...

Thank you. That is very important guidance, exactly what I was looking
for. It gives me a definitive starting point.

>> The At service parsing file is still there... but I think there was
>> some slight architectural change that needed to happen before I could
>> pass data to it.

An architectural change? The sound of that makes me worry. I see a
couple of approaches:

1. In 'dce_rpc-analyzer.pac' we could customize the function
   'process_dce_rpc_request'. We could have it look up certain UUIDs,
   such as At-svc, and if one matches, then call InstantiateAnalyzer
   and DeliverStream, just like you do for RPC authentication with
   GSSAPI and NTLM.

   Pro: Could be implemented easily and quickly.
   Con: Need a new analyzer for each RPC UUID.

2. In 'dce_rpc-protocol.pac' we could customize the record
   'DCE_RPC_Request' to change the 'stub' data element to be a big case
   statement switching on the UUID, akin to 'SMB_PDU' within the SMB
   analyzer, where the 'message' data element switches based on the SMB
   command.

   Pro: This is probably the preferred long-term solution.
   Con: It may be a little more challenging for me to code correctly
   and take me a lot longer to implement.

Am I close to the right answer for sending data to the at-svc parser?

Thanks,
Mark

From: Seth Hall [mailto:seth at corelight.com]
Sent: Wednesday, January 31, 2018 11:05 AM
To: Fernandez, Mark I
Cc: bro-dev at bro.org
Subject: Re: [Bro-Dev] Bro DCE-RPC Analyzer Questions

The original idea was to get extensive parsing in place for DCE-RPC
messages by parsing the IDL files for those services. Someone in the
community had hoped to take it on, but hasn't had time yet to complete
it. If you'd be interested in discussing that, I think it could be a
huge addition to Bro (hundreds of new events!).

The At service parsing file is still there because I didn't want to
lose track of it, but I think there was some slight architectural
change that needed to happen before I could pass data to it. I don't
think that data is even going to that parser; it's not just that there
aren't events. I'd have to refer back to the code to see what exactly
is wrong, though.

As for an approach to this problem right now, I'd prefer to see the
parsing done in the core. Architecturally, we try to avoid passing
unparsed data to Bro script-land because of performance concerns, and
we generally don't have the intrinsic tools to do parsing well in Bro
scripts.

.Seth

On 25 Jan 2018, at 12:28, Fernandez, Mark I wrote:

Bro-Dev Group,

I am doing a little research into using Bro to log and analyze specific
Microsoft DCE-RPC interfaces and methods. I notice that the Bro events
for 'dce_rpc_request' and 'dce_rpc_response'
provide the length of the RPC data stub (aka 'stub_len'). I found a
reference saying that these events previously provided a byte string
containing the stub data itself, but at some point it was reduced to
just the stub_len instead. I have a few questions that I hope you could
answer:

1. What was the reason you decided to remove the stub data from the
events and pass only the stub length?

2. On github, I see a BinPAC file for the RPC 'At' service
(bro/src/analyzer/protocol/dce-rpc/endpoint-atsvc.pac), but there are
no events generated by it. I think this would be very useful for my
project. What is the reason that you have the analyzer, but no events
for scriptland?

3. I have a use case, for a very few, limited number of RPC
interfaces/methods, where I need to receive the stub data in scriptland
for logging and analysis. How do you recommend I approach this
scenario? I see a couple options:

   a. I could customize the DCE-RPC analyzer to pass the stub data for
      *ALL* 'dce_rpc_request' and 'dce_rpc_response' events; or
   b. I could customize the DCE-RPC analyzer to create new events
      specifically for the interfaces/methods (aka UUIDs/OpNums) that I
      care about.
   c. Other ideas?

I think both (a) and (b) will achieve the desired result, but there are
trade-offs, pros and cons. I wonder which option would have a more
negative impact on Bro performance? I imagine the reason you stopped
passing stub data for all events was due to the performance hit, so I
want to approach this in the best way possible. I appreciate your
feedback.

Cheers!
Mark
_______________________________________________
bro-dev mailing list
bro-dev at bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

--
Seth Hall * Corelight, Inc * www.corelight.com

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20180131/82d5058f/attachment-0001.html

From seth at corelight.com  Wed Jan 31 11:22:17 2018
From: seth at corelight.com (Seth Hall)
Date: Wed, 31 Jan 2018 14:22:17 -0500
Subject: [Bro-Dev] Bro DCE-RPC Analyzer Questions
In-Reply-To: 
References: 
Message-ID: 

On 31 Jan 2018, at 12:14, Fernandez, Mark I wrote:

> Am I close to the right answer for sending data to the at-svc parser?

Yep. I don't know the "best" answer, but I think you're on the right
track with either direction that you outlined. If that code just
magically existed with either of your outlined implementations, I don't
think we could resist merging it in. ;)

.Seth

--
Seth Hall * Corelight, Inc * www.corelight.com