From jmellander at lbl.gov Thu Aug 2 16:44:44 2018
From: jmellander at lbl.gov (Jim Mellander)
Date: Thu, 2 Aug 2018 16:44:44 -0700
Subject: [Bro-Dev] Writing SumStats plugin
Message-ID:

Hi all:

I'm thinking of writing a SumStats plugin, probably with the initial implementation in Bro scriptland, with a re-implementation as BIFs if initial tests are successful.

From examining several plugins, it appears that I need to:

- Add NAME of my plugin as an enum to Calculation
- Add optional tunables to Reducer
- Add my data structure to ResultVal
- In register_observe_plugins, register the function to take an observation.
- In init_result_val_hook, add code to initialize data structure.
- In compose_resultvals_hook, add code to merge multiple data structures
- Create function to extract from data structure either at epoch_result, or epoch_finished

Anything else I should be aware of?

Thanks in advance,

Jim

From robin at corelight.com Fri Aug 3 10:21:34 2018
From: robin at corelight.com (Robin Sommer)
Date: Fri, 3 Aug 2018 10:21:34 -0700
Subject: [Bro-Dev] Broker::publish API
In-Reply-To: <20180727173929.GA6404@corelight.com>
References: <20180727173929.GA6404@corelight.com>
Message-ID: <20180803172134.GA56790@corelight.com>

On Fri, Jul 27, 2018 at 10:39 -0700, I wrote:

> Broker::relay(change_topic, change_topic, Config::cluster_set_option, ID, val, location);

Can somebody remind me what the use-case is for changing the topic on relay? Grepping over our standard scripts, I see only one use of relay(), and that's the one above.

Robin

--
Robin Sommer * Corelight, Inc.
* robin at corelight.com * www.corelight.com From jsiwek at corelight.com Fri Aug 3 13:57:07 2018 From: jsiwek at corelight.com (Jon Siwek) Date: Fri, 3 Aug 2018 15:57:07 -0500 Subject: [Bro-Dev] Broker::publish API In-Reply-To: <20180803172134.GA56790@corelight.com> References: <20180727173929.GA6404@corelight.com> <20180803172134.GA56790@corelight.com> Message-ID: On Fri, Aug 3, 2018 at 12:22 PM Robin Sommer wrote: > On Fri, Jul 27, 2018 at 10:39 -0700, I wrote: > > > Broker::relay(change_topic, change_topic, Config::cluster_set_option, ID, val, location); > > Can somebody remind me what the use-case is for changing the topic on > relay? Grepping over our standard scripts, I see only one use of > relay(), and that's the one above. Another use is hidden within Cluster::relay_rr(): event Intel::new_item(item: Item) &priority=5 { if ( Cluster::proxy_pool$alive_count == 0 ) Broker::publish(indicator_topic, Intel::insert_indicator, item); else Cluster::relay_rr(Cluster::proxy_pool, "Intel::new_item_relay_rr", indicator_topic, Intel::insert_indicator, item); } That is, if the manager is currently connected to some proxy, it picks one to do the work of distributing the event to workers. Manager sends 1 message instead of N. I don't know if there's currently other use-cases for Broker::relay specifically, but Cluster::relay_rr/Cluster::relay_hrw is essentially an extension of that which just also does the work of choosing the initial topic based upon a given pool and partition strategy. Might have been Justin who originally pointed out potential for avoiding manager overload in this way. 
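
The "manager sends 1 message instead of N" point above can be illustrated with a small toy model. This is only a sketch in Python, not Bro script and not the Cluster framework's implementation: `ProxyPool`, `distribute` and `manager_sends` are hypothetical names invented for the illustration, and the node names are made up.

```python
from itertools import cycle


class ProxyPool:
    """Toy stand-in for a round-robin proxy pool (hypothetical, not the Bro API)."""

    def __init__(self, proxies):
        self.alive = list(proxies)
        self._rr = cycle(self.alive) if self.alive else None

    def pick(self):
        # Hand out proxies in round-robin order.
        return next(self._rr)


def distribute(pool, workers):
    """Return the (sender, receiver) hops used to reach every worker."""
    if not pool.alive:
        # No proxy connected: the manager publishes to each worker itself.
        return [("manager", w) for w in workers]
    proxy = pool.pick()
    # One hop from the manager, then the chosen proxy fans out to the workers.
    return [("manager", proxy)] + [(proxy, w) for w in workers]


def manager_sends(hops):
    """Count how many messages the manager itself had to send."""
    return sum(1 for sender, _ in hops if sender == "manager")
```

With four workers and no proxy alive, the manager accounts for four sends in this model; with a pool of proxies it accounts for one, and successive calls rotate through the proxies.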
- Jon From robin at corelight.com Mon Aug 6 11:57:32 2018 From: robin at corelight.com (Robin Sommer) Date: Mon, 6 Aug 2018 11:57:32 -0700 Subject: [Bro-Dev] Broker::publish API In-Reply-To: References: <20180727173929.GA6404@corelight.com> <20180803172134.GA56790@corelight.com> Message-ID: <20180806185732.GA9621@corelight.com> On Fri, Aug 03, 2018 at 15:57 -0500, Jonathan Siwek wrote: > Another use is hidden within Cluster::relay_rr(): Yeah, though at least from an API perspective this is different: The caller gives relay_rr() only one topic to send to (indicator_topic). It's then using a different topic internally to get it over to the proxy first, but that feels more like an implementation detail. So in that sense I would argue that this is not a use-case for the Broker API letting users change the topic on relay. (I'm not saying that that capability can't be useful, I'm just still looking for actual use cases.) I have another question about this specific case: we use relay_rr() only for sending Intel::insert_indicator. Intel::remove_indicator gets published normally through auto_publish(). Why the difference? Robin -- Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com From robin at corelight.com Mon Aug 6 12:50:11 2018 From: robin at corelight.com (Robin Sommer) Date: Mon, 6 Aug 2018 12:50:11 -0700 Subject: [Bro-Dev] Broker::publish API In-Reply-To: <20180730160101.GE43154@corelight.com> References:

<20180730160101.GE43154@corelight.com> Message-ID: <20180806195011.GA10971@corelight.com>

On Mon, Jul 30, 2018 at 09:01 -0700, I wrote:

> Is there a summary somewhere of what events & topics the cluster nodes
> are currently exchanging?

So I went through the exercise of collecting this information: what connections do we have between nodes, who's subscribing to what, and who's publishing what; see the attached PDF. This is based on all the standard scripts, with some special cases ignored (like the control framework).

I'm not fully sure yet what to conclude from this, but a few quick observations:

- The main topics are bro/cluster/ and bro/cluster/node/. For these we wouldn't have a problem with loops if we enabled automatic, topic-driven forwarding as far as I can see.

- bro/cluster/broadcast seems to be the main case with a looping problem, because everybody subscribes to it. It's hardly used though. (bro/config/change is used similarly though).

- Relaying is hardly used.

- There are a couple of script-specific topics where I'm wondering if these could switch to using bro/cluster/ instead (bro/intel/*, bro/irc/dcc_transfer_update). In other words: when clusterizing scripts, prefer not to introduce new topics.

- There are a lot of checks in publishing code of the type "if I am (not) of node type X".

- Pools are used for two different things: 1. the known-* scripts pick a proxy to process and log the information; whereas 2. the Intel scripts pick a proxy just as a relay to broadcast stuff out, reducing load. The 1st application is a good one, but the 2nd feels like it should be handled differently. Need to mull over this more, thoughts welcome.

Overall I have to say I found it pretty hard to follow this all because we don't have much consistency right now in how scripts structure their communication. That's not surprising, given that we're just starting to use all this, but it suggests that we have room for improvement in our abstractions.
:) Robin -- Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com -------------- next part -------------- A non-text attachment was scrubbed... Name: Broker Communication.pdf Type: application/pdf Size: 32669 bytes Desc: not available Url : http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20180806/5a0c99fa/attachment-0001.pdf From jan.grashoefer at gmail.com Tue Aug 7 03:05:53 2018 From: jan.grashoefer at gmail.com (=?UTF-8?Q?Jan_Grash=c3=b6fer?=) Date: Tue, 7 Aug 2018 12:05:53 +0200 Subject: [Bro-Dev] Broker::publish API In-Reply-To: <20180806195011.GA10971@corelight.com> References:

<20180730160101.GE43154@corelight.com> <20180806195011.GA10971@corelight.com> Message-ID: To be honest, I have somehow lost track of the discussion. What I can recall, it's about simplifying the API in the light of multi-hop routing, which is not fully functional yet. Regarding multi-hop routing I am even not sure what the actual goal is that we are currently aiming at. However, from a conceptual perspective I think "routing" either needs routing algorithms or strict conventions of how the network, to route messages through, is structured. So, what would a "deep cluster" look like and what kind of message flows do we expect in there? Some comments on the observations: On 06/08/18 21:50, Robin Sommer wrote: > - The main topics are bro/cluster/ and > bro/cluster/node/. For these we wouldn't have a problem > with loops if we enabled automatic, topic-driven forwading as > far as I can see. How does forwarding work if I add another node type? Do we assume a certain cluster structure here? If yes: Is that a valid assumption? > - bro/cluster/broadcast seems to be the main case with a looping > problem, because everybody subscribes to it. It's hardly used > though. (bro/config/change is used similarly though). The topic-concept is a multicast scheme, isn't it? Having a broadcast functionality on top of that feels odd. However, it's limited to the cluster topic. This leads me to the question which domains do we operate on? If I think of messages, I start to think about a cluster but that might be only one domain of application. I think it would be good to define layers of abstraction more precise here. > - There are a couple of script-specific topics where I'm wondering > if these could switch to using bro/cluster/ instead > (bro/intel/*, bro/irc/dcc_transfer_update). In other words: when > clusterizing scripts, prefer not to introduce new topics. From my understanding this would mean going back to the old communication patterns. 
What's the point of having topics if we don't use them? > - There's a lot of checks in publishing code of the type "if I am > (not) of node type X". That's something I would have expected. I don't think this is necessarily an indicator of bad design. Having these kind of checks means that roles are somehow fixed and responsibilities are explicitly codified. > - Pools are used for two different things: 1. the known-* scripts > pick a proxy to process and log the information; whereas 2. the > Intel scripts pick a proxy just as a relay to broadcast stuff > out, reducing load. That 1st application is a good, but the 2nd > feels like should be handled differently. I think we should be careful about introducing too much abstractions. Communication patterns tend to be complex and the more of the complexity is hidden, the easier it will be to generate misunderstandings. For example, in case of the intel framework, proxy nodes might be able to implement some more logic than just relaying at some point. Having the relay abstraction would mean to deal with two different levels of abstractions regarding intel on proxy nodes in this case. > Overall I have to say I found it pretty hard to follow this all > because we don't have much consistency right now in how scripts > structure their communication. That's not surprising, given that we're > just starting to use all this, but it suggests that we have room for > improvement in our abstractions. :) I totally agree here! I think it could help to come up with some more use cases to identify the best abstractions. 
Jan From jsiwek at corelight.com Tue Aug 7 11:32:25 2018 From: jsiwek at corelight.com (Jon Siwek) Date: Tue, 7 Aug 2018 13:32:25 -0500 Subject: [Bro-Dev] Broker::publish API In-Reply-To: <20180806185732.GA9621@corelight.com> References: <20180727173929.GA6404@corelight.com> <20180803172134.GA56790@corelight.com> <20180806185732.GA9621@corelight.com> Message-ID: On Mon, Aug 6, 2018 at 1:57 PM Robin Sommer wrote: > I have another question about this specific case: we use relay_rr() > only for sending Intel::insert_indicator. Intel::remove_indicator gets > published normally through auto_publish(). Why the difference? Potentially no reason other than no one reviewed whether it had potential to be optimized in a similar way. e.g. I first ported scripts in a direct fashion without trying to change too much structurally about comm. patterns or doing any optimization except in cases where a change was specifically talked about. I only recall Justin had called out Intel::insert_indicator, so it got changed. - Jon From jsiwek at corelight.com Tue Aug 7 11:39:03 2018 From: jsiwek at corelight.com (Jon Siwek) Date: Tue, 7 Aug 2018 13:39:03 -0500 Subject: [Bro-Dev] Broker::publish API In-Reply-To: <20180806195011.GA10971@corelight.com> References:

<20180730160101.GE43154@corelight.com> <20180806195011.GA10971@corelight.com> Message-ID: On Mon, Aug 6, 2018 at 3:00 PM Robin Sommer wrote: > Overall I have to say I found it pretty hard to follow this all > because we don't have much consistency right now in how scripts > structure their communication. That's not surprising, given that we're > just starting to use all this, but it suggests that we have room for > improvement in our abstractions. :) How much is due to new API usage and how much is due to things mainly being a direct port of old communication patterns (which I guess are written by various people over extended lengths of time and so there's inconsistencies to be expected) ? Or due to being a mishmash of both old and new? - Jon From jmellander at lbl.gov Tue Aug 7 15:15:27 2018 From: jmellander at lbl.gov (Jim Mellander) Date: Tue, 7 Aug 2018 15:15:27 -0700 Subject: [Bro-Dev] Writing SumStats plugin In-Reply-To: References: Message-ID: It seems that there's some inconsistency in SumStats plugin usage and implementation. There appear to be 2 classes of plugins with differing calling mechanisms and action: 1. Item to be measured is in the Key, and the measurement is in Observation 1. These include Average, Last X Observations, Max, Min, Sample, Standard Deviation, Sum, Unique, Variance 1. These are exact measurements. 2. Some of these have dependencies: StdDev depends on Variance, which depends on Average 2. Item to be measured is in Observation, and the measurement is implicitly 1, and the Key is generally null 1. These include HyperLogLog (number of Unique), TopK (top count) 1. These are probabilistic data structures. The Key is not passed to the plugin, but is used to allocate a table that includes, among other things, the processed observations. Both classes call the epoch_result function once per key at the end of the epoch. 
Since class 2 plugins often use a null key, there is only one call to epoch_result, and a special function is used to extract the results from the probabilistic data structure (https://www.bro.org/current/exercises/sumstats/sumstats-5.bro). The epoch_finished function is called when all keys have been returned to finish up. This is unneeded with this sort of class 2 plugin, since all the work can be done in the sole call to epoch_result. Multiple keys could be used with class 2 plugins, which allows for groupings (https://www.bro.org/current/exercises/sumstats/sumstats-4.bro).

I have a use case where I want to pass both a key and measurement to a plugin maintaining a probabilistic data store [1]. I don't want to allocate a table for each key, since many/most will not be reflected in the final results. Since the Observation is a record containing both a string & a number, a hack would be to coerce the key to a string, and pass both in the Observation to a class 2 plugin, with a null key - which is what I am doing currently.

It would be nice to have a conversation on how to unify these two classes of plugins. A few thoughts on this:

- Pass Key to the plugins - maybe Key could be added to the Observation structure.
- Provide a mechanism to *not* allocate the table structure with every new Key (this and the previous can possibly be done with some hackiness with the normalize_key function in the reducer record)
- Some sort of epoch_result factory function that by default just performs the class 1 plugin behavior. For class 2 plugins, the function would feed the results one by one into epoch_result.

Incidentally, I think there's a bug in the observe() function. These two lines are run in the loop through the reducers:

    if ( r?$normalize_key )
        key = r$normalize_key(copy(key));

which has the effect of modifying the key for subsequent iterations of the loop, rather than just for the one reducer it applies to. The fix is easy and obvious...
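
The key-reassignment problem described above can be reproduced in a small toy model. This is a sketch in Python rather than Bro script, and it is not the SumStats source: `Reducer`, `observe_buggy` and `observe_fixed` are hypothetical names made up for the illustration. It only shows the general pattern of overwriting a loop-wide variable versus using a per-iteration one.

```python
class Reducer:
    """Toy reducer: a name plus an optional key-normalizing function."""

    def __init__(self, name, normalize_key=None):
        self.name = name
        self.normalize_key = normalize_key


def observe_buggy(reducers, key):
    """Mimics the reported pattern: the shared key is overwritten in the loop."""
    seen = {}
    for r in reducers:
        if r.normalize_key is not None:
            key = r.normalize_key(key)  # leaks into every later reducer
        seen[r.name] = key
    return seen


def observe_fixed(reducers, key):
    """Uses a per-reducer key so that normalization stays local."""
    seen = {}
    for r in reducers:
        reducer_key = key
        if r.normalize_key is not None:
            reducer_key = r.normalize_key(key)
        seen[r.name] = reducer_key
    return seen
```

Given a first reducer that upper-cases its key and a second reducer without a normalizer, the buggy variant hands the upper-cased key to both, while the fixed variant leaves the second reducer's key untouched.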
Jim [1] Implementation of algorithms 4&5 (with enhancements) of https://arxiv.org/pdf/1705.07001.pdf On Thu, Aug 2, 2018 at 4:44 PM, Jim Mellander wrote: > Hi all: > > I'm thinking of writing a SumStats plugin, probably with the initial > implementation in bro scriptland, with a re-implementation as BIFs if > initial tests successful. > > From examining several plugins, it appears that I need to: > > - Add NAME of my plugin as an enum to Calculation > - Add optional tunables to Reducer > - Add my data structure to ResultVal > - In register_observe_plugins, register the function to take an > observation. > - In init_result_val_hook, add code to initialize data structure. > - In compose_resultvals_hook, add code to merge multiple data > structures > - Create function to extract > from data structure either at epoch_result, or epoch_finished > > Any thing else I should be aware of? > > Thanks in advance, > > Jim > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20180807/88a2cedd/attachment.html From jazoff at illinois.edu Wed Aug 8 07:20:31 2018 From: jazoff at illinois.edu (Azoff, Justin S) Date: Wed, 8 Aug 2018 14:20:31 +0000 Subject: [Bro-Dev] Broker::publish API In-Reply-To: <20180806195011.GA10971@corelight.com> References:

<20180730160101.GE43154@corelight.com> <20180806195011.GA10971@corelight.com> Message-ID: <83D39DB4-98E2-463F-B435-F09B7149F73C@illinois.edu>

> On Aug 6, 2018, at 3:50 PM, Robin Sommer wrote:
>
> - Relaying is hardly used.
>
>
> - There's a lot of checks in publishing code of the type "if I am
> (not) of node type X".

I think these 2 are somewhat related. Since there weren't higher level things like relaying, in order to relay a message from one worker to all other workers you had to jump through hoops with worker2manager and manager2worker events and often lots of @if stuff.

There's also a bunch of places that I think were written standalone first and then updated to work on a cluster in place, resulting in some awkwardness, like notice/main.bro:

    function NOTICE(n: Notice::Info)
        {
        if ( Notice::is_being_suppressed(n) )
            return;

    @if ( Cluster::is_enabled() )
        if ( Cluster::local_node_type() == Cluster::MANAGER )
            Notice::internal_NOTICE(n);
        else
            {
            n$peer_name = n$peer_descr = Cluster::node;
            Broker::publish(Cluster::manager_topic, Notice::cluster_notice, n);
            }
    @else
        Notice::internal_NOTICE(n);
    @endif
        }

    event Notice::cluster_notice(n: Notice::Info)
        {
        NOTICE(n);
        }

So on a worker, calling NOTICE publishes a cluster_notice event that then re-calls NOTICE on the manager, which then does the right thing. You end up with a single small function with nested @if/if that works 3 different ways.

But if this was written in a more 'cluster by default' way, it would just look like:

    function NOTICE(n: Notice::Info)
        {
        if ( Notice::is_being_suppressed(n) )
            return;

        n$peer_name = n$peer_descr = Cluster::node;
        Broker::publish(Cluster::manager_topic, Notice::cluster_notice, n);
        }

    event Notice::cluster_notice(n: Notice::Info)
        {
        if ( Notice::is_being_suppressed(n) )
            return;

        Notice::internal_NOTICE(n);
        }

Which, other than the suppression check, has no branching at all. Broker::publish could possibly be optimized for standalone to raise the event directly if not being run in a cluster.
The only small downside is on a standalone you'd call is_being_suppressed twice, could always add a @if if you really wanted, but is_being_suppressed is just a set lookup. Then this stuff would be a good use for efficient relaying/broadcasting instead of making the manager do all the work: Broker::auto_publish(Cluster::worker_topic, Notice::begin_suppression); Broker::auto_publish(Cluster::proxy_topic, Notice::begin_suppression); ? Justin Azoff From robin at corelight.com Wed Aug 8 08:48:15 2018 From: robin at corelight.com (Robin Sommer) Date: Wed, 8 Aug 2018 08:48:15 -0700 Subject: [Bro-Dev] Broker::publish API In-Reply-To: References:

<20180730160101.GE43154@corelight.com> <20180806195011.GA10971@corelight.com> Message-ID: <20180808154815.GA17912@corelight.com>

On Tue, Aug 07, 2018 at 12:05 +0200, Jan Grashöfer wrote:

> What I can recall, it's about simplifying the API in the light of
> multi-hop routing, which is not fully functional yet.

To level up a bit, what I'm hoping for is that we can find some easy ways to simplify the API a bit more now, with an eye towards dynamic multi-hop coming later. I don't know if it'll work out before 2.6 still, but changing the API later is more painful.

We don't need to (or even can) solve multi-hop topologies right now, I think nobody really has the use cases clear in their heads yet. But if we could simplify the API a bit more for our current use cases in a way that may extend to multihop naturally later, that would probably save us some headaches at that point.

> How does forwarding work if I add another node type?

That's actually something I realized yesterday: we don't have direct worker-to-worker communication right now, correct? A worker cannot just publish to "bro/cluster/workers".

> Do we assume a certain cluster structure here? If yes: Is that a valid
> assumption?

I think it's safe to assume we have the cluster structure under our own control; it's whatever we configure it to be. That's something that's easier to change later than the API itself. Said differently: we can always adjust the connections and topics that we set up by default; it's much harder to change how the publish() function works.

> From my understanding this would mean going back to the old
> communication patterns. What's the point of having topics if we don't
> use them?

Let me try to phrase it differently: If there's already a topic for a use case, it's better to use it. That's easier and less error-prone. So if, e.g., I want to send my script's data to all workers, publishing to bro/cluster/worker will do the job.
And that will even automatically adapt if things get more complex later. For example, I can see having multiple otherwise independent cluster sharing a communication channel. In that case, we could internally change the topic to "bro/cluster//workers", and everybody using the predefined worker topic would still reach "their" workers without any further changes. > That's something I would have expected. I don't think this is > necessarily an indicator of bad design. Maybe it's a *necessary* design, but that doesn't make it nice. ;-) It makes it very hard to follow the logic; when reading through the scripts I got lost multiple times because some "@if I-am-a-manager" was somewhere half a page earlier, disabling the code I was currently looking at for most nodes. We probably can't totally avoid that, but the less the better. Robin -- Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com From robin at corelight.com Wed Aug 8 08:53:37 2018 From: robin at corelight.com (Robin Sommer) Date: Wed, 8 Aug 2018 08:53:37 -0700 Subject: [Bro-Dev] Broker::publish API In-Reply-To: References:

<20180730160101.GE43154@corelight.com> <20180806195011.GA10971@corelight.com> Message-ID: <20180808155337.GB17912@corelight.com> Yeah, I realize that. A direct port of the old logic was of course the goal so far, with the drawbacks of that approach accepted & understood. That's what's in place now; that's great and exactly as planned. We can get 2.6 out this way, and it'll be fine. My point is that now also seems like a good time to take stock of what we got that way. That direct porting is finally getting us some sense of where things aren't an ideal match between API and use cases yet. And if there's something easy we can do about that before people start relying on the new API, it seems that would be beneficial to do. But we can see. Robin On Tue, Aug 07, 2018 at 13:39 -0500, Jonathan Siwek wrote: > How much is due to new API usage and how much is due to things mainly > being a direct port of old communication patterns (which I guess are > written by various people over extended lengths of time and so there's > inconsistencies to be expected) ? Or due to being a mishmash of both > old and new? -- Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com From robin at corelight.com Wed Aug 8 08:54:27 2018 From: robin at corelight.com (Robin Sommer) Date: Wed, 8 Aug 2018 08:54:27 -0700 Subject: [Bro-Dev] Broker::publish API In-Reply-To: <83D39DB4-98E2-463F-B435-F09B7149F73C@illinois.edu> References:

<20180730160101.GE43154@corelight.com> <20180806195011.GA10971@corelight.com> <83D39DB4-98E2-463F-B435-F09B7149F73C@illinois.edu> Message-ID: <20180808155427.GC17912@corelight.com> On Wed, Aug 08, 2018 at 14:20 +0000, Justin Azoff wrote: > There's also a bunch of places that I think were written standalone first and then updated to work on a cluster in > place resulting in some awkwardness.. Yeah, indeed, that's another other source of complexity with these scripts. > But if this was written in a more 'cluster by default' way, it would just look like: Nice example. That's the kind of thing I hope we can do during the next cycle: streamline the scripts to unify these kinds of logic. > Broker::publish could possibly be optimized for standalone to raise the event directly if not being ran in a cluster. Or we generally raise published events locally as well if the node is subscribed to the destination topic. There are pros and cons for that I think. Robin -- Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com From jsiwek at corelight.com Wed Aug 8 10:36:27 2018 From: jsiwek at corelight.com (Jon Siwek) Date: Wed, 8 Aug 2018 12:36:27 -0500 Subject: [Bro-Dev] Broker::publish API In-Reply-To: <20180808155337.GB17912@corelight.com> References:

<20180730160101.GE43154@corelight.com> <20180806195011.GA10971@corelight.com> <20180808155337.GB17912@corelight.com> Message-ID: On Wed, Aug 8, 2018 at 10:53 AM Robin Sommer wrote: > > Yeah, I realize that. A direct port of the old logic was of course the > goal so far, with the drawbacks of that approach accepted & > understood. That's what's in place now; that's great and exactly as > planned. We can get 2.6 out this way, and it'll be fine. I'm earnestly probing to try to get a better decomposition of the issues that make it hard to understand cluster communication patterns. There's the exercise of trying to answer "what *is* this script doing?" and then there's also trying to answer "what *should* it be doing?". I seldom felt like I had definitive answers for the later, but I can see how it would be beneficial to do that and also broader script/framework makeovers, possibly before 2.6, because it would help inform whether new APIs are catering to "good" use-cases. Though my thinking is it's not critical to get a 100% API/use-case match off the bat and that there's some actionable stuff to take away from this thread that is at least going to have us heading in a better direction sooner rather than later... > My point is that now also seems like a good time to take stock of what > we got that way. That direct porting is finally getting us some sense > of where things aren't an ideal match between API and use cases yet. > And if there's something easy we can do about that before people start > relying on the new API, it seems that would be beneficial to do. But > we can see. Yeah, agreed. 
What I've taken away from your earlier points is that these smaller changes are seeming like they'd be beneficial to do before 2.6: * publish() API simplifications/compressions (pending decision on exactly what those should be) * enable message forwarding by default (meaning re-implement the one or two subscription patterns that might create a cycle) * see if any script-specific topics can instead use a pre-existing "cluster" topic What do you think? A separate question/idea I just had was whether how much of the process of auditing the subscriptions and communication patterns was difficult due to having to hunt down things in various scripts and whether a more centralized config could be something to do? e.g. I don't know how the details would work out, but I'm imagining a workflow where one edits a centralized config file with subscription/node info in it and that auto-generates the code to set them up. Sort of like working backward from the info in the PDF you shared. - Jon From jsiwek at corelight.com Wed Aug 8 11:06:40 2018 From: jsiwek at corelight.com (Jon Siwek) Date: Wed, 8 Aug 2018 13:06:40 -0500 Subject: [Bro-Dev] Broker::publish API In-Reply-To: <20180808154815.GA17912@corelight.com> References:

<20180730160101.GE43154@corelight.com> <20180806195011.GA10971@corelight.com> <20180808154815.GA17912@corelight.com> Message-ID: On Wed, Aug 8, 2018 at 10:55 AM Robin Sommer wrote: > That's actually something I realized yesterday: we don't have direct > worker-to-worker communication right now, correct? A worker cannot > just publish to "bro/cluster/workers". Right, here's a crude graphic of the cluster layout from the docs: https://github.com/bro/bro/blob/master/doc/frameworks/broker/cluster-layout.png - Jon From robin at corelight.com Wed Aug 8 12:50:18 2018 From: robin at corelight.com (Robin Sommer) Date: Wed, 8 Aug 2018 12:50:18 -0700 Subject: [Bro-Dev] Broker::publish API In-Reply-To: References:

<20180730160101.GE43154@corelight.com> <20180806195011.GA10971@corelight.com> <20180808155337.GB17912@corelight.com> Message-ID: <20180808195018.GR40264@corelight.com>

On Wed, Aug 08, 2018 at 12:36 -0500, Jonathan Siwek wrote:

> * publish() API simplifications/compressions (pending decision on
> exactly what those should be)

Yeah, with an eye on the semantics for forwarding (now and later), and whether to raise published events locally as well if the host is subscribed itself.

And maybe the 2nd eye on: can we define these semantics so that we can get rid of some of the "what node type am I?" checks? I'm not sure what that would look like, but generally it would be nice if one could just publish stuff liberally without worrying too much and the subscriptions and forwarding semantics do the right thing (not always, but often).

> * enable message forwarding by default (meaning re-implement the one
> or two subscription patterns that might create a cycle)

Haven't quite made up my mind on this one. In principle yes, but right now a host needs to be subscribed to a topic to forward it, if I remember that right. That may limit how we use topics, not sure (e.g., if a worker wanted to talk to other workers, with "real" forwarding/routing they'd just publish to the worker topic and that message would get routed there, but not be processed at the intermediary hops as well. With our current forwarding, the hops would need to subscribe to the worker topic as well and hence the event would get raised there, too.)

> * see if any script-specific topics can instead use a pre-existing
> "cluster" topic

Yep.

> difficult due to having to hunt down things in various scripts and
> whether a more centralized config could be something to do?

Yeah, that sounds useful for the cluster case: it could be part of the cluster framework to define all the relevant node types with their characteristics. That would also make later changes to how topics and connections are set up easier and more centralized.
For other use cases, it should still be possible to configure things independently, too, though (say, for talking to external Broker applications). Robin -- Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com From jsiwek at corelight.com Thu Aug 9 08:02:02 2018 From: jsiwek at corelight.com (Jon Siwek) Date: Thu, 9 Aug 2018 10:02:02 -0500 Subject: [Bro-Dev] Broker::publish API In-Reply-To: <20180808195018.GR40264@corelight.com> References:

<20180730160101.GE43154@corelight.com> <20180806195011.GA10971@corelight.com> <20180808155337.GB17912@corelight.com> <20180808195018.GR40264@corelight.com> Message-ID: On Wed, Aug 8, 2018 at 2:50 PM Robin Sommer wrote: > > * enable message forwarding by default (meaning re-implement the one > > or two subscription patterns that might create a cycle) > > Haven't quite made up my mind on this one. In principlel yes, but > right now a host needs to be subscribed to a topic to forward it if I > remember than right. That may limit how we use topics, not sure (e.g., > if a worker wanted to talk to other workers, with "real" > forwarding/routing they'd just publish to the worker topic and that > message would get routed there, but not be processed at the > intermediary hops as well. With our current forwarding, the hops would > need to subscribe to the worker topic as well and hence the event got > raised there, too.) Yeah, that's how I also understand the current mechanisms would work. Maybe can split it into two separate questions: (1) enable the "explicit/manual" forwarding by default? (2) re-implement any existing subscription cycles? Answer to (2) may pragmatically be "yes" because they'd be known to cause problems if ever (1) did become enabled (and also could be problematic for a more sophisticated/automatic/implicit routing system should that become available in the future... at least I think it's a problem, but then again maybe connection-cycles would also still be a problem at that point, not quite sure). Answer to (1) may be "no" because we don't have a use for it at the moment -- having the forwarding-nodes also raise events is not ideal, but if we solved that would it be useful? 
Maybe an idea would be to extend the subscribe() API in Bro:

    function Broker::subscribe(topic_prefix: string, forward_only: bool &default=F);

I recall that we have access to both the message/event as well as the topic string on the receiver side, so it could be possible to detect whether or not to raise the event depending on whether the topic only has a matching subscription prefix that is marked as forward_only.

With that you could do something like:

    # On Manager
    Broker::subscribe(worker_to_worker_topic, T);

    # On Worker
    Broker::subscribe(worker_to_worker_topic);
    Broker::publish(worker_to_worker_topic, my_event);

There, my_event would be distributed from one worker to all workers via the manager, but not sure that's as usable/dynamic as the current "relay" mechanism because you also get a load-balancing scheme to go along with it. Here, you'd only ever want to pick a single manager or proxy to do the forwarding (subscribing like this on all proxies causes all proxies to forward to all workers resulting in undesired event duplication.)

So I guess that's still to say I'm not sure what the use of the current forwarding mechanism would be if it were enabled.

Also maybe begs the question for later regarding the "real" routing mechanism: I suppose that would need to be smart enough to do automatic load-balancing in the case of there being more than one route to a subscriber.

- Jon

From robin at corelight.com Thu Aug 9 11:29:38 2018
From: robin at corelight.com (Robin Sommer)
Date: Thu, 9 Aug 2018 11:29:38 -0700
Subject: [Bro-Dev] Broker::publish API
In-Reply-To:
References:

<20180730160101.GE43154@corelight.com> <20180806195011.GA10971@corelight.com> <20180808155337.GB17912@corelight.com> <20180808195018.GR40264@corelight.com>
Message-ID: <20180809182938.GB55342@corelight.com>

Yeah, and let me add one thing: What if, as a starting point for modeling things, we assumed that we have global topic-based routing available. Meaning if node A publishes to topic X, the message will show up at all nodes that are subscribed to topic X anywhere, no matter what the topology --- Broker will somehow take care of that. I believe that's where we want to get eventually, through whatever mechanism; it's not trivial, but also not rocket science. Then we (A) design the API from that perspective and adapt our standard scripts accordingly, and (B) see how we can get an approximation of that assumption for today's Broker and our simple clusters, by having the cluster framework hardcode what we need.

> (1) enable the "explicit/manual" forwarding by default?

Coming from that assumption above, I'd say yes here, doing it like you suggest: differentiate between forwarding and locally raising an event by topic. Maybe instead of adding it to Broker::subscribe() as a boolean, we add a separate "Broker::forward(topic_prefix)" function, and use that to essentially hardcode forwarding on each node just like we want/need for the cluster. Behind the scenes Broker could still just store the information as a boolean, but API-wise it means we can later (once we have real routing) just rip out the forward() calls and let Magic take its role. :)

As you say, we don't get load-balancing that way (today), but we still have pools for distributing analyses (like the known-* scripts do). And if distributing message load (like the Intel scripts do) is necessary, I think pools can solve that as well: we could use a RR proxy pool and funnel it through script-land there: send to one proxy and have an event handler there that triggers a new event to publish it back out to the workers.
For proxies, that kind of additional load should be fine (if load-balancing is even necessary at all; just going through a single forwarding node might just as well be fine).

> (2) re-implement any existing subscription cycles?

Now, here I'm starting to change my mind a bit. Maybe in the end, in large topologies, it would be futile to insist on not having cycles after all. The assumption above doesn't care about it, putting Broker in charge of figuring it out. So with that, if we can set up forwarding through (1) in a way that cycles in subscriptions don't matter, it may be fine to just leave them in. But I guess in the end it doesn't matter, removing them can only make things better/easier.

> Also maybe begs the question for later regarding the "real" routing
> mechanism: I suppose that would need to be smart enough to do
> automatic load-balancing in the case of there being more than one
> route to a subscriber.

Yeah, I'm becoming more and more convinced that in the end we won't get around adding a "real" routing layer that takes care of such things under the hood.

Robin

--
Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com

From vlad at es.net Fri Aug 10 05:40:24 2018
From: vlad at es.net (Vlad Grigorescu)
Date: Fri, 10 Aug 2018 12:40:24 +0000
Subject: [Bro-Dev] DHCP event removal
In-Reply-To:
References: <48118758-BE99-43C1-A7EC-730B488DE1EE@corelight.com> <191AF065-64A3-41E5-9AC8-CC39714B450C@illinois.edu>
Message-ID:

On Fri, Jun 15, 2018 at 9:38 PM, Vlad Grigorescu wrote:

> Even if it's not widely used, I think it'd be a nicer user experience if
> we were to ship a script that handled dhcp_message, and raised the old
> events. We could mark the old events as deprecated, and remove them in the
> next version. That way, people have at least one cycle to upgrade.
>

I have a branch that implements this, topic/vladg/dhcp_event_deprecation. You would need to load policy/protocols/dhcp/deprecated_events.bro.
--Vlad

From jan.grashoefer at gmail.com Fri Aug 10 06:22:14 2018
From: jan.grashoefer at gmail.com (Jan Grashöfer)
Date: Fri, 10 Aug 2018 15:22:14 +0200
Subject: [Bro-Dev] Broker::publish API
In-Reply-To: <20180808154815.GA17912@corelight.com>
References:

<20180730160101.GE43154@corelight.com> <20180806195011.GA10971@corelight.com> <20180808154815.GA17912@corelight.com>
Message-ID:

On 08/08/18 17:48, Robin Sommer wrote:

> I think it's safe to assume we have the cluster structure under our
> own control; it's whatever we configure it to be. That's something
> that's easier to change later than the API itself. Said differently:
> we can always adjust the connections and topics that we set up by
> default; it's much harder to change how the publish() function works.

I think in an earlier discussion (could be http://mailman.icsi.berkeley.edu/pipermail/bro-dev/2017-February/012386.html) there was the idea of different types of data nodes that would serve different purposes. If that is still a design goal, it feels like the structure of a cluster could be more volatile than it used to be. Not sure how that fits with the current assumptions. Just wanted to bring that back into the discussion.

> Let me try to phrase it differently: If there's already a topic for a
> use case, it's better to use it. That's easier and less error-prone.
> So if, e.g., I want to send my script's data to all workers,
> publishing to bro/cluster/worker will do the job. And that will even
> automatically adapt if things get more complex later.

Maybe a silly question: Would that work using further "specialized" topics like bro/cluster/worker/intel? From my understanding one feature of topics is that one would be able to subscribe only to the things that one is interested in. Having a bunch of events just published to bro/cluster/worker seems counterproductive.

> Maybe it's a *necessary* design, but that doesn't make it nice. ;-) It
> makes it very hard to follow the logic; when reading through the
> scripts I got lost multiple times because some "@if I-am-a-manager"
> was somewhere half a page earlier, disabling the code I was currently
> looking at for most nodes. We probably can't totally avoid that, but
> the less the better.

I agree!
One thing that could also help here is clear separation. In the intel framework that kind of code is encapsulated in a cluster.bro file, which is basically divided into a worker and a manager part. In the end it's a tradeoff between abstraction and flexibility.

Jan

From robin at corelight.com Fri Aug 10 08:12:55 2018
From: robin at corelight.com (Robin Sommer)
Date: Fri, 10 Aug 2018 08:12:55 -0700
Subject: [Bro-Dev] Broker::publish API
In-Reply-To:
References:

<20180730160101.GE43154@corelight.com> <20180806195011.GA10971@corelight.com> <20180808154815.GA17912@corelight.com>
Message-ID: <20180810151254.GD51305@corelight.com>

On Fri, Aug 10, 2018 at 15:22 +0200, Jan Grashöfer wrote:

> different purposes. If that is still a design goal, it feels like the
> structure of a cluster could be more volatile than it used to be.

It is, and we have some of that, and I think it fits in with the discussion here too. In my mind, I see two separate things in this discussion: one is a general Broker API that facilitates some very different applications; and the 2nd is our cluster framework that uses that API for a specific use-case. The latter is much easier to tune for us in terms of how it uses Broker, as we can hide much of it internally and adjust later, i.e., by adding a new node type. The question for the cluster framework, then, is what API *it* provides for scripts to share state in a cluster. And a part of the answer to that could be "standardized topics" that are guaranteed to get the information to where it needs to go.

> Maybe a silly question: Would that work using further "specialized" topics
> like bro/cluster/worker/intel? From my understanding one feature of topics
> is that one would be able to subscribe only to the things that one is
> interested in. Having a bunch of events just published to bro/cluster/worker
> seems counterproductive.

I hear you, but I think I haven't quite understood the concern yet. Can you give me an example where the difference matters? What's different between publishing intel events to bro/cluster/worker/intel vs bro/cluster/worker if both go to all workers? Or is it so that some workers can decide not to receive the intel events?

(And technically, subscriptions are prefix-based, so anybody subscribing to bro/cluster/worker automatically gets bro/cluster/worker/intel as well; not sure if that helps or hurts here?)

Robin

--
Robin Sommer * Corelight, Inc.
* robin at corelight.com * www.corelight.com From jsiwek at corelight.com Fri Aug 10 08:24:00 2018 From: jsiwek at corelight.com (Jon Siwek) Date: Fri, 10 Aug 2018 10:24:00 -0500 Subject: [Bro-Dev] Broker::publish API In-Reply-To: <20180809182938.GB55342@corelight.com> References:

<20180730160101.GE43154@corelight.com> <20180806195011.GA10971@corelight.com> <20180808155337.GB17912@corelight.com> <20180808195018.GR40264@corelight.com> <20180809182938.GB55342@corelight.com> Message-ID: On Thu, Aug 9, 2018 at 1:29 PM Robin Sommer wrote: > > (1) enable the "explicit/manual" forwarding by default? > > Coming from that assumption above, I'd say yes here, doing it like you > suggest: differentiate between forwarding and locally raising an event > by topic. Maybe instead of adding it to Broker::subscribe() as a > boolean, we add a separate "Broker::forward(topic_prefix)" function, > and use that to essentially hardcode forwarding on each node just like > we want/need for the cluster. Behind the scenes Broker could still > just store the information as a boolean, but API-wise it means we can > later (once we have real routing) just rip out the forward() calls and > let Magic take its role. :) Not sure there'd be anywhere we'd currently use Broker::forward() ? Or is it a matter of "if a user needed it for something, then it's available" ? The only intra-cluster communication that's more than 1 hop at the moment is worker-worker, but setting up a Broker::forward() route wouldn't be my first thought as it's not currently a scalable approach. I'd instead take the cautious approach of relaying via a RR-proxy so one can add proxies to handle more load as needed. However, I can see Broker::forward() could make it a bit easier for a user wanting to manually set up a forwarding route between clusters or other external applications. Is that a clear use-case we need to cater to now? If so, then it would indeed be just saying "hey, Broker::forward() is now a no-op since Broker has real routing mechanisms and you can remove them". > As you say, we don't get load-balancing that way (today), but we still > have pools for distributing analyses (like the known-* scripts do). 
> And if distributing message load (like the Intel scripts do) is > necessary, I think pools can solve that as well: we could use a RR > proxy pool and funnel it through script-land there: send to one proxy > and have an event handler there that triggers a new event to publish > it back out to the workers. For proxies, that kind of additional load > should be fine (if load-balancing is even necessary at all; just going > through a single forwarding node might just as well be fine. Seems more prudent not to guess whether a single, hardcoded forwarding node is good enough when writing the default cluster-enabled scripts. RR via proxy is not just load-balancing either, but fault-tolerance as well. But here you're talking more about removing the relay() functions and doing the RR-via-proxy "manually", right? That seems ok to me -- once "real" routing is available, you then have the option to simplify your script and get a minor optimization by not having to manually handle+forward the event on proxies. > > (2) re-implement any existing subscription cycles? > > Now, here I'm starting to change my mind a bit. Maybe in the end, in > large topologies, it would be futile to insist on not having cycles > after all. The assumption above doesn't care about it, putting Broker > in charge of figuring it out. So with that, if we can set up > forwarding through (1) in a way that cycles in subscriptions don't > matter, it may be fine to just leave them in. But I guess in the end > it doesn't matter, removing them can only make things better/easier. Again, I think we wouldn't have any Broker::forward() usages in the default cluster setup, but simply enabling the forwarding of messages at the Broker-layer would currently cause some messages to route in a cycle. Enabling the current message forwarding means we need to re-implement existing subscription cycles. If we instead waited for the "real" routing, then it doesn't matter if we leave them in. 
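
The cycle concern above can be pictured with a toy Python model. This is not Broker code: it assumes a hypothetical flooding rule (a node raises the event, then forwards it to every peer except the one it received it from) and a hypothetical `max_hops` limit, and the node names and topologies are made up for illustration.

```python
from collections import deque

def flood(peers, origin, max_hops):
    """Flood one message from `origin` and count how often each node raises it.

    peers:    dict mapping node name -> list of directly connected peers
    max_hops: hypothetical limit on how many hops a message may travel
    """
    raised = {node: 0 for node in peers}
    # Each queue entry is (receiving node, node it came from, hops left).
    queue = deque((peer, origin, max_hops) for peer in peers[origin])
    while queue:
        node, sender, hops_left = queue.popleft()
        raised[node] += 1
        if hops_left > 1:
            for peer in peers[node]:
                if peer != sender:  # never send straight back to the sender
                    queue.append((peer, node, hops_left - 1))
    return raised

# Every node peers with every other node, so the connections form a cycle.
triangle = {
    "manager": ["proxy", "worker"],
    "proxy": ["manager", "worker"],
    "worker": ["manager", "proxy"],
}
# Same nodes, but proxy and worker only peer with the manager (no cycle).
tree = {
    "manager": ["proxy", "worker"],
    "proxy": ["manager"],
    "worker": ["manager"],
}

print(flood(triangle, "manager", max_hops=3))
print(flood(tree, "manager", max_hops=3))
```

In this model the triangle makes every node raise the event twice, and the publisher even gets its own message back, while the tree delivers it exactly once to each of the other nodes.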
- Jon From jsiwek at corelight.com Fri Aug 10 08:52:32 2018 From: jsiwek at corelight.com (Jon Siwek) Date: Fri, 10 Aug 2018 10:52:32 -0500 Subject: [Bro-Dev] Broker::publish API In-Reply-To: References:

<20180730160101.GE43154@corelight.com> <20180806195011.GA10971@corelight.com> <20180808154815.GA17912@corelight.com>
Message-ID:

On Fri, Aug 10, 2018 at 8:29 AM Jan Grashöfer wrote:

> > Let me try to phrase it differently: If there's already a topic for a
> > use case, it's better to use it. That's easier and less error-prone.
> > So if, e.g., I want to send my script's data to all workers,
> > publishing to bro/cluster/worker will do the job. And that will even
> > automatically adapt if things get more complex later.
>
> Maybe a silly question: Would that work using further "specialized"
> topics like bro/cluster/worker/intel? From my understanding one feature
> of topics is that one would be able to subscribe only to the things
> that one is interested in. Having a bunch of events just published to
> bro/cluster/worker seems counterproductive.

Yeah, topic use-cases may need clarification. There's one desire to use topics as a way to specify known destination(s) within a cluster. Another desire could be using the topic name to hierarchically summarize/describe a quality of the message content in order to share with the external world.

Maybe the thing that's currently unclear is what the intended borders are for information sharing? I break it down like:

(1) if the event you're publishing just facilitates scalable cluster analysis: you'd tend to use the topic names which target node classes within a cluster (eventually this might be "bro//worker")

(2) if the event you're publishing is intended for external consumption, then you should use a topic which describes some specific qualities of the message (e.g. "jan/intel")

Events that fall under (1) don't need to be descriptive since we don't want to encourage people to arbitrarily start subscribing to events that act as the details for how cluster analysis is implemented.
Or I guess if they do subscribe, then they are the kind of person that's more interested in inspecting the cluster's performance/communication characteristics anyway.

I'd also say that (2) is a user decision -- they need to be the one to decide if their cluster has produced some bit of information worthy of sharing to the external world and then publish it under a suitable topic name.

That make sense?

- Jon

From robin at corelight.com Fri Aug 10 08:55:49 2018
From: robin at corelight.com (Robin Sommer)
Date: Fri, 10 Aug 2018 08:55:49 -0700
Subject: [Bro-Dev] Broker::publish API
In-Reply-To:
References: <20180730160101.GE43154@corelight.com> <20180806195011.GA10971@corelight.com> <20180808155337.GB17912@corelight.com> <20180808195018.GR40264@corelight.com> <20180809182938.GB55342@corelight.com>
Message-ID: <20180810155549.GG51305@corelight.com>

On Fri, Aug 10, 2018 at 10:24 -0500, Jonathan Siwek wrote:

> Or is it a matter of "if a user needed it for something, then it's
> available" ?

Yeah, including matching expectations: if there's a "bro/cluster/worker" topic, I'd expect I can publish there to reach all the workers (from anywhere). However, I think I'm with you now that maybe we just shouldn't do/support any forwarding in the cluster right now. Pools and manual relaying are a (currently better) alternative, and we can change things later. And at least it's a clear message: no forwarding across cluster nodes.

> However, I can see Broker::forward() could make it a bit easier for a
> user wanting to manually set up a forwarding route between clusters or
> other external applications. Is that a clear use-case we need to
> cater to now?

Well, if it were easy to add the forward() function, that could indeed be quite useful for external integrations still. With that, one could selectively forward custom topics (at one's own risk), without causing a mess for the cluster. I'm thinking osquery integration for example, where messages might go through an intermediary Bro.
One advantage that Broker-internal forwarding has compared to manual relaying is that messages won't be propagated back to the sender. But it's a matter of effort at this point I'd say.

> RR via proxy is not just load-balancing either, but fault-tolerance as
> well.

Yeah, that's right.

> But here you're talking more about removing the relay() functions and
> doing the RR-via-proxy "manually", right? That seems ok to me -- once
> "real" routing is available, you then have the option to simplify your
> script and get a minor optimization by not having to manually
> handle+forward the event on proxies.

Ok, let's make that change then, I think removing relay() will help for sure making the API easier.

Robin

--
Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com

From jazoff at illinois.edu Fri Aug 10 09:47:12 2018
From: jazoff at illinois.edu (Azoff, Justin S)
Date: Fri, 10 Aug 2018 16:47:12 +0000
Subject: [Bro-Dev] Broker::publish API
In-Reply-To: <20180810155549.GG51305@corelight.com>
References: <20180730160101.GE43154@corelight.com> <20180806195011.GA10971@corelight.com> <20180808155337.GB17912@corelight.com> <20180808195018.GR40264@corelight.com> <20180809182938.GB55342@corelight.com> <20180810155549.GG51305@corelight.com>
Message-ID: <571A01C0-9F72-4578-8647-ED48181EE756@illinois.edu>

> On Aug 10, 2018, at 11:55 AM, Robin Sommer wrote:
>
>
> Ok, let's make that change then, I think removing relay() will help
> for sure making the API easier.

If relay is removed how does a script writer efficiently get an event from one worker (or manager) to all of the other workers?

--
Justin Azoff

From jan.grashoefer at gmail.com Mon Aug 13 06:09:42 2018
From: jan.grashoefer at gmail.com (Jan Grashöfer)
Date: Mon, 13 Aug 2018 15:09:42 +0200
Subject: [Bro-Dev] Broker::publish API
In-Reply-To:
References:

<20180730160101.GE43154@corelight.com> <20180806195011.GA10971@corelight.com> <20180808154815.GA17912@corelight.com>
Message-ID:

On 10/08/18 17:12, Robin Sommer wrote:

> I hear you, but I think I haven't quite understood the concern yet.
> Can you give me an example where the difference matters? What's
> different between publishing intel events to bro/cluster/worker/intel
> vs bro/cluster/worker if both go to all workers? Or is it so that some
> workers can decide not to receive the intel events?

The use case I had in my mind is an external application that is interested in interfacing with the intelligence framework, either for querying it similar to workers or for managing purposes. If possible, it could be beneficial for such an application to receive only the relevant parts of cluster communication.

On 10/08/18 17:52, Jon Siwek wrote:

> (1) if the event you're publishing just facilitates scalable cluster
> analysis: you'd tend to use the topic names which target node classes
> within a cluster (eventually this might be "bro//worker")
>
> (2) if the event you're publishing is intended for external
> consumption, then you should use a topic which describes some specific
> qualities of the message (e.g. "jan/intel")

The case described above seems to be both. On the one hand the primary use case is internal cluster communication. On the other hand it feels quite natural to dock here for external applications. Another (debatable) use case might be directly interfacing the configuration framework, skipping the configuration file layer.
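
To illustrate the subtopic idea, here is a small Python sketch of prefix-based subscription matching as described earlier in the thread. It is a toy model rather than Broker's implementation: the plain string-prefix rule and the node names `worker-1` and `external-intel-app` are assumptions made for the example.

```python
def matches(subscription_prefix, topic):
    """Assumed rule: a subscription matches any topic starting with its prefix."""
    return topic.startswith(subscription_prefix)

def receivers(subscriptions, topic):
    """Return the sorted names of all nodes holding a matching subscription."""
    return sorted(
        node
        for node, prefixes in subscriptions.items()
        if any(matches(prefix, topic) for prefix in prefixes)
    )

subscriptions = {
    # A worker subscribed to the general worker topic.
    "worker-1": ["bro/cluster/worker"],
    # A hypothetical external application interested in intel events only.
    "external-intel-app": ["bro/cluster/worker/intel"],
}

# Published to the subtopic: both the worker and the external app receive it.
print(receivers(subscriptions, "bro/cluster/worker/intel"))
# Published to the general topic: only the worker receives it.
print(receivers(subscriptions, "bro/cluster/worker"))
```

Under these assumptions, publishing intel events to the subtopic costs the workers nothing, while an external subscriber sees only the intel traffic.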
Jan

From jsiwek at corelight.com Mon Aug 13 09:24:37 2018
From: jsiwek at corelight.com (Jon Siwek)
Date: Mon, 13 Aug 2018 11:24:37 -0500
Subject: [Bro-Dev] Broker::publish API
In-Reply-To: <571A01C0-9F72-4578-8647-ED48181EE756@illinois.edu>
References: <20180730160101.GE43154@corelight.com> <20180806195011.GA10971@corelight.com> <20180808155337.GB17912@corelight.com> <20180808195018.GR40264@corelight.com> <20180809182938.GB55342@corelight.com> <20180810155549.GG51305@corelight.com> <571A01C0-9F72-4578-8647-ED48181EE756@illinois.edu>
Message-ID:

On Fri, Aug 10, 2018 at 11:47 AM Azoff, Justin S wrote:

> If relay is removed how does a script writer efficiently get an event from one worker (or manager)
> to all of the other workers?

Old Worker:

    Cluster::relay_rr(Cluster::proxy_pool, my_event);

New Worker:

    Broker::publish(Cluster::rr_topic(Cluster::proxy_pool), my_event);

New Proxy:

    event my_event() {
        Broker::publish(Cluster::worker_topic, my_event);
    }

So the proxy has the additional overhead of the proxy's event handler. I doubt that's much of a problem from the "efficiency" standpoint, but if it were, then just having more proxies helps.

Once real routing is available the code would still work or you could opt to change to just:

Even Newer Worker:

    Broker::publish(Cluster::worker_topic, my_event);

See any problems there?

- Jon

From jsiwek at corelight.com Mon Aug 13 09:33:03 2018
From: jsiwek at corelight.com (Jon Siwek)
Date: Mon, 13 Aug 2018 11:33:03 -0500
Subject: [Bro-Dev] Broker::publish API
In-Reply-To:
References:

<20180730160101.GE43154@corelight.com> <20180806195011.GA10971@corelight.com> <20180808154815.GA17912@corelight.com>

Message-ID:

On Mon, Aug 13, 2018 at 8:09 AM Jan Grashöfer wrote:

>
> On 10/08/18 17:12, Robin Sommer wrote:
> > I hear you, but I think I haven't quite understood the concern yet.
> > Can you give me an example where the difference matters? What's
> > different between publishing intel events to bro/cluster/worker/intel
> > vs bro/cluster/worker if both go to all workers? Or is it so that some
> > workers can decide not to receive the intel events?
>
> The use case I had in my mind is an external application that is
> interested in interfacing with the intelligence framework. Either for
> querying it similar to workers or for managing purposes. If possible, it
> could be beneficial for such an application to receive only the relevant
> parts of cluster communication.
>
> On 10/08/18 17:52, Jon Siwek wrote:
> > (1) if the event you're publishing just facilitates scalable cluster
> > analysis: you'd tend to use the topic names which target node classes
> > within a cluster (eventually this might be "bro//worker")
> >
> > (2) if the event you're publishing is intended for external
> > consumption, then you should use a topic which describes some specific
> > qualities of the message (e.g. "jan/intel")
>
> The case described above seems to be both. On the one hand the primary
> use case is internal cluster communication. On the other hand it feels
> quite natural to dock here for external applications. Another
> (debatable) use case might be directly interfacing the configuration
> framework, skipping the configuration file layer.

I'm generally thinking there's nothing stopping one from picking a new topic name to re-publish some set of events under. Would that be possible in the case you're imagining?

I don't think we're going to come up with a general (or enforceable) way of picking topic names such that they'll be useful for any arbitrary, external use-case. So we pick the topic name that is best for the use-case we have at time of writing a script (e.g. we just want to get it working on a cluster so use the pre-existing topics that are available for that), and then let others re-publish a subset of events under different topics dependent on their specific use-case.

- Jon

From jazoff at illinois.edu Mon Aug 13 10:55:54 2018
From: jazoff at illinois.edu (Azoff, Justin S)
Date: Mon, 13 Aug 2018 17:55:54 +0000
Subject: [Bro-Dev] Broker::publish API
In-Reply-To:
References: <20180730160101.GE43154@corelight.com> <20180806195011.GA10971@corelight.com> <20180808155337.GB17912@corelight.com> <20180808195018.GR40264@corelight.com> <20180809182938.GB55342@corelight.com> <20180810155549.GG51305@corelight.com> <571A01C0-9F72-4578-8647-ED48181EE756@illinois.edu>
Message-ID: <7A2E348D-04A9-4E10-9CB5-AC5A627D1F02@illinois.edu>

> On Aug 13, 2018, at 12:24 PM, Jon Siwek wrote:
>
> Even Newer Worker:
>
> Broker::publish(Cluster::worker_topic, my_event);
>
> See any problems there?

That's nice and simple :-) Assuming that can send the events around in the most efficient way possible, that's perfect.

The one tricky case is doing that on the manager. While the manager is fully connected to all workers, you really want to offload the fanning out of messages to one of the proxies.

--
Justin Azoff

From jsiwek at corelight.com Mon Aug 13 11:55:31 2018
From: jsiwek at corelight.com (Jon Siwek)
Date: Mon, 13 Aug 2018 13:55:31 -0500
Subject: [Bro-Dev] Broker::publish API
In-Reply-To: <7A2E348D-04A9-4E10-9CB5-AC5A627D1F02@illinois.edu>
References: <20180730160101.GE43154@corelight.com> <20180806195011.GA10971@corelight.com> <20180808155337.GB17912@corelight.com> <20180808195018.GR40264@corelight.com> <20180809182938.GB55342@corelight.com> <20180810155549.GG51305@corelight.com> <571A01C0-9F72-4578-8647-ED48181EE756@illinois.edu> <7A2E348D-04A9-4E10-9CB5-AC5A627D1F02@illinois.edu>
Message-ID:

On Mon, Aug 13, 2018 at 12:56 PM Azoff, Justin S wrote:

> > Broker::publish(Cluster::worker_topic, my_event);
> The one tricky case is doing that on the manager. While the manager is fully connected to all workers,
> you really want to offload the fanning out of messages to one of the proxies.

Yeah, I don't know exactly how it would be implemented yet, but it seems to warrant a policy/flag that one sets on the manager that means "prefer sending along a 2-hop route rather than a 1-hop route if it minimizes our own workload" or else a way to mark proxy nodes such that any connected peers always prefer to send 1-routable-message to it rather than N-direct-messages.

Maybe falls under "load-balancing" of the prospective routing implementation, which I've tracked as requiring these features:

* cycle detection/prevention
* network-wide subscription knowledge per-node
* load-balancing + proxying policies

Let me know if I missed any. I have implementation ideas/notes already which basically require associating node IDs with subscription state and also message state (push node IDs into messages upon receipt before forwarding), but we can maybe discuss and flesh it out in a later design thread once we decide what exactly to do.

As for deciding what to do in the near term, seems like we will arrive at agreeing upon:

(1) Remove relay(...) functions
(2) Reduce unique topic names (use pre-existing cluster topics where possible)
(3) Add Broker::forward(topic_prefix) function + enable Broker forwarding

An alternative to (3) would be implementing "real" routing in Broker right from the start. No strong opinion there, but seems like it could fall under nice-to-have at this point and, while it would obsolete Broker::forward(), I don't expect that's much effort wasted.

Any other ideas?

- Jon

From jsiwek at corelight.com Mon Aug 13 16:06:52 2018
From: jsiwek at corelight.com (Jon Siwek)
Date: Mon, 13 Aug 2018 18:06:52 -0500
Subject: [Bro-Dev] Writing SumStats plugin
In-Reply-To:
References:

Message-ID:

On Tue, Aug 7, 2018 at 5:15 PM Jim Mellander wrote:

> Incidentally, I think there's a bug in the observe() function:
>
> These two lines are run in the loop thru the reducers:
> if ( r?$normalize_key )
> key = r$normalize_key(copy(key));
> which has the effect of modifying the key for subsequent loops, rather than just for the one reducer it applies to. The fix is easy and obvious....

Yeah, looked wrong to me also. Fixed via [1] in master branch now.

Sorry I don't have much knowledge of the existing sumstats code to drive the other discussion/suggestions forward.

- Jon

[1] https://github.com/bro/bro/commit/5821c16490e731a68c0efc9c1aaba2d7aec28f48

From robin at corelight.com Tue Aug 14 08:13:15 2018
From: robin at corelight.com (Robin Sommer)
Date: Tue, 14 Aug 2018 08:13:15 -0700
Subject: [Bro-Dev] Broker::publish API
In-Reply-To:
References: <20180808195018.GR40264@corelight.com> <20180809182938.GB55342@corelight.com> <20180810155549.GG51305@corelight.com> <571A01C0-9F72-4578-8647-ED48181EE756@illinois.edu> <7A2E348D-04A9-4E10-9CB5-AC5A627D1F02@illinois.edu>
Message-ID: <20180814151315.GK99915@corelight.com>

On Mon, Aug 13, 2018 at 13:55 -0500, Jonathan Siwek wrote:

> associating node IDs with subscription state and also message state
> (push node IDs into messages upon receipt before forwarding),

Yeah, that sounds like the right direction. Some reading might be worthwhile here; there are quite a few papers out there on routing in overlay networks.

> (1) Remove relay(...) functions
> (2) Reduce unique topic names (use pre-existing cluster topics where possible)
> (3) Add Broker::forward(topic_prefix) function + enable Broker forwarding

Yes, that sounds good to me, plus whatever that means for "publish()" itself. I like what we have arrived at here.

One more question: what about raising published events locally as well if the sending node is subscribed to the topic? I'm kind of torn on that.
I don't think we want that as a default, but perhaps as an option, either with the publish() call or, likely better, with the subscribe() call? I can see that being helpful in cases like unifying standalone vs cluster operation; and more generally, for running multiple node types inside the same Bro instance. > An alternative to (3) would be implementing "real" routing in Broker > right from the start. In an ideal world, yes, that would certainly be nice to have. But it's a larger task that I don't think we would be able to finish for 2.6 anymore. So, I'd put that on the list for later. Robin -- Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com From jsiwek at corelight.com Tue Aug 14 08:51:28 2018 From: jsiwek at corelight.com (Jon Siwek) Date: Tue, 14 Aug 2018 10:51:28 -0500 Subject: [Bro-Dev] Broker::publish API In-Reply-To: <20180814151315.GK99915@corelight.com> References: <20180808195018.GR40264@corelight.com> <20180809182938.GB55342@corelight.com> <20180810155549.GG51305@corelight.com> <571A01C0-9F72-4578-8647-ED48181EE756@illinois.edu> <7A2E348D-04A9-4E10-9CB5-AC5A627D1F02@illinois.edu> <20180814151315.GK99915@corelight.com> Message-ID: On Tue, Aug 14, 2018 at 10:13 AM Robin Sommer wrote: > One more question: what about raising published events locally as well > if the sending node is subscribed to the topic? I'm kind of torn on > that. I don't think we want that as a default, but perhaps as an > option, either with the publish() call or, likely better, with the > subscribe() call? I can see that being helpful in cases like unifying > standalone vs cluster operation; and more generally, for running > multiple node types inside the same Bro instance. Not sure, is Broker::auto_publish() currently able to do the same thing? e.g. if I want an event to be raised locally, I raise it via "event" and it automatically gets published. 
I can also see the opposite being intuitive: If I told Broker::subscribe() to raise locally, then I can just always use Broker::publish() and not think about the difference between using "event" versus "publish". Would Broker::auto_publish() be removable then? - Jon From robin at corelight.com Tue Aug 14 09:53:06 2018 From: robin at corelight.com (Robin Sommer) Date: Tue, 14 Aug 2018 09:53:06 -0700 Subject: [Bro-Dev] Broker::publish API In-Reply-To: References: <20180809182938.GB55342@corelight.com> <20180810155549.GG51305@corelight.com> <571A01C0-9F72-4578-8647-ED48181EE756@illinois.edu> <7A2E348D-04A9-4E10-9CB5-AC5A627D1F02@illinois.edu> <20180814151315.GK99915@corelight.com> Message-ID: <20180814165306.GA42002@corelight.com> On Tue, Aug 14, 2018 at 10:51 -0500, Jonathan Siwek wrote: > Not sure, is Broker::auto_publish() currently able to do the same thing? Hmm .. Good point. Scope is different between the two (event vs topic) but the effect is similar in the end. > I can also see the opposite being intuitive: If I told > Broker::subscribe() to raise locally, then I can just always use > Broker::publish() and not think about the difference between using > "event" versus "publish". Would Broker::auto_publish() be removable > then? I would like to say "yes" (because I like the subscribe() approach better than auto_publish() :-), but would that work well with our cluster topics? If we didn't have the event-specific auto_publish(), we would have to turn on local raise for *all* events going to, e.g., bro/cluster/worker. And thinking about it, maybe that's in fact also an argument against my original thinking about how this could help unify scripts --- well, unless we'd go with Jan's thought of subtopics (e.g., subscribe("bro/cluster/worker/intel", local_raise=T)). Robin -- Robin Sommer * Corelight, Inc.
* robin at corelight.com * www.corelight.com From jan.grashoefer at gmail.com Tue Aug 14 10:09:26 2018 From: jan.grashoefer at gmail.com (Jan Grashöfer) Date: Tue, 14 Aug 2018 19:09:26 +0200 Subject: [Bro-Dev] Broker::publish API In-Reply-To: References: <20180730160101.GE43154@corelight.com> <20180806195011.GA10971@corelight.com> <20180808155337.GB17912@corelight.com> <20180808195018.GR40264@corelight.com> <20180809182938.GB55342@corelight.com> <20180810155549.GG51305@corelight.com> <571A01C0-9F72-4578-8647-ED48181EE756@illinois.edu> Message-ID: <896ff64f-e1b1-86d1-384a-215ef1d0cf58@gmail.com> On 13/08/18 18:24, Jon Siwek wrote: > Old Worker: > > Cluster::relay_rr(Cluster::proxy_pool, my_event); > > New Worker: > > Broker::publish(Cluster::rr_topic(Cluster::proxy_pool), my_event); That doesn't look like an API simplification to me ;D > Even Newer Worker: > > Broker::publish(Cluster::worker_topic, my_event); > > See any problems there? For this case: Would it be easy to set up distinct pools for different tasks? I could imagine a pool of proxies that is used explicitly for intel distribution and one pool used for processing SumStats events. I think we have discussed something like that before. Maybe I am mixing cluster and broker levels again...
Jan From jsiwek at corelight.com Tue Aug 14 11:39:28 2018 From: jsiwek at corelight.com (Jon Siwek) Date: Tue, 14 Aug 2018 13:39:28 -0500 Subject: [Bro-Dev] Broker::publish API In-Reply-To: <896ff64f-e1b1-86d1-384a-215ef1d0cf58@gmail.com> References: <20180730160101.GE43154@corelight.com> <20180806195011.GA10971@corelight.com> <20180808155337.GB17912@corelight.com> <20180808195018.GR40264@corelight.com> <20180809182938.GB55342@corelight.com> <20180810155549.GG51305@corelight.com> <571A01C0-9F72-4578-8647-ED48181EE756@illinois.edu> <896ff64f-e1b1-86d1-384a-215ef1d0cf58@gmail.com> Message-ID: On Tue, Aug 14, 2018 at 12:09 PM Jan Grashöfer wrote: > > On 13/08/18 18:24, Jon Siwek wrote: > > Old Worker: > > > > Cluster::relay_rr(Cluster::proxy_pool, my_event); > > > > New Worker: > > > > Broker::publish(Cluster::rr_topic(Cluster::proxy_pool), my_event); > > That doesn't look like an API simplification to me ;D The goal here I imagine is rather to avoid releasing a function that we knowingly plan to remove later. A user would have to eventually port all Cluster::relay_rr() calls, but that Broker::publish() pattern remains valid. > > Even Newer Worker: > > > > Broker::publish(Cluster::worker_topic, my_event); > > > > See any problems there? > > For this case: Would it be easy to setup distinct pools for different > tasks? I could imagine a pool of proxies that is used explicitly for > intel distribution and one pool used for processing SumStats events. I > think we have discussed something like that before. Yeah, it would still be possible to define your own pool and use it for your own purposes and it looks similar to the call before: Broker::publish(Cluster::rr_topic(Cluster::my_pool), my_event); A difference in the context of our needs for the cluster communication is that the pool is being used as a means of achieving routing (in a load-balanced fashion) and so the call gets simplified once those mechanisms get built into Broker routing.
In your case, you don't need the routing aspect, just the load-balancing provided by the "pool" concept.

- Jon

From dopheide at es.net Tue Aug 14 20:19:59 2018
From: dopheide at es.net (Michael Dopheide)
Date: Tue, 14 Aug 2018 22:19:59 -0500
Subject: [Bro-Dev] reproducible segfault in master branch
Message-ID:

Content Removed

From jsiwek at corelight.com Wed Aug 15 09:18:45 2018
From: jsiwek at corelight.com (Jon Siwek)
Date: Wed, 15 Aug 2018 11:18:45 -0500
Subject: [Bro-Dev] reproducible segfault in master branch
In-Reply-To:
References:
Message-ID:

On Tue, Aug 14, 2018 at 10:26 PM Michael Dopheide wrote:
>
> Somehow related to Broker stores and/or casting.

You'll get a better error message/behavior now using:

https://github.com/bro/bro/commit/f336c8c710bdeb41eb0aba88967ee90da24848b2

But ultimately, you likely want to do something like this patch:

```
--- known-hosts-with-dns.bro.orig 2018-08-15 11:07:41.000000000 -0500
+++ known-hosts-with-dns.bro 2018-08-15 10:44:03.000000000 -0500
@@ -113,7 +113,7 @@
     for (ip in r$result as addr_set){
         when ( local res = Broker::get(Known::host_store$store,ip)){

-            if(res?$result){
+            if(res$status == Broker::SUCCESS){
 @if ( ! Cluster::is_enabled() )
                 Known::hosts[ip] = fmt("%s",res$result as string);
 @else
```

As for why some keys no longer exist in those lookups immediately after retrieving the full key set: my guess is they simply expired between those two points in time, but I didn't dig into it. The main point would be to never assume the Broker::get() call succeeds, which was likely your intent, except "res?$result" is always true (another form of checking the data exists would be "res$result?$data").

- Jon

From dopheide at es.net Wed Aug 15 09:39:52 2018
From: dopheide at es.net (Michael Dopheide)
Date: Wed, 15 Aug 2018 11:39:52 -0500
Subject: [Bro-Dev] reproducible segfault in master branch
In-Reply-To:
References:
Message-ID:

Excellent, thanks Jon!
-Dop On Wed, Aug 15, 2018 at 11:18 AM, Jon Siwek wrote: > On Tue, Aug 14, 2018 at 10:26 PM Michael Dopheide wrote: > > > > Somehow related to Broker stores and/or casting. > > You'll get a better error message/behavior now using: > > https://github.com/bro/bro/commit/f336c8c710bdeb41eb0aba88967ee90da24848b2 > > But ultimately, you likely want to do something like this patch: > > ``` > --- known-hosts-with-dns.bro.orig 2018-08-15 11:07:41.000000000 -0500 > +++ known-hosts-with-dns.bro 2018-08-15 10:44:03.000000000 -0500 > @@ -113,7 +113,7 @@ > for (ip in r$result as addr_set){ > when ( local res = Broker::get(Known::host_store$store,ip)){ > > - if(res?$result){ > + if(res$status == Broker::SUCCESS){ > @if ( ! Cluster::is_enabled() ) > Known::hosts[ip] = fmt("%s",res$result as string); > @else > ``` > > As for why some keys no longer exist in those lookups immediately > after retrieving the full key set: my guess is they simply expired > between those two points in time, but I didn't dig into it. The main > point would be to never assume the Broker::get() call succeeds, which > was likely your intent, except "res?$result" is always true (another > form of checking the data exists would be "res$result?$data"). > > - Jon From robin at corelight.com Wed Aug 15 11:28:38 2018 From: robin at corelight.com (Robin Sommer) Date: Wed, 15 Aug 2018 11:28:38 -0700 Subject: [Bro-Dev] [Administrativa] Mailing list archives Message-ID: <20180815182838.GS99915@corelight.com> Quick reminder: Please keep in mind that mails to the Bro mailing lists are archived publicly. We had a couple of cases recently where internal information went to a list, ending up in the archive, from which it is difficult to remove. Thanks, Robin -- Robin Sommer * Corelight, Inc.
* robin at corelight.com * www.corelight.com From jmellander at lbl.gov Thu Aug 16 13:40:16 2018 From: jmellander at lbl.gov (Jim Mellander) Date: Thu, 16 Aug 2018 13:40:16 -0700 Subject: [Bro-Dev] Use of 'any' type Message-ID: It would be most convenient if the 'any' type could defer type checking until runtime at the script level. For instance, if both A & B are defined as type 'any', a compile time error "illegal comparison (A < B)" occurs upon encountering a bro statement if (A < B) do_something(); even if the actual values stored in A & B at runtime are integral types for which comparison makes sense. If the decision could be made at runtime (which could then potentially throw an error), a number of useful generic functions could be created at the script level, rather than creating yet-another-bif. A useful yet-another-bif would be 'typeof' to allow varying code paths based on the type of value actually stored in 'any'. Any comments? Jim
From johanna at icir.org Thu Aug 16 13:57:55 2018 From: johanna at icir.org (Johanna Amann) Date: Thu, 16 Aug 2018 13:57:55 -0700 Subject: [Bro-Dev] Use of 'any' type In-Reply-To: References: Message-ID: Hi Jim, On 16 Aug 2018, at 13:40, Jim Mellander wrote: > It would be most convenient if the 'any' type could defer type > checking > until runtime at the script level. > > For instance, if both A & B are defined as type 'any', a compile time > error > > "illegal comparison (A < B)" > > occurs upon encountering a bro statement > > if (A < B) do_something(); > > even if the actual values stored in A & B at runtime are integral > types for > which comparison makes sense. I think this is a bit hard to do with how things are set up at the moment internally - and it also does make type-checking at startup less possible-helpful. However... > > If the decision could be made at runtime (which could then potentially > throw an error), a number of useful generic functions could be created > at > the script level, rather than creating yet-another-bif. A useful > yet-another-bif would be 'typeof' to allow varying code paths based on > the > type of value actually stored in 'any'. This already exists and I think you can actually use it to write code like that; you just have to cast your any-type to the correct type first. The function you want is type_name; it is e.g. used in base/utils/json.bro. Johanna From jmellander at lbl.gov Thu Aug 16 14:12:52 2018 From: jmellander at lbl.gov (Jim Mellander) Date: Thu, 16 Aug 2018 14:12:52 -0700 Subject: [Bro-Dev] Use of 'any' type In-Reply-To: References:

Message-ID: Thanks, Johanna - I think type_name() may suffice for the purposes I am envisioning. On Thu, Aug 16, 2018 at 1:57 PM, Johanna Amann wrote: > Hi Jim, > > On 16 Aug 2018, at 13:40, Jim Mellander wrote: > > It would be most convenient if the 'any' type could defer type checking >> until runtime at the script level. >> >> For instance, if both A & B are defined as type 'any', a compile time >> error >> >> "illegal comparison (A < B)" >> >> occurs upon encountering a bro statement >> >> if (A < B) do_something(); >> >> even if the actual values stored in A & B at runtime are integral types >> for >> which comparison makes sense. >> > > I think this is a bit hard to do with how things are set up at the moment > internally - and it also does make type-checking at startup less > possible-helpful. > > However... > > >> If the decision could be made at runtime (which could then potentially >> throw an error), a number of useful generic functions could be created at >> the script level, rather than creating yet-another-bif. A useful >> yet-another-bif would be 'typeof' to allow varying code paths based on the >> type of value actually stored in 'any'. >> > > This already exists and I think you can actually use it to write code like > that; you just have to cast your any-type to the correct type first. The > function you want is type_name; it is e.g. used in base/utils/json.bro. > > Johanna > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20180816/fe5100e7/attachment.html From jsiwek at corelight.com Thu Aug 16 14:58:05 2018 From: jsiwek at corelight.com (Jon Siwek) Date: Thu, 16 Aug 2018 16:58:05 -0500 Subject: [Bro-Dev] Use of 'any' type In-Reply-To: References:

Message-ID: In the master branch, there are also type checking/casting 'is' and 'as' operators [1] and a type-based switch statement [2] that may be useful. - Jon [1] https://www.bro.org/sphinx-git/script-reference/operators.html [2] https://www.bro.org/sphinx-git/script-reference/statements.html#keyword-switch On Thu, Aug 16, 2018 at 4:24 PM Jim Mellander wrote: > > Thanks, Johanna - I think type_name() may suffice for the purposes I am envisioning. > > On Thu, Aug 16, 2018 at 1:57 PM, Johanna Amann wrote: >> >> Hi Jim, >> >> On 16 Aug 2018, at 13:40, Jim Mellander wrote: >> >>> It would be most convenient if the 'any' type could defer type checking >>> until runtime at the script level. >>> >>> For instance, if both A & B are defined as type 'any', a compile time error >>> >>> "illegal comparison (A < B)" >>> >>> occurs upon encountering a bro statement >>> >>> if (A < B) do_something(); >>> >>> even if the actual values stored in A & B at runtime are integral types for >>> which comparison makes sense. >> >> >> I think this is a bit hard to do with how things are set up at the moment internally - and it also does make type-checking at startup less possible-helpful. >> >> However... >> >>> >>> If the decision could be made at runtime (which could then potentially >>> throw an error), a number of useful generic functions could be created at >>> the script level, rather than creating yet-another-bif. A useful >>> yet-another-bif would be 'typeof' to allow varying code paths based on the >>> type of value actually stored in 'any'. >> >> >> This already exists and I think you can actually use it to write code like that; you just have to cast your any-type to the correct type first. The function you want is type_name; it is e.g. used in base/utils/json.bro.
>> >> Johanna > > > _______________________________________________ > bro-dev mailing list > bro-dev at bro.org > http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev From jmellander at lbl.gov Thu Aug 16 15:02:46 2018 From: jmellander at lbl.gov (Jim Mellander) Date: Thu, 16 Aug 2018 15:02:46 -0700 Subject: [Bro-Dev] Use of 'any' type In-Reply-To: References:

Message-ID: Actually, the 'as' operator is useful, since it appears that 'any' can currently only be cast into a string otherwise.... On Thu, Aug 16, 2018 at 2:58 PM, Jon Siwek wrote: > In the master branch, there are also type checking/casting 'is' and > 'as' operators [1] and type-based switch statement [2] that may be be > useful. > > - Jon > > [1] https://www.bro.org/sphinx-git/script-reference/operators.html > [2] https://www.bro.org/sphinx-git/script-reference/ > statements.html#keyword-switch > > On Thu, Aug 16, 2018 at 4:24 PM Jim Mellander wrote: > > > > Thanks, Johanna - I think type_name() may suffice for the purposes I am > envisioning. > > > > On Thu, Aug 16, 2018 at 1:57 PM, Johanna Amann wrote: > >> > >> Hi Jim, > >> > >> On 16 Aug 2018, at 13:40, Jim Mellander wrote: > >> > >>> It would be most convenient if the 'any' type could defer type checking > >>> until runtime at the script level. > >>> > >>> For instance, if both A & B are defined as type 'any', a compile time > error > >>> > >>> "illegal comparison (A < B)" > >>> > >>> occurs upon encountering a bro statement > >>> > >>> if (A < B) do_something(); > >>> > >>> even if the actual values stored in A & B at runtime are integral > types for > >>> which comparison makes sense. > >> > >> > >> I think this is a bit hard to do with how things are set up at the > moment internally - and it also does make type-checking at startup less > possible-helpful. > >> > >> However... > >> > >>> > >>> If the decision could be made at runtime (which could then potentially > >>> throw an error), a number of useful generic functions could be created > at > >>> the script level, rather than creating yet-another-bif. A useful > >>> yet-another-bif would be 'typeof' to allow varying code paths based on > the > >>> type of value actually stored in 'any'. > >> > >> > >> This already exists and I think you can actually use it to write code > like that; you just have to cast your any-type to the correct type first. 
> The function you want is type_name; it is e.g. used in base/utils/json.bro. > >> > >> Johanna > > > > > > _______________________________________________ > > bro-dev mailing list > > bro-dev at bro.org > > http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20180816/aa84d49a/attachment.html From jmellander at lbl.gov Mon Aug 20 09:20:51 2018 From: jmellander at lbl.gov (Jim Mellander) Date: Mon, 20 Aug 2018 09:20:51 -0700 Subject: [Bro-Dev] Underflow Considered Harmful Message-ID: Or: How 13 billion became 1.844674e+19 before becoming 0. After sending amounts totaling over 13 billion thru Sumstats, a value of 0 came out the other end of the sausage factory, but only for one specific data item. Debugging this required lots of well placed print statements, and wild speculation on my part as to what possibly could be broken.... The values being thrown in to be summed are incremental differences from a previous observation, which *should* be zero or a positive number in the range of several K, so a 'count' variable was used. However, for some reason, this value came up negative (or should have) due to (in decreasing likelihood) logic error in script, one of bro's dark corners, or bro bug. The reason for the negativity is still TBD. But, in the world of unsigned 64-bit values (aka bro 'count' variables) there is no negativity, only positivity, and an unsigned underflow creates a number just below 2**64 ~ 1.844674e+19 .... Well, Sumstats tallies in doubles, and naturally this figure (1.844674e+19) dominated the total. In fact, additional increments to this total pushed the total value to be greater than 2**64 (with loss of precision, as doubles only keep 53 bits). 
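The arithmetic described so far can be reproduced outside of Bro. The following is a minimal Python sketch, not Bro code: `count_sub` is a hypothetical helper that mimics unsigned 64-bit wraparound, and the operands 5 and 6 are made up for illustration.

```python
# Hypothetical illustration; Bro's 'count' is an unsigned 64-bit integer.
U64 = 2**64

def count_sub(a, b):
    """Subtract like an unsigned 64-bit 'count': wrap instead of going negative."""
    return (a - b) % U64

# A difference that "should" be -1 wraps to just below 2**64.
wrapped = count_sub(5, 6)
print(wrapped)                 # 18446744073709551615
print("%e" % float(wrapped))   # 1.844674e+19

# A tally kept in doubles is dominated by the single wrapped value.
total = float(13_000_000_000) + float(wrapped)
print(total > 2.0**64)         # True

# Doubles carry a 53-bit significand, so adding 1 near 2**64 is lost entirely.
print(float(U64) + 1.0 == float(U64))   # True
```

A single wrapped observation is therefore enough to push a double-valued sum past the range of a 64-bit count.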
In the processing step at the Sumstats epoch, the value was converted back to a count using the double_to_count() function. The cheatsheet warns that this function returns 0 if the double value is < 0.0, but it actually returns 2**64-double (with a runtime error), and for values > 2**64 it returns 0 with no runtime error :-(

So, there it is: a value that should have been about 13 billion became 1.844674e+19 and then became 0.

A few suggestions:

1. Conversion routines should saturate at respective minima/maxima of the type being converted to (possibly with runtime error).
2. Underflow of the 'count' type is almost invariably a bug, and should trigger a runtime error. Overflow, similarly, although in practice it seems much less likely to occur as most scripts are dealing with integers considerably less than 2**64. A similar argument could be made for 'int'. With some operations, it is difficult to detect overflow/underflow, but for simple add and subtract, it is relatively easy.
3. Documentation to match behavior.

Jim

From dominik.charousset at haw-hamburg.de Tue Aug 21 06:47:12 2018
From: dominik.charousset at haw-hamburg.de (Dominik Charousset)
Date: Tue, 21 Aug 2018 15:47:12 +0200
Subject: [Bro-Dev] Broker data layouts
Message-ID: <461E7327-0AED-41CB-8227-95D3F2DF2A0F@haw-hamburg.de>

We are currently writing code for ingesting data directly using Broker's API. From the docs, it seems that Broker assumes that publishers and subscribers somehow agree on one layout per topic: "senders and receivers will need to agree on a specific data layout for the values exchanged, so that they interpret them in the same way." [1]

This raises a couple of questions. Primarily: where can Broker users learn the layouts to interpret received data?
There's essentially no hope in deferring a layout, since broker::data doesn't include any meta information such as field names. Is meta information stored somewhere? Is there a convention how to place and retrieve such meta information in Broker's data stores? How does Bro itself make such information available? Is there a document that lists all topics used by Bro with their respective broker::data layout? Dominik [1] https://bro-broker.readthedocs.io/en/stable/comm.html#exchanging-bro-events From jsiwek at corelight.com Tue Aug 21 10:34:58 2018 From: jsiwek at corelight.com (Jon Siwek) Date: Tue, 21 Aug 2018 12:34:58 -0500 Subject: [Bro-Dev] Broker data layouts In-Reply-To: <461E7327-0AED-41CB-8227-95D3F2DF2A0F@haw-hamburg.de> References: <461E7327-0AED-41CB-8227-95D3F2DF2A0F@haw-hamburg.de> Message-ID: On Tue, Aug 21, 2018 at 8:54 AM Dominik Charousset wrote: > This raises a couple of questions. Primarily: where can Broker users learn the layouts to interpret received data? broker/bro.hh is basically all there is right now. e.g. if you construct a broker::bro::Event from a received broker::data, you get access to event name + "interpretable" arguments. > There's essentially no hope in deferring a layout, since broker::data doesn't include any meta information such as field names. Is meta information stored somewhere? No, nothing like that is implicit in message content. > Is there a convention how to place and retrieve such meta information in Broker's data stores? No, and any stores created by Bro don't even have such meta info. The types stored in them are just documented like "the keys are type Foo and values are type Bar". > How does Bro itself make such information available?
Nothing beyond documentation or the Bro -> Broker type mapping that's implicit in events themselves (or as given in docs for data stores). > Is there a document that lists all topics used by Bro with their respective broker::data layout? I don't think there's a plan to keep such an up-to-date document. A basic usage premise that I'm wondering about is that none of the current Broker usage in Bro actually seems suitable for generic/public consumption as it is. It's maybe more implementation details of doing cluster-enabled network traffic analysis, so also not a primary goal to make interpretation of those communications easy for external/direct Broker users. (You can ingest it if you want and do the work of "manually" interpreting it all, but maybe won't be a stable/transparent source of data going forward). However, one can still use Bro + Broker to create their own events/stores in a way that does contain the meta information required for easier/programmatic interpretation on the receiving side. e.g. I think, at the moment, if one is interested in ingesting data produced by Bro, they are best served by explicitly defining topic names, event/data types, and explicitly producing those messages at suitable places within Bro scripts themselves. Then, one can be in control of defining a common/expected data format and include whatever meta information is necessary to help receivers interpret the data. Maybe there's a more standardized approach that could be worked towards, but likely we just need more experience in understanding and defining common use-cases for external Bro data consumption. Or if we were just talking about Broker-only usage independent of Bro, then I think it's still the same ideas/answers: currently up to user to decide how to encode broker::data in a way that defines common/expected layouts + any required meta info. Does that help at all? 
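As one way to picture the suggestion above, here is a minimal Python sketch of a payload that carries its own field names. It does not use the Broker API: the topic-like name, the field names, and the `encode`/`decode` helpers are all hypothetical, and plain lists stand in for broker::data vectors.

```python
# Hypothetical layout: [schema_name, [field names...], [values...]]

def encode(schema_name, record):
    """Pack a dict into a positional list that carries its own field names."""
    names = list(record.keys())
    return [schema_name, names, [record[n] for n in names]]

def decode(message):
    """Rebuild the schema name and a dict from a message produced by encode()."""
    schema_name, names, values = message
    if len(names) != len(values):
        raise ValueError("field/value count mismatch")
    return schema_name, dict(zip(names, values))

msg = encode("my_site/conn_summary",
             {"orig_h": "10.0.0.1", "resp_h": "10.0.0.2", "bytes": 1234})
name, rec = decode(msg)
print(name, rec["orig_h"])
```

Sending the names with every message costs volume; a publisher could instead send them once per schema and reference them by name afterwards, at the price of receivers needing to see that first message.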
- Jon From robin at corelight.com Tue Aug 21 11:09:27 2018 From: robin at corelight.com (Robin Sommer) Date: Tue, 21 Aug 2018 11:09:27 -0700 Subject: [Bro-Dev] Broker data layouts In-Reply-To: References: <461E7327-0AED-41CB-8227-95D3F2DF2A0F@haw-hamburg.de> Message-ID: <20180821180927.GA45660@corelight.com> On Tue, Aug 21, 2018 at 12:34 -0500, Jonathan Siwek wrote: > Maybe there's a more standardized approach that could be worked > towards, but likely we just need more experience in understanding and > defining common use-cases for external Bro data consumption. Dominik, wasn't the original idea for VAST to provide an event description language that would create the link between the values coming over the wire and their interpretation? Such a specification could be auto-generated from Bro's knowledge about the events it generates. Also, this question is about events, not logs, right? Logs have a different wire format and they actually come with meta data describing their columns. Robin -- Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com From jsiwek at corelight.com Tue Aug 21 12:05:07 2018 From: jsiwek at corelight.com (Jon Siwek) Date: Tue, 21 Aug 2018 14:05:07 -0500 Subject: [Bro-Dev] Broker data layouts In-Reply-To: <20180821180927.GA45660@corelight.com> References: <461E7327-0AED-41CB-8227-95D3F2DF2A0F@haw-hamburg.de> <20180821180927.GA45660@corelight.com> Message-ID: On Tue, Aug 21, 2018 at 1:09 PM Robin Sommer wrote: > Also, this question is about events, not logs, right? Logs have a > different wire format and they actually come with meta data describing > their columns. Though the Broker data corresponding to log entry content is also opaque at the moment (I recall that was maybe for performance or message volume optimization), but I suppose same reasoning as before could apply: this info is internal to Bro operation unless one wants to explicitly re-publish it via their own event for external consumption. 
- Jon From robin at corelight.com Wed Aug 22 07:54:14 2018 From: robin at corelight.com (Robin Sommer) Date: Wed, 22 Aug 2018 07:54:14 -0700 Subject: [Bro-Dev] Broker data layouts In-Reply-To: References: <461E7327-0AED-41CB-8227-95D3F2DF2A0F@haw-hamburg.de> <20180821180927.GA45660@corelight.com> Message-ID: <20180822145414.GC32073@corelight.com> On Tue, Aug 21, 2018 at 14:05 -0500, Jonathan Siwek wrote: > Though the Broker data corresponding to log entry content is also > opaque at the moment (I recall that was maybe for performance or > message volume optimization), Yeah, but generally this is something I could see opening up. The log structure is pretty straightforward and self-describing, it'd be mostly a matter of cleanup and documentation to make that directly accessible to external consumers I think. Events, on the other hand, are semantically tied very closely to the scripts generating them, and also much more diverse so that self-description doesn't really seem feasible/useful. Republishing a relevant subset certainly sounds better for that; or, if it's really a bulk feed that's desired, some out-of-band mechanism to convey the schema information somehow. Robin -- Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com From jsiwek at corelight.com Thu Aug 23 08:01:02 2018 From: jsiwek at corelight.com (Jon Siwek) Date: Thu, 23 Aug 2018 10:01:02 -0500 Subject: [Bro-Dev] Broker data layouts In-Reply-To: <18A9EFDE-92A3-47C9-9458-83E56C599CFF@haw-hamburg.de> References: <461E7327-0AED-41CB-8227-95D3F2DF2A0F@haw-hamburg.de> <20180821180927.GA45660@corelight.com> <20180822145414.GC32073@corelight.com> <18A9EFDE-92A3-47C9-9458-83E56C599CFF@haw-hamburg.de> Message-ID: On Thu, Aug 23, 2018 at 8:32 AM Dominik Charousset wrote: > I'm a bit hesitant to rely on this header at the moment, because of: > > /// A Bro log-write message. Note that at the moment this should be used only > /// by Bro itself as the arguments aren't publicly defined.
> > Is the API stable enough on your end at this point to make it public? The comment is just pointing out what was said about the log message formats being opaque at the moment. It's expected only Bro will be able to make sense of the content. > Also, there are LogCreate and LogWrite events. The LogCreate has the `fields_data` (a list of field names?). Yeah, there's some field info in there: names, types, optionality. The type info in particular doesn't seem good to treat as intended for public consumption. > Does that mean I need to receive the LogCreate event first to understand successive LogWrite events? That would mean I cannot parse logs that had their LogCreate event before I was able to subscribe to the topic. Yeah, that's one problem, but a bigger issue is you can't parse LogWrite because the content is a serial blob whose format is another thing not intended for public consumption. - Jon From robin at corelight.com Thu Aug 23 08:28:29 2018 From: robin at corelight.com (Robin Sommer) Date: Thu, 23 Aug 2018 08:28:29 -0700 Subject: [Bro-Dev] Broker data layouts In-Reply-To: References: <461E7327-0AED-41CB-8227-95D3F2DF2A0F@haw-hamburg.de> <20180821180927.GA45660@corelight.com> <20180822145414.GC32073@corelight.com> <18A9EFDE-92A3-47C9-9458-83E56C599CFF@haw-hamburg.de> Message-ID: <20180823152828.GA43557@corelight.com> On Thu, Aug 23, 2018 at 10:01 -0500, Jonathan Siwek wrote: > Yeah, that's one problem, but a bigger issue is you can't parse > LogWrite because the content is a serial blob whose format is another > thing not intended for public consumption. I guess my earlier comment might have been misleading: there's certainly work that needs to be done to open this up. Right now, it's probably not even realistic at all because we still have a workaround in place in there that uses the old (non-Broker) serialization code for creating that blob. That was to get around a performance issue, and still needs to be addressed.
As part of upgrading that, I think it can make sense to think about documenting the format we end up choosing.

Robin

--
Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com

From robin at corelight.com Thu Aug 23 08:31:02 2018
From: robin at corelight.com (Robin Sommer)
Date: Thu, 23 Aug 2018 08:31:02 -0700
Subject: [Bro-Dev] Broker data layouts
In-Reply-To: <18A9EFDE-92A3-47C9-9458-83E56C599CFF@haw-hamburg.de>
References: <461E7327-0AED-41CB-8227-95D3F2DF2A0F@haw-hamburg.de> <20180821180927.GA45660@corelight.com> <20180822145414.GC32073@corelight.com> <18A9EFDE-92A3-47C9-9458-83E56C599CFF@haw-hamburg.de>
Message-ID: <20180823153102.GB43557@corelight.com>

On Thu, Aug 23, 2018 at 15:32 +0200, Dominik Charousset wrote:

> Does that mean I need to receive the LogCreate event first to
> understand successive LogWrite events?

I don't really see a way around that without substantially increasing volume. We could send LogCreate updates regularly, so that it's easier to synchronize with an ongoing stream.

Robin

--
Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com

From dominik.charousset at haw-hamburg.de Thu Aug 23 06:32:39 2018
From: dominik.charousset at haw-hamburg.de (Dominik Charousset)
Date: Thu, 23 Aug 2018 15:32:39 +0200
Subject: [Bro-Dev] Broker data layouts
In-Reply-To: <20180822145414.GC32073@corelight.com>
References: <461E7327-0AED-41CB-8227-95D3F2DF2A0F@haw-hamburg.de> <20180821180927.GA45660@corelight.com> <20180822145414.GC32073@corelight.com>
Message-ID: <18A9EFDE-92A3-47C9-9458-83E56C599CFF@haw-hamburg.de>

> Dominik, wasn't the original idea for VAST to provide an event
> description language that would create the link between the values
> coming over the wire and their interpretation? Such a specification
> could be auto-generated from Bro's knowledge about the events it
> generates.

We were actually thinking about auto-generating the schema. But broker::data simply has no meta information that we can use.
Even distinguishing records/tuples from actual lists is impossible, because broker::vector is used for both. Of course we can make a couple of assumptions (the top-level vector is a record, for example), but then VAST users only ever can use type queries. In other words, they can only ask for IP addresses for example, but not specifically for originator IPs.

In a sense, broker's representation is an inverted JSON. In JSON, we have field names but no type information (everything is a string), whereas in broker we have (ambiguous) type information but no field names. :)

>> Though the Broker data corresponding to log entry content is also
>> opaque at the moment (I recall that was maybe for performance or
>> message volume optimization),
>
> Yeah, but generally this is something I could see opening up. The log
> structure is pretty straight-forward and self-describing, it'd be
> mostly a matter of clean up and documentation to make that directly
> accessible to external consumers I think. Events, on the other hand,
> are semantically tied very closely to the scripts generating them, and
> also much more diverse so that self-description doesn't really seem
> feasible/useful. Republishing a relevant subset certainly sounds
> better for that; or, if it's really a bulk feed that's desired, some
> out-of-band mechanism to convey the schema information somehow.

Opening that up would be great. However, our goal was to have Broker as a source for structured data that we can import in a generic fashion for later analysis. Of course that relies on a standard / convention / best practice for making schema programmatically accessible. Currently, it seems that we need a schema definition provided by the user offline. This will work as long as all published data for a given topic is uniform. Multiplexing multiple event types already makes things complicated, but it seems like this is actually the standard use case.
OSQuery, for example, will generate different events that we then either need to separate into different topics or multiplex in a single topic but merge-in some meta information. And once we mix in meta information with actual data, a simple schema definition no longer cuts it. At worst, importing data from Broker requires a separate parser for each import format.

> broker/bro.hh is basically all there is right now

I'm a bit hesitant to rely on this header at the moment, because of:

/// A Bro log-write message. Note that at the moment this should be used only
/// by Bro itself as the arguments aren't publicly defined.

Is the API stable enough on your end at this point to make it public?

Also, there are LogCreate and LogWrite events. The LogCreate has the `fields_data` (a list of field names?). Does that mean I need to receive the LogCreate event first to understand successive LogWrite events? That would mean I cannot parse logs that had their LogCreate event before I was able to subscribe to the topic.

Dominik

From dopheide at es.net Thu Aug 23 12:16:02 2018
From: dopheide at es.net (Michael Dopheide)
Date: Thu, 23 Aug 2018 14:16:02 -0500
Subject: [Bro-Dev] libmaxminddb configure issue
Message-ID:

Johanna mentioned to me that libmaxminddb should be working now in master...

So far I haven't been able to get 'configure' to find it, neither with the OS packages nor by installing libmaxminddb in /usr/local/ and specifying --with-geoip.

This is CentOS 7.5.

-Dop
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20180823/1905b17f/attachment.html

From dopheide at es.net Thu Aug 23 12:52:26 2018
From: dopheide at es.net (Michael Dopheide)
Date: Thu, 23 Aug 2018 14:52:26 -0500
Subject: [Bro-Dev] libmaxminddb configure issue
In-Reply-To:
References:
Message-ID:

More info:

*snip*
libmaxminddb: false <------- THIS
Kerberos: false
gperftools found: true
tcmalloc: true
debugging: false
jemalloc: false
================================================================
-- Configuring done
-- Generating done
-- Build files have been written to: /usr/local/src/bro/build

[root at sec-gpu bro]# grep MMDB build/CMakeCache.txt
LibMMDB_INCLUDE_DIR:PATH=/usr/include
LibMMDB_LIBRARY:FILEPATH=/usr/lib64/libmaxminddb.so
LibMMDB_ROOT_DIR:PATH=/usr
//Details about finding LibMMDB
FIND_PACKAGE_MESSAGE_DETAILS_LibMMDB:INTERNAL=[/usr/lib64/libmaxminddb.so][/usr/include][v()]
//ADVANCED property for variable: LibMMDB_INCLUDE_DIR
LibMMDB_INCLUDE_DIR-ADVANCED:INTERNAL=1
//ADVANCED property for variable: LibMMDB_LIBRARY
LibMMDB_LIBRARY-ADVANCED:INTERNAL=1
//ADVANCED property for variable: LibMMDB_ROOT_DIR
LibMMDB_ROOT_DIR-ADVANCED:INTERNAL=1

Clearly it found the libraries just fine... this is where my cmake debugging ability starts to fall apart...

-Dop

On Thu, Aug 23, 2018 at 2:16 PM, Michael Dopheide wrote:

> Johanna mentioned to me that libmaxminddb should be working now in
> master...
>
> So far I haven't been able to get 'configure' to find it, neither with the
> OS packages nor by installing libmaxminddb in /usr/local/ and specifying
> --with-geoip.
>
> This is CentOS 7.5.
>
> -Dop
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20180823/9c3ce1d4/attachment.html

From dnthayer at illinois.edu Thu Aug 23 13:10:45 2018
From: dnthayer at illinois.edu (Thayer, Daniel N)
Date: Thu, 23 Aug 2018 20:10:45 +0000
Subject: [Bro-Dev] libmaxminddb configure issue
In-Reply-To:
References:
Message-ID: <8F865DA62E66F543B6104A2835719CF969E83A5F@CITESMBX5.ad.uillinois.edu>

Could you try the following patch and let me know if it works for you:

--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -134,7 +134,7 @@ include_directories(BEFORE

 set(USE_GEOIP false)
 find_package(LibMMDB)
-if (LibMMDB_FOUND)
+if (LIBMMDB_FOUND)
     set(USE_GEOIP true)
     include_directories(BEFORE ${LibMMDB_INCLUDE_DIR})
     list(APPEND OPTLIBS ${LibMMDB_LIBRARY})

----------------------------------------------------------

From: bro-dev-bounces at bro.org [bro-dev-bounces at bro.org] on behalf of Michael Dopheide [dopheide at es.net]
Sent: Thursday, August 23, 2018 2:16 PM
To:
Subject: [Bro-Dev] libmaxminddb configure issue

Johanna mentioned to me that libmaxminddb should be working now in master...

So far I haven't been able to get 'configure' to find it, neither with the OS packages nor by installing libmaxminddb in /usr/local/ and specifying --with-geoip.

This is CentOS 7.5.

-Dop

From dopheide at es.net Thu Aug 23 13:29:56 2018
From: dopheide at es.net (Michael Dopheide)
Date: Thu, 23 Aug 2018 15:29:56 -0500
Subject: [Bro-Dev] libmaxminddb configure issue
In-Reply-To: <8F865DA62E66F543B6104A2835719CF969E83A5F@CITESMBX5.ad.uillinois.edu>
References: <8F865DA62E66F543B6104A2835719CF969E83A5F@CITESMBX5.ad.uillinois.edu>
Message-ID:

Yeah, I just figured that out myself and rebuilt...

bro -e "print lookup_location(8.8.8.8);"
[country_code=US, region=, city=, latitude=37.751, longitude=-97.822]

Looks like you'll have the same issue with LibKRB5_FOUND (I didn't look for others).
-Dop On Thu, Aug 23, 2018 at 3:10 PM, Thayer, Daniel N wrote: > Could you try the following patch and let me know if it works for you: > > --- a/CMakeLists.txt > +++ b/CMakeLists.txt > @@ -134,7 +134,7 @@ include_directories(BEFORE > > set(USE_GEOIP false) > find_package(LibMMDB) > -if (LibMMDB_FOUND) > +if (LIBMMDB_FOUND) > set(USE_GEOIP true) > include_directories(BEFORE ${LibMMDB_INCLUDE_DIR}) > list(APPEND OPTLIBS ${LibMMDB_LIBRARY}) > > > ---------------------------------------------------------- > > From: bro-dev-bounces at bro.org [bro-dev-bounces at bro.org] on behalf of > Michael Dopheide [dopheide at es.net] > > Sent: Thursday, August 23, 2018 2:16 PM > > To: > > Subject: [Bro-Dev] libmaxminddb configure issue > > > > > > > > > > Johanna mentioned to me that libmaxminddb should be working now in > master... > > > > > > So far I haven't been able to get 'configure' to find it, neither with the > OS packages nor by installing libmaxminddb in /usr/local/ and specifying > --with-geoip. > > > > > > This is CentOS 7.5. > > > > > > -Dop > > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20180823/72e15c10/attachment.html From jsiwek at corelight.com Thu Aug 23 14:58:49 2018 From: jsiwek at corelight.com (Jon Siwek) Date: Thu, 23 Aug 2018 16:58:49 -0500 Subject: [Bro-Dev] libmaxminddb configure issue In-Reply-To: References: <8F865DA62E66F543B6104A2835719CF969E83A5F@CITESMBX5.ad.uillinois.edu> Message-ID: Thanks, does look like these wouldn't work as intended for CMake < 3.3, but I've merged Daniel's patch in to master now. - Jon On Thu, Aug 23, 2018 at 3:36 PM Michael Dopheide wrote: > > Yeah, I just figured that out myself and rebuilt... > > bro -e "print lookup_location(8.8.8.8);" > [country_code=US, region=, city=, latitude=37.751, longitude=-97.822] > > Looks like you'll have the same issue with LibKRB5_FOUND (I didn't look for others). 
> > -Dop > > On Thu, Aug 23, 2018 at 3:10 PM, Thayer, Daniel N wrote: >> >> Could you try the following patch and let me know if it works for you: >> >> --- a/CMakeLists.txt >> +++ b/CMakeLists.txt >> @@ -134,7 +134,7 @@ include_directories(BEFORE >> >> set(USE_GEOIP false) >> find_package(LibMMDB) >> -if (LibMMDB_FOUND) >> +if (LIBMMDB_FOUND) >> set(USE_GEOIP true) >> include_directories(BEFORE ${LibMMDB_INCLUDE_DIR}) >> list(APPEND OPTLIBS ${LibMMDB_LIBRARY}) >> >> >> ---------------------------------------------------------- >> >> From: bro-dev-bounces at bro.org [bro-dev-bounces at bro.org] on behalf of Michael Dopheide [dopheide at es.net] >> >> Sent: Thursday, August 23, 2018 2:16 PM >> >> To: >> >> Subject: [Bro-Dev] libmaxminddb configure issue >> >> >> >> >> >> >> >> >> >> Johanna mentioned to me that libmaxminddb should be working now in master... >> >> >> >> >> >> So far I haven't been able to get 'configure' to find it, neither with the OS packages nor by installing libmaxminddb in /usr/local/ and specifying --with-geoip. >> >> >> >> >> >> This is CentOS 7.5. >> >> >> >> >> >> -Dop >> >> >> >> >> >> >> > > _______________________________________________ > bro-dev mailing list > bro-dev at bro.org > http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev From vallentin at icir.org Fri Aug 24 07:32:24 2018 From: vallentin at icir.org (Matthias Vallentin) Date: Fri, 24 Aug 2018 16:32:24 +0200 Subject: [Bro-Dev] Broker data layouts In-Reply-To: <20180823153102.GB43557@corelight.com> References: <461E7327-0AED-41CB-8227-95D3F2DF2A0F@haw-hamburg.de> <20180821180927.GA45660@corelight.com> <20180822145414.GC32073@corelight.com> <18A9EFDE-92A3-47C9-9458-83E56C599CFF@haw-hamburg.de> <20180823153102.GB43557@corelight.com> Message-ID: > I don't really see a way around that without substantially increasing > volume. We could send LogCreate updates regularly, so that it's easier > to synchronize with an ongoing stream. 
It sounds like this is critical also for regular operation: (1) when an endpoint bootstraps slowly and the LogCreate message has already been sent, it doesn't know what to do, and (2) when an endpoint crashes and comes back, it may have lost the state from the initial LogCreate.

That said, I want to make sure I understood you correctly: is it currently impossible to parse Bro logs with Broker, because all logs come in the LogWrite message, which is a binary blob? It sounds like the topic /bro/logs gets the LogCreate and LogWrite messages. In other words, can Broker currently be used if one writes a Bro script that publishes plain events (message type 1 in bro.hh)?

Matthias

From robin at corelight.com Fri Aug 24 08:13:51 2018
From: robin at corelight.com (Robin Sommer)
Date: Fri, 24 Aug 2018 08:13:51 -0700
Subject: [Bro-Dev] Broker data layouts
In-Reply-To:
References: <461E7327-0AED-41CB-8227-95D3F2DF2A0F@haw-hamburg.de> <20180821180927.GA45660@corelight.com> <20180822145414.GC32073@corelight.com> <18A9EFDE-92A3-47C9-9458-83E56C599CFF@haw-hamburg.de> <20180823153102.GB43557@corelight.com>
Message-ID: <20180824151351.GD43557@corelight.com>

On Fri, Aug 24, 2018 at 16:32 +0200, Matthias Vallentin wrote:

> It sounds like this is critical also for regular operation:

Agree. Right now a newly connecting peer gets a round of explicit LogCreates, but that's probably not the best way forward for larger topologies.

> is it currently impossible to parse Bro logs with Broker, because all
> logs come in the LogWrite message, which is a binary blob?

Correct. (This was different at first, but the switch was necessary for performance. It's waiting for a better solution at this point.)

> In other words, can Broker currently be used if one writes a Bro
> script that publishes plain events (message type 1 in bro.hh)?

Yes to that. Non-Bros can exchange events (assuming they know the schema), but not logs.

Robin

--
Robin Sommer * Corelight, Inc.
* robin at corelight.com * www.corelight.com

From vallentin at icir.org Sat Aug 25 08:42:35 2018
From: vallentin at icir.org (Matthias Vallentin)
Date: Sat, 25 Aug 2018 17:42:35 +0200
Subject: [Bro-Dev] Broker data layouts
In-Reply-To: <20180824151351.GD43557@corelight.com>
References: <461E7327-0AED-41CB-8227-95D3F2DF2A0F@haw-hamburg.de> <20180821180927.GA45660@corelight.com> <20180822145414.GC32073@corelight.com> <18A9EFDE-92A3-47C9-9458-83E56C599CFF@haw-hamburg.de> <20180823153102.GB43557@corelight.com> <20180824151351.GD43557@corelight.com>
Message-ID:

> Agree. Right now a newly connecting peer gets a round of explicit
> LogCreates, but that's probably not the best way forward for larger
> topologies.

Okay. In the future, we probably need some form of "serialization-free" batching mechanism to ship data more efficiently. There exist technologies like Apache Arrow, FlatBuffers, Cap'n Proto, MsgPack, etc., all of which require building a set of values once, and then just copying them around as a binary blob on the wire. Deserialization is not needed because one would typically only "view" the data through light-weight accessors. We're doing something similar in VAST for performance reasons, but Bro and Broker have the exact same issues in that regard.

> > In other words, can Broker currently be used if one writes a Bro
> > script that publishes plain events (message type 1 in bro.hh)?
>
> Yes to that. Non-Bros can exchange events (assuming they know the
> schema), but not logs.

Got it. (Unfortunately that will make our BroCon talk pretty boring in terms of throughput analysis, because we were planning to build an end-to-end log ingestion system based on Broker. We'll probably switch gears a bit and focus more on the latency side, where a Bro script publishes something to an external application and receives feedback through an auxiliary channel.)
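
To make the "build once, then only view" idea above a bit more concrete, here is a minimal sketch. It assumes a hand-rolled toy layout (length-prefixed strings packed into one byte buffer) and is not the API or wire format of Arrow, FlatBuffers, Cap'n Proto, Broker, or VAST; the names `append_field`, `FieldView`, and `BatchView` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Append one length-prefixed string to a flat buffer (done once by the sender).
// The 4-byte length is stored in host byte order; a real wire format would
// have to pin down endianness.
void append_field(std::vector<std::uint8_t>& buf, const std::string& s) {
  std::uint32_t n = static_cast<std::uint32_t>(s.size());
  std::uint8_t len[4];
  std::memcpy(len, &n, 4);
  buf.insert(buf.end(), len, len + 4);
  buf.insert(buf.end(), s.begin(), s.end());
}

// Non-owning view of one field: points into the buffer, copies nothing.
struct FieldView {
  const char* data;
  std::uint32_t size;
};

// Walks the buffer field by field without allocating or copying payload bytes.
class BatchView {
public:
  explicit BatchView(const std::vector<std::uint8_t>& buf) : buf_(buf) {}

  // Returns false at the end of the buffer or if the next field is truncated.
  bool next(FieldView& out) {
    if (pos_ + 4 > buf_.size())
      return false;
    std::uint32_t n;
    std::memcpy(&n, buf_.data() + pos_, 4);
    if (n > buf_.size() - pos_ - 4)
      return false;
    out.data = reinterpret_cast<const char*>(buf_.data() + pos_ + 4);
    out.size = n;
    pos_ += 4 + n;
    return true;
  }

private:
  const std::vector<std::uint8_t>& buf_;
  std::size_t pos_ = 0;
};
```

A receiver would hand the received bytes to `BatchView` and read fields in place; the only copies are the ones the transport itself makes. Real formats add alignment, typed fields, and schema information on top of this.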
Matthias From robin at corelight.com Mon Aug 27 07:50:42 2018 From: robin at corelight.com (Robin Sommer) Date: Mon, 27 Aug 2018 07:50:42 -0700 Subject: [Bro-Dev] Broker data layouts In-Reply-To: References: <461E7327-0AED-41CB-8227-95D3F2DF2A0F@haw-hamburg.de> <20180821180927.GA45660@corelight.com> <20180822145414.GC32073@corelight.com> <18A9EFDE-92A3-47C9-9458-83E56C599CFF@haw-hamburg.de> <20180823153102.GB43557@corelight.com> <20180824151351.GD43557@corelight.com> Message-ID: <20180827145042.GH43557@corelight.com> On Sat, Aug 25, 2018 at 17:42 +0200, Matthias Vallentin wrote: > Okay. In the future, we probably need some form of > "serialization-free" batching mechanism to ship data more efficiently. Do you guys have a sense of how load splits up between serialization and batching/communication? My hope has been that batching itself can take care of the performance issues, so that we'll be able to send logs as standard CAF messages, each one representing a batch of N log lines. The benchmark I had created a little while ago to examine that wasn't able to get the necessary performance out of Broker/CAF to do that (hence the fall-back to Bro's old serialization of log messages for now, sent over CAF). But iirc, the conclusion was that there's still room for improvement in CAF that should make this feasible eventually. However, if you guys believe it's really CAF's serialization that's the bottle-neck, then we'll need to come up with something else indeed. Robin -- Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com From fatema.bannatwala at gmail.com Mon Aug 27 13:48:04 2018 From: fatema.bannatwala at gmail.com (fatema bannatwala) Date: Mon, 27 Aug 2018 16:48:04 -0400 Subject: [Bro-Dev] Implementing DNSSEC Parser in Bro. Message-ID: Hi All, I am in the process of writing parser for the DNSSEC RR types in DNS responses, and written RRSIG (type=46) parser by adding code to existing DNS protocol analyzer in Bro 2.5.4 src code. 
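
For readers unfamiliar with the record type, a minimal sketch of decoding the fixed-size leading fields of an RRSIG RDATA, following the layout in RFC 4034, section 3.1. This is not the code from the pull request and not Bro's DNS analyzer; `RrsigFixed` and `parse_rrsig_fixed` are hypothetical names, and the variable-length signer's name and signature that follow the fixed part are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fixed-size leading fields of RRSIG RDATA (RFC 4034, section 3.1).
struct RrsigFixed {
  std::uint16_t type_covered;
  std::uint8_t algorithm;
  std::uint8_t labels;
  std::uint32_t original_ttl;
  std::uint32_t sig_expiration;
  std::uint32_t sig_inception;
  std::uint16_t key_tag;
};

// DNS integers are big-endian (network byte order).
static std::uint16_t read_u16(const std::uint8_t* p) {
  return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

static std::uint32_t read_u32(const std::uint8_t* p) {
  return (static_cast<std::uint32_t>(p[0]) << 24) |
         (static_cast<std::uint32_t>(p[1]) << 16) |
         (static_cast<std::uint32_t>(p[2]) << 8) |
         static_cast<std::uint32_t>(p[3]);
}

// Fills `out` and returns true if rdata holds at least the 18 fixed bytes
// (2 + 1 + 1 + 4 + 4 + 4 + 2); returns false on truncated input.
bool parse_rrsig_fixed(const std::vector<std::uint8_t>& rdata, RrsigFixed& out) {
  const std::size_t fixed_len = 18;
  if (rdata.size() < fixed_len)
    return false;
  const std::uint8_t* p = rdata.data();
  out.type_covered = read_u16(p);
  out.algorithm = p[2];
  out.labels = p[3];
  out.original_ttl = read_u32(p + 4);
  out.sig_expiration = read_u32(p + 8);
  out.sig_inception = read_u32(p + 12);
  out.key_tag = read_u16(p + 16);
  return true;
}
```

The length check up front matters in an analyzer context, since RDATA lengths come straight from untrusted packets.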
I have tested the code by recompiling it on our test server and running it against a dns pcap, and it correctly parses the RRSIG record and logs it. And hence have requested a Pull request to merge in the upstream Bro master repo . Planning to write the remaining DNSSEC RR types: NSEC, DS and DNSKEY parsing in Bro DNS analyzer as well, once I get the feedback on the current merge request of the code for parsing RRSIG. Thanks, Fatema. -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20180827/cf1b8973/attachment.html From dominik.charousset at haw-hamburg.de Tue Aug 28 08:12:11 2018 From: dominik.charousset at haw-hamburg.de (Dominik Charousset) Date: Tue, 28 Aug 2018 17:12:11 +0200 Subject: [Bro-Dev] Broker data layouts In-Reply-To: <20180827145042.GH43557@corelight.com> References: <461E7327-0AED-41CB-8227-95D3F2DF2A0F@haw-hamburg.de> <20180821180927.GA45660@corelight.com> <20180822145414.GC32073@corelight.com> <18A9EFDE-92A3-47C9-9458-83E56C599CFF@haw-hamburg.de> <20180823153102.GB43557@corelight.com> <20180824151351.GD43557@corelight.com> <20180827145042.GH43557@corelight.com> Message-ID: <46AB0057-D8D5-4E28-B81A-6BCF7C9F02F3@haw-hamburg.de> >> Okay. In the future, we probably need some form of >> "serialization-free" batching mechanism to ship data more efficiently. > > Do you guys have a sense of how load splits up between serialization > and batching/communication? My hope has been that batching itself can > take care of the performance issues, so that we'll be able to send > logs as standard CAF messages, each one representing a batch of N log > lines. The benchmark I had created a little while ago to examine that > wasn't able to get the necessary performance out of Broker/CAF to do > that (hence the fall-back to Bro's old serialization of log messages > for now, sent over CAF). 
But iirc, the conclusion was that there's
> still room for improvement in CAF that should make this feasible
> eventually. However, if you guys believe it's really CAF's
> serialization that's the bottle-neck, then we'll need to come up with
> something else indeed.

I think there are a couple of orthogonal aspects merged together here. Namely, (1) memory-mapping, (2) batching, and (3) performance of CAF's serialization.

1) Matthias threw in memory-mapping, but I'm not so sure if this is actually feasible for you. The main benefit here is to have a unified representation in memory, on disk, and on the wire. I think you're still going to keep the ASCII log output format for Bro logs. Also, a memory-mapped format would mean to drop the current broker::data API entirely. My hunch is that you would rather not break the API immediately after releasing it to the public.

2) CAF already does batching. Ideally, Broker should not need to do any additional batching on top of that. In fact, doing the batching in user code greatly diminishes effectiveness of CAF's own batching, because now CAF can no longer break up chunks on its own to make efficient use of resources.

3) Serialization should really not be a bottleneck. The costly part is shuffling bytes around in buffers and heap allocations when deserializing a broker::data. There's no way around these two costs. Do you still remember what showed up during your investigation that triggered you to go with the blob? Because what I can see as a *much* bigger issue is *copying* overhead, not serialization. CAF streams assume that individual elements are cheap to copy. So probably a copy-on-write optimization for broker::data would have a much higher impact on performance (it's also straightforward to implement and CAF has re-usable pieces for that). If serialization still shows up with unreasonable costs in a profiler, however, there are ways to speed things up.
The customization point here is a specialized inspect() overload for broker::data that essentially allows you to apply all the optimizations you want (and that might be used in Bro's framework).

I hope we're not talking past each other. :) An in-depth performance analysis of Broker's streaming layer is on my todo list for months at this point. I hope I get something done before the Bro Workshop in Europe. Then we can hopefully discuss this with some reliable data in person.

Dominik

From johanna at icir.org Tue Aug 28 10:41:51 2018
From: johanna at icir.org (Johanna Amann)
Date: Tue, 28 Aug 2018 10:41:51 -0700
Subject: [Bro-Dev] Jira filter results
Message-ID: <20180828174151.a3d6vodotubeu7ju@Beezling.local>

Hi,

when I go to tracker.bro.org, the top-right box (Filter result) for me shows: "The filter configured for this gadget could not be retrieved. Please verify it is still valid on the issue navigator.". This seems to be independent of Browser.

I think this used to show the merge-requests. Can someone perhaps fix that again? :)

Thanks,
Johanna

From robin at corelight.com Tue Aug 28 11:28:51 2018
From: robin at corelight.com (Robin Sommer)
Date: Tue, 28 Aug 2018 11:28:51 -0700
Subject: [Bro-Dev] Broker data layouts
In-Reply-To: <46AB0057-D8D5-4E28-B81A-6BCF7C9F02F3@haw-hamburg.de>
References: <20180821180927.GA45660@corelight.com> <20180822145414.GC32073@corelight.com> <18A9EFDE-92A3-47C9-9458-83E56C599CFF@haw-hamburg.de> <20180823153102.GB43557@corelight.com> <20180824151351.GD43557@corelight.com> <20180827145042.GH43557@corelight.com> <46AB0057-D8D5-4E28-B81A-6BCF7C9F02F3@haw-hamburg.de>
Message-ID: <20180828182851.GA73346@corelight.com>

On Tue, Aug 28, 2018 at 17:12 +0200, Dominik Charousset wrote:

> 1) Matthias threw in memory-mapping, but I'm not so sure if this is
> actually feasible for you.

Yeah, our normal use case is different, memory-mapping won't help much with that.

> 2) CAF already does batching.
Ideally, Broker should not need to do
> any additional batching on top of that.

Yep, but (3) was the problem with that:

> Do you still remember what showed up during your investigation that
> triggered you to go with the blob?

Looking back through emails, at some point Jon replaced CAF serialization with these blobs and got substantially better performance. He also had a patch that reproduced the effect with the benchmark tool you wrote. I'm pasting that in below, I'm assuming it still applies. Looks like the conclusion at that time was that it is indeed an issue with the serialization and/or copying the data.

> An in-depth performance analysis of Broker's streaming layer is on my
> todo list for months at this point. I hope I get something done before
> the Bro Workshop in Europe.

That would be great. :)

Robin

```
diff --git a/tests/benchmark/broker-stream-benchmark.cc b/tests/benchmark/broker-stream-benchmark.cc
index 821ac39..26b0778 100644
--- a/tests/benchmark/broker-stream-benchmark.cc
+++ b/tests/benchmark/broker-stream-benchmark.cc
@@ -1,6 +1,7 @@
 #include
 #include
+#include
 using std::cout;
 using std::cerr;
@@ -55,8 +56,11 @@ void publish_mode(broker::endpoint& ep, const std::string& topic_str) {
       // nop
     },
     [=](caf::unit_t&, downstream>& out, size_t num) {
-      for (size_t i = 0; i < num; ++i)
-        out.push(std::make_pair(topic_str, "Lorem ipsum dolor sit amet."));
+      for (size_t i = 0; i < num; ++i) {
+        auto ev = broker::bro::Event(std::string("event_1"),
+                                     std::vector{42, "test"});
+        out.push(std::make_pair(topic_str, std::move(ev)));
+      }
       global_count += num;
     },
     [=](const caf::unit_t&) {
```

--
Robin Sommer * Corelight, Inc.
* robin at corelight.com * www.corelight.com

From jsiwek at corelight.com Tue Aug 28 12:08:13 2018
From: jsiwek at corelight.com (Jon Siwek)
Date: Tue, 28 Aug 2018 14:08:13 -0500
Subject: [Bro-Dev] Jira filter results
In-Reply-To: <20180828174151.a3d6vodotubeu7ju@Beezling.local>
References: <20180828174151.a3d6vodotubeu7ju@Beezling.local>
Message-ID:

On Tue, Aug 28, 2018 at 12:48 PM Johanna Amann wrote:

> "The filter configured for this gadget could not be retrieved. Please
> verify it is still valid on the issue navigator.".

Should be showing merge requests again.

- Jon

From jsiwek at corelight.com Wed Aug 29 08:13:58 2018
From: jsiwek at corelight.com (Jon Siwek)
Date: Wed, 29 Aug 2018 10:13:58 -0500
Subject: [Bro-Dev] [Bro-Commits] [git/bro] topic/johanna/tls-more-data: Update NEWS for ssl changes. (3c7c60cf6)
In-Reply-To: <201808282336.w7SNaIJW011965@bro-ids.icir.org>
References: <201808282336.w7SNaIJW011965@bro-ids.icir.org>
Message-ID:

On Tue, Aug 28, 2018 at 6:35 PM Johanna Amann wrote:

> + If you use these events, you can make your scripts work on old and new versions
> + of Bro by wrapping the event definition in an @if, for example:
> +
> + @if ( Version::at_least("2.6") || ( Version::number == 20500 && Version::info$commit >= [commit number of change] ) )
> + event ssl_client_hello(c: connection, version: count, record_version: count, possible_ts: time, client_random: string, session_id: string, ciphers: index_vec, comp_methods: index_vec)
> + @else
> + event ssl_client_hello(c: connection, version: count, possible_ts: time, client_random: string, session_id: string, ciphers: index_vec)
> + @endif

Since the parser won't be happy with that type of @if usage in old releases due to [1], should we instead suggest something like:

    function my_ssl_client_hello_impl(c: connection, version: count, possible_ts: time, client_random: string, session_id: string, ciphers: index_vec, record_version: count &default=0, comp_methods: index_vec &default=index_vec())
        {
        # Copy existing code to here
        }

    @if ( Version::at_least("2.6") || ( Version::number == 20500 && Version::info$commit >= [commit number of change] ) )
    event ssl_client_hello(c: connection, version: count, record_version: count, possible_ts: time, client_random: string, session_id: string, ciphers: index_vec, comp_methods: index_vec)
        { my_ssl_client_hello_impl(c, version, possible_ts, client_random, session_id, ciphers, record_version, comp_methods); }
    @else
    event ssl_client_hello(c: connection, version: count, possible_ts: time, client_random: string, session_id: string, ciphers: index_vec)
        { my_ssl_client_hello_impl(c, version, possible_ts, client_random, session_id, ciphers); }
    @endif

- Jon

[1] https://bro-tracker.atlassian.net/browse/BIT-1976

From johanna at icir.org Wed Aug 29 09:02:36 2018
From: johanna at icir.org (Johanna Amann)
Date: Wed, 29 Aug 2018 09:02:36 -0700
Subject: [Bro-Dev] [Bro-Commits] [git/bro] topic/johanna/tls-more-data: Update NEWS for ssl changes. (3c7c60cf6)
In-Reply-To:
References: <201808282336.w7SNaIJW011965@bro-ids.icir.org>
Message-ID: <8769ED61-CFBA-458C-9E7F-8ED68605451D@icir.org>

Hi Jon,

I actually tested it - and it works fine with old versions as long as you use the @if this way round.

So

    @if ( version >= 2.6)
    event 2.6-event
    @else
    event 2.5-event
    @endif

works perfectly with 2.5 and 2.6.

    @if ( version <= 2.6)
    event 2.5-event
    @else
    event 2.6-event
    @endif

breaks with 2.5. I admittedly stopped looking for the exact reason why at some point - but I tested it rather thoroughly :). And I admittedly only figured that out after I wrote my comment to the merge-request.

So - I am tempted to put it in NEWS like this - I assume most people will just copy-paste it because the @if-statement is complex enough that you will not come up with it yourself easily...
Johanna On 29 Aug 2018, at 8:13, Jon Siwek wrote: > On Tue, Aug 28, 2018 at 6:35 PM Johanna Amann > wrote: > >> + If you use these events, you can make your scripts work on old and >> new versions >> + of Bro by wrapping the event definition in an @if, for example: >> + >> + @if ( Version::at_least("2.6") || ( Version::number == 20500 && >> Version::info$commit >= [commit number of change] ) ) >> + event ssl_client_hello(c: connection, version: count, >> record_version: count, possible_ts: time, client_random: string, >> session_id: string, ciphers: index_vec, comp_methods: index_vec) >> + @else >> + event ssl_client_hello(c: connection, version: count, >> possible_ts: time, client_random: string, session_id: string, >> ciphers: index_vec) >> + @endif > > Since the parser won't be happy with that type of @if usage in old > releases due to [1], should we instead suggest something like: > > function my_ssl_client_hello_impl(c: connection, version: count, > possible_ts: time, client_random: string, session_id: string, ciphers: > index_vec, record_version: counter &default=0, comp_methods: index_vec > &default=index_vec()) > { > # Copy existing code to here > } > > @if ( Version::at_least("2.6") || ( Version::number == 20500 && > Version::info$commit >= [commit number of change] ) ) > event ssl_client_hello(c: connection, version: count, record_version: > count, possible_ts: time, client_random: string, session_id: string, > ciphers: index_vec, comp_methods: index_vec) > { my_ssl_client_hello_impl(c, version, possible_ts, client_random, > session_id, ciphers, record_version, comp_methods); } > @else > event ssl_client_hello(c: connection, version: count, possible_ts: > time, client_random: string, session_id: string, ciphers: index_vec) > { my_ssl_client_hello_impl(c, version, possible_ts, client_random, > session_id, ciphers); } > @endif > > - Jon > > [1] https://bro-tracker.atlassian.net/browse/BIT-1976 > _______________________________________________ > bro-dev 
mailing list
> bro-dev at bro.org
> http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

From jazoff at illinois.edu Wed Aug 29 09:10:33 2018
From: jazoff at illinois.edu (Azoff, Justin S)
Date: Wed, 29 Aug 2018 16:10:33 +0000
Subject: [Bro-Dev] [Bro-Commits] [git/bro] topic/johanna/tls-more-data: Update NEWS for ssl changes. (3c7c60cf6)
In-Reply-To: <8769ED61-CFBA-458C-9E7F-8ED68605451D@icir.org>
References: <201808282336.w7SNaIJW011965@bro-ids.icir.org> <8769ED61-CFBA-458C-9E7F-8ED68605451D@icir.org>
Message-ID:

> On Aug 29, 2018, at 12:02 PM, Johanna Amann wrote:
>
> @if ( version <= 2.6)
> event 2.5-event
> @else
> event 2.6-event
> @endif
>
> breaks with 2.5.

Should that be < and not <= ?

--
Justin Azoff

From johanna at icir.org Wed Aug 29 09:12:27 2018
From: johanna at icir.org (Johanna Amann)
Date: Wed, 29 Aug 2018 09:12:27 -0700
Subject: [Bro-Dev] [Bro-Commits] [git/bro] topic/johanna/tls-more-data: Update NEWS for ssl changes. (3c7c60cf6)
In-Reply-To:
References: <201808282336.w7SNaIJW011965@bro-ids.icir.org> <8769ED61-CFBA-458C-9E7F-8ED68605451D@icir.org>
Message-ID: <5E2A17CC-47E6-4B78-B82F-3C00DB41C65C@icir.org>

Sorry, yup.

Johanna

On 29 Aug 2018, at 9:10, Azoff, Justin S wrote:

>> On Aug 29, 2018, at 12:02 PM, Johanna Amann wrote:
>>
>> @if ( version <= 2.6)
>> event 2.5-event
>> @else
>> event 2.6-event
>> @endif
>>
>> breaks with 2.5.
>
> Should that be < and not <= ?
>
> --
> Justin Azoff

From jsiwek at corelight.com Wed Aug 29 11:03:14 2018
From: jsiwek at corelight.com (Jon Siwek)
Date: Wed, 29 Aug 2018 13:03:14 -0500
Subject: [Bro-Dev] [Bro-Commits] [git/bro] topic/johanna/tls-more-data: Update NEWS for ssl changes. (3c7c60cf6)
In-Reply-To: <8769ED61-CFBA-458C-9E7F-8ED68605451D@icir.org>
References: <201808282336.w7SNaIJW011965@bro-ids.icir.org> <8769ED61-CFBA-458C-9E7F-8ED68605451D@icir.org>
Message-ID:

On Wed, Aug 29, 2018 at 11:02 AM Johanna Amann wrote:

> I actually tested it - and it works fine with old versions as long as
> you use the @if this way round.

Ah, tricky. I can see how that would work now, thanks for clarifying.

- Jon

From jazoff at illinois.edu Thu Aug 30 07:39:17 2018
From: jazoff at illinois.edu (Azoff, Justin S)
Date: Thu, 30 Aug 2018 14:39:17 +0000
Subject: [Bro-Dev] Compatibilty script for policy/protocols/smb?
Message-ID: <7543D35E-413C-4039-9F85-1B1BD2F7850C@illinois.edu>

Upgrading between master builds I just ran into this:

fatal error in /bro/share/bro/site/local.bro, line 88: can't open /bro/share/bro/policy/protocols/smb/__load__.bro

I see in NEWS we have

- The SMB scripts in policy/protocols/smb are now moved into base/protocols/smb
  and loaded/enabled by default.

But should there be an empty script in there or something that does a reporter warning telling people to update local.bro?

--
Justin Azoff

From jsiwek at corelight.com Thu Aug 30 11:26:42 2018
From: jsiwek at corelight.com (Jon Siwek)
Date: Thu, 30 Aug 2018 13:26:42 -0500
Subject: [Bro-Dev] Compatibilty script for policy/protocols/smb?
In-Reply-To: <7543D35E-413C-4039-9F85-1B1BD2F7850C@illinois.edu>
References: <7543D35E-413C-4039-9F85-1B1BD2F7850C@illinois.edu>
Message-ID:

On Thu, Aug 30, 2018 at 9:50 AM Azoff, Justin S wrote:

> fatal error in /bro/share/bro/site/local.bro, line 88: can't open /bro/share/bro/policy/protocols/smb/__load__.bro
>
> I see in NEWS we have
>
> - The SMB scripts in policy/protocols/smb are now moved into base/protocols/smb
> and loaded/enabled by default.
>
> But should there be an empty script in there or something that does a reporter warning telling people to update local.bro?

Thanks for pointing that out.
I'll put a placeholder at the old policy/ location, but also call out in NEWS that such @loads can be removed from local.bro or other custom scripts.

Or let me know if there are other ideas.

- Jon

From jazoff at illinois.edu Thu Aug 30 13:01:20 2018
From: jazoff at illinois.edu (Azoff, Justin S)
Date: Thu, 30 Aug 2018 20:01:20 +0000
Subject: [Bro-Dev] Compatibilty script for policy/protocols/smb?
In-Reply-To:
References: <7543D35E-413C-4039-9F85-1B1BD2F7850C@illinois.edu>
Message-ID:

> On Aug 30, 2018, at 2:26 PM, Jon Siwek wrote:
>
> On Thu, Aug 30, 2018 at 9:50 AM Azoff, Justin S wrote:
>
>> fatal error in /bro/share/bro/site/local.bro, line 88: can't open /bro/share/bro/policy/protocols/smb/__load__.bro
>>
>> I see in NEWS we have
>>
>> - The SMB scripts in policy/protocols/smb are now moved into base/protocols/smb
>> and loaded/enabled by default.
>>
>> But should there be an empty script in there or something that does a reporter warning telling people to update local.bro?
>
> Thanks for pointing that out.
>
> I'll put a placeholder at the old policy/ location, but also call out
> in NEWS that such @loads can be removed from local.bro or other custom
> scripts.
>
> Or let me know if there are other ideas.

Sounds good to me.

I was curious why this test didn't catch this:

testing/btest/scripts/site/local-compat.test

but the file as shipped was

# Uncomment the following line to enable the SMB analyzer. The analyzer
# is currently considered a preview and therefore not loaded by default.
# @load policy/protocols/smb

so while 2.6 would have been compatible with the 2.5 config as it was distributed, it would have broken anyone that uncommented the line.

--
Justin Azoff

From Jawad.Rajput at hq.doe.gov Thu Aug 30 13:11:07 2018
From: Jawad.Rajput at hq.doe.gov (Rajput, Jawad (CONTR))
Date: Thu, 30 Aug 2018 20:11:07 +0000
Subject: [Bro-Dev] Bro 2.5 Packet Drop Issue
Message-ID:

Hello Everyone,

I am reaching out with the hope that someone will be able to help us with an issue we are having with the Bro upgrade from 2.4.1 to 2.5.X.

We have a system with 12 cores (3 GHz), 128 GB RAM, and a 10G NIC (Intel X520-SR2 10GbE Dual-port), monitoring between 1.5 and 2.5 Gbps of traffic.

Bro 2.4.1 is working great and periodically drops 2-5% when traffic peaks at ~2.5 Gbps. However, when we upgrade to Bro 2.5.3/4 on the same exact system the drops go up to 90%.

We are using CentOS-7 and tried installing Bro and PF_RING from both rpm and source without any luck. I wonder if anyone has seen this issue and can give some clues to resolve it.

Bro Node Conf:

[manager]
type=manager
host=localhost
#
[proxy-1]
type=proxy
host=localhost
#
[worker-1]
type=worker
host=localhost
interface=ens1f1
lb_method=pf_ring
lb_procs=11
pin_cpus=1,2,3,4,5,6,7,8,9,10,11

[root at bro-test ~]# cat /proc/net/pf_ring/info
PF_RING Version : 7.3.0 (unknown)
Total rings : 11
Standard (non ZC) Options
Ring slots : 65534
Slot version : 17
Capture TX : No [RX only]
IP Defragment : No
Socket Mode : Standard
Cluster Fragment Queue : 0
Cluster Fragment Discard : 0

[root at bro-test ~]# tailf /opt/bro/logs/current/capture_loss.log
1535647921.339324 60.000005 worker-1-8 318331 425005 74.900531
1535647921.217853 60.000000 worker-1-5 264716 349078 75.832908
1535647921.241244 60.000021 worker-1-9 265863 364089 73.021432
1535647921.312567 60.000002 worker-1-1 239036 315823 75.686698
1535647922.188607 60.000420 worker-1-4 238192 322818 73.785229
1535647922.760560 60.000029 worker-1-11 250678 338188 74.12386
1535647922.864470 60.000075 worker-1-3 232467 314963 73.807717
1535647923.413121 60.000024 worker-1-10 254241 345382 73.611537
1535647923.205954 60.001556 worker-1-2 259932 354980 73.224407

[root at bro-test ~]# less /opt/bro/logs/current/stats.log | bro-cut ts peer mem pkts_proc bytes_recv pkts_dropped
1535644801.328981 worker-1-8 2854 3523252 2214563854 8841163
1535644801.235592 worker-1-9 2833 3422300 2135680645 9083143
1535644801.299138 worker-1-1 2801 3358673 2089659287 9059868
1535644802.177016 worker-1-4 2727 3262089 2027645336 9155838
1535644801.187590 worker-1-5 2640 3336190 2085853940 9332917
1535644802.750617 worker-1-11 2726 3432674 2153405372 9018943
1535644802.853617 worker-1-3 2816 3448836 2161753414 8929662
1535644803.186853 worker-1-2 2659 3387742 2116043509 9176871
1535644803.395256 worker-1-10 2871 3407486 2132043052 9049047
1535644803.403778 worker-1-7 2821 3278503 2023604941 9966347
1535644850.898433 manager 2340 0 0 -
1535644804.257320 proxy-1 73 0 0 -

[root at bro-test logs]# broctl netstats
worker-1-1: 1535651356.794609 recvd=3501813131 dropped=3589205826 link=3501813131
worker-1-2: 1535651358.808626 recvd=4033892471 dropped=3057179730 link=4033892471
worker-1-3: 1535651358.587316 recvd=3930325145 dropped=3160768660 link=3930325145
worker-1-4: 1535651357.702299 recvd=3561053809 dropped=3530086444 link=3561053809
worker-1-5: 1535651357.650359 recvd=3399338460 dropped=3691836209 link=3399338460
worker-1-6: 1535651334.912244 recvd=3714154738 dropped=3376978237 link=3714154738
worker-1-7: 1535651359.119492 recvd=3684804437 dropped=3406432666 link=3684804437
worker-1-8: 1535651359.668621 recvd=4020016563 dropped=3071265083 link=4020016563
worker-1-9: 1535651359.867601 recvd=3807658264 dropped=3283669188 link=3807658264
worker-1-10: 1535651359.749253 recvd=3703077938 dropped=3388277853 link=3703077938
worker-1-11: 1535651359.907420 recvd=4052516305 dropped=3038874387 link=4052516305

nload output for capture NIC: [image001.png, see scrubbed attachment below]

Jawad Rajput
System Administrator
U.S. Department of Energy
IM-62 /Germantown Building
HQ Network Security Team
Email: Jawad.Rajput at hq.doe.gov
Office: 301-903-2176
Office: 301-903-3895

-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 20047 bytes
Desc: image001.png
Url : http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20180830/0fd1653e/attachment-0001.bin

From jazoff at illinois.edu Thu Aug 30 13:28:37 2018
From: jazoff at illinois.edu (Azoff, Justin S)
Date: Thu, 30 Aug 2018 20:28:37 +0000
Subject: [Bro-Dev] Bro 2.5 Packet Drop Issue
In-Reply-To:
References:
Message-ID: <05002F0A-8D92-4C30-9C4C-EF2F35C3D739@illinois.edu>

> On Aug 30, 2018, at 4:11 PM, Rajput, Jawad (CONTR) wrote:
>
> Hello Everyone,
>
> I am reaching out with the hope that someone will be able to help us with an issue we are having with the Bro upgrade from 2.4.1 to 2.5.X.
>
> We have a system with 12 cores (3 GHz), 128 GB RAM, and a 10G NIC (Intel X520-SR2 10GbE Dual-port), monitoring between 1.5 and 2.5 Gbps of traffic.
>
> Bro 2.4.1 is working great and periodically drops 2-5% when traffic peaks at ~2.5 Gbps. However, when we upgrade to Bro 2.5.3/4 on the same exact system the drops go up to 90%.
>
> We are using CentOS-7 and tried installing Bro and PF_RING from both rpm and source without any luck. I wonder if anyone has seen this issue and can give some clues to resolve it.
>
> Bro Node Conf:
> [manager]
> type=manager
> host=localhost
> #
> [proxy-1]
> type=proxy
> host=localhost
>
> #
> [worker-1]
> type=worker
> host=localhost
> interface=ens1f1
> lb_method=pf_ring
> lb_procs=11
> pin_cpus=1,2,3,4,5,6,7,8,9,10,11

You're missing a logger process; adding one will make the cluster run better:

[logger]
type=logger
host=localhost

> [root at bro-test ~]# cat /proc/net/pf_ring/info
> PF_RING Version : 7.3.0 (unknown)
> Total rings : 11

you should have 1, not 11...

> Standard (non ZC) Options
> Ring slots : 65534
> Slot version : 17
> Capture TX : No [RX only]
> IP Defragment : No
> Socket Mode : Standard
> Cluster Fragment Queue : 0
> Cluster Fragment Discard : 0

Looks like you are having the issue where bro is not actually using pf_ring load balancing if you installed it from rpms. What you're effectively doing is running 11 workers that are all receiving 100% of the traffic, so you are doing 11 times the work.

You can further confirm that this is the problem you are having by running

broctl config | grep -i clusterid

and seeing if the id is set to 0:

pfringclusterid = 0

if so, edit /opt/bro/etc/broctl.cfg and add

PFRINGClusterID = 11

and broctl deploy to restart everything.

This is already fixed and won't happen again in bro >= 2.6... it just keeps tripping people up on 2.5.x

You should also look into switching to the native bro pf_ring plugin or the bro af_packet plugin, which are both better choices than using the pcap wrapper method.

--
Justin Azoff

From Jawad.Rajput at hq.doe.gov Thu Aug 30 14:13:32 2018
From: Jawad.Rajput at hq.doe.gov (Rajput, Jawad (CONTR))
Date: Thu, 30 Aug 2018 21:13:32 +0000
Subject: [Bro-Dev] Bro 2.5 Packet Drop Issue
In-Reply-To: <05002F0A-8D92-4C30-9C4C-EF2F35C3D739@illinois.edu>
References: <05002F0A-8D92-4C30-9C4C-EF2F35C3D739@illinois.edu>
Message-ID:

Thank you so much Justin, the solution worked. We were literally troubleshooting for more than a month and did not find anything online.
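
For anyone hitting the same symptom, one rough way to cross-check the diagnosis above against the broctl netstats counters from the original report: if every worker really sees the whole stream, then received plus dropped should come out nearly the same for each worker. The Python sketch below does that arithmetic for three of the reported lines. It assumes that recvd does not already include dropped packets, which may not hold for every capture method; parse_netstats is a name made up for this sketch.

```python
import re

# Three lines copied from the `broctl netstats` output in the original report.
NETSTATS = """\
worker-1-1: 1535651356.794609 recvd=3501813131 dropped=3589205826 link=3501813131
worker-1-2: 1535651358.808626 recvd=4033892471 dropped=3057179730 link=4033892471
worker-1-3: 1535651358.587316 recvd=3930325145 dropped=3160768660 link=3930325145
"""

LINE = re.compile(r"^(\S+): \S+ recvd=(\d+) dropped=(\d+) link=(\d+)$")


def parse_netstats(text):
    """Return {worker: (recvd, dropped)} for each well-formed line."""
    counters = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            counters[match.group(1)] = (int(match.group(2)), int(match.group(3)))
    return counters


counters = parse_netstats(NETSTATS)
# Assumption: recvd excludes dropped packets, so their sum is what was offered.
offered = {w: recvd + dropped for w, (recvd, dropped) in counters.items()}
spread = (max(offered.values()) - min(offered.values())) / max(offered.values())

for worker, (recvd, dropped) in counters.items():
    share = dropped / offered[worker]
    print(f"{worker}: offered={offered[worker]} dropped={share:.1%}")
print(f"relative spread of offered packets across workers: {spread:.2e}")
```

With properly balanced workers, each would be offered only a fraction of the traffic and the totals would be expected to differ; near-identical totals are at least consistent with each worker receiving the full stream.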
Jawad Rajput
System Administrator
U.S. Department of Energy
IM-62 /Germantown Building
HQ Network Security Team
Email: Jawad.Rajput at hq.doe.gov
Office: 301-903-2176
Office: 301-903-3895
Cell: 301-795-5406

From johanna at icir.org Thu Aug 30 15:05:58 2018
From: johanna at icir.org (Johanna Amann)
Date: Thu, 30 Aug 2018 15:05:58 -0700
Subject: [Bro-Dev] [Bro-Commits] [git/bro] master: Allow loading policy/protocols/smb once again (57a505b0e)
In-Reply-To: <201808302108.w7UL8ONm018535@bro-ids.icir.org>
References: <201808302108.w7UL8ONm018535@bro-ids.icir.org>
Message-ID: <1B9234BE-06D3-4E97-98B2-ACB9383A5342@icir.org>

To pick up the idea that you mentioned before - do we also want to make the new policy/protocols/smb/__load__.bro trigger a reporter warning that it is deprecated?
Johanna On 30 Aug 2018, at 14:07, Jonathan Siwek wrote: > Repository : ssh://git at bro-ids.icir.org/bro > On branch : master > Link : > https://github.com/bro/bro/commit/57a505b0e46d499644a6fb3b063cece0684240b8 > >> --------------------------------------------------------------- > > commit 57a505b0e46d499644a6fb3b063cece0684240b8 > Author: Jon Siwek > Date: Thu Aug 30 16:05:36 2018 -0500 > > Allow loading policy/protocols/smb once again > > It just redirects to base/protocols/smb > > >> --------------------------------------------------------------- > > 57a505b0e46d499644a6fb3b063cece0684240b8 > CHANGES | 4 ++++ > NEWS | 8 ++++++-- > VERSION | 2 +- > scripts/policy/protocols/smb/__load__.bro | 1 + > scripts/test-all-policy.bro | 1 + > 5 files changed, 13 insertions(+), 3 deletions(-) > > diff --git a/CHANGES b/CHANGES > index af31bdea0..15184aa4a 100644 > --- a/CHANGES > +++ b/CHANGES > @@ -1,4 +1,8 @@ > > +2.5-947 | 2018-08-30 16:05:36 -0500 > + > + * Allow loading policy/protocols/smb once again (Jon Siwek, > Corelight) > + > 2.5-946 | 2018-08-30 09:51:16 -0500 > > * Update NEWS with more info about runtime options (Daniel Thayer) > diff --git a/NEWS b/NEWS > index 0af51ef60..86839427b 100644 > --- a/NEWS > +++ b/NEWS > @@ -267,8 +267,12 @@ New Functionality > > - Added new NFS events: nfs_proc_symlink, nfs_proc_link, > nfs_proc_sattr. > > -- The SMB scripts in policy/protocols/smb are now moved into > base/protocols/smb > - and loaded/enabled by default. > +- The SMB scripts in policy/protocols/smb are now moved into > + base/protocols/smb and loaded/enabled by default. If you > previously > + loaded these scripts from their policy/ location (in local.bro or > + other custom scripts) you may now remove/change those although they > + should still work since policy/protocols/smb is simply a > placeholder > + script that redirects to the new base/ location. 
> > - Added new SMB events: smb1_transaction_secondary_request, > smb1_transaction2_secondary_request, smb1_transaction_response. > diff --git a/VERSION b/VERSION > index d522ba4d6..ecd34e707 100644 > --- a/VERSION > +++ b/VERSION > @@ -1 +1 @@ > -2.5-946 > +2.5-947 > diff --git a/scripts/policy/protocols/smb/__load__.bro > b/scripts/policy/protocols/smb/__load__.bro > new file mode 100644 > index 000000000..8fd733d38 > --- /dev/null > +++ b/scripts/policy/protocols/smb/__load__.bro > @@ -0,0 +1 @@ > + at load base/protocols/smb > diff --git a/scripts/test-all-policy.bro b/scripts/test-all-policy.bro > index 11824c2c6..d31da6573 100644 > --- a/scripts/test-all-policy.bro > +++ b/scripts/test-all-policy.bro > @@ -82,6 +82,7 @@ > @load protocols/modbus/track-memmap.bro > @load protocols/mysql/software.bro > @load protocols/rdp/indicate_ssl.bro > + at load protocols/smb/__load__.bro > @load protocols/smb/log-cmds.bro > @load protocols/smtp/blocklists.bro > @load protocols/smtp/detect-suspicious-orig.bro > > > > _______________________________________________ > bro-commits mailing list > bro-commits at bro.org > http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-commits From jsiwek at corelight.com Fri Aug 31 07:35:08 2018 From: jsiwek at corelight.com (Jon Siwek) Date: Fri, 31 Aug 2018 09:35:08 -0500 Subject: [Bro-Dev] [Bro-Commits] [git/bro] master: Allow loading policy/protocols/smb once again (57a505b0e) In-Reply-To: <1B9234BE-06D3-4E97-98B2-ACB9383A5342@icir.org> References: <201808302108.w7UL8ONm018535@bro-ids.icir.org> <1B9234BE-06D3-4E97-98B2-ACB9383A5342@icir.org> Message-ID: On Thu, Aug 30, 2018 at 5:06 PM Johanna Amann wrote: > > To pick up the idea that you mentioned before - do we also want to make > the new policy/protocols/smb/__load__.bro trigger a reporter warning > that it is deprecated? 

Sounds right -- it's unlikely it will ever be used in the future and should be removed (I don't see other policy/protocols/*/__load__.bro scripts, so I think that's a general convention anyway that got broken in this case since the intent was to eventually put it in base/).

The other problem is that a simple reporter warning via a call in bro_init() doesn't give details on where the deprecated script was loaded from, so I've added a @deprecated directive:

https://bro-tracker.atlassian.net/browse/BIT-1980

- Jon