From jsiwek at corelight.com Fri Feb 1 08:16:23 2019 From: jsiwek at corelight.com (Jon Siwek) Date: Fri, 1 Feb 2019 10:16:23 -0600 Subject: [Zeek-Dev] support for event handlers using a subset of parameters In-Reply-To: <20190201002907.0699B74B264@rock.ICSI.Berkeley.EDU> References: <20190201002907.0699B74B264@rock.ICSI.Berkeley.EDU> Message-ID: On Thu, Jan 31, 2019 at 6:29 PM Vern Paxson wrote: > > * user doesn't care about parameter 'a', so they shouldn't have to list it > > * it makes it easier for to deprecate/change event parameters > > This seems like a pretty niche pair of benefits. Is there a compelling > use-case that's motivating this change? The compelling use-case I'd say is the ability to change/deprecate event parameters without suddenly breaking people's code since that has come up many times already. I briefly skimmed NEWS for just the last 2.6 release and count 5 times we broke an event signature where this patch would have helped. I think there's also some other higher-profile changes to event args we haven't moved forward with because we didn't want to break user code that this would help with. Old example from unresolved ticket: https://bro-tracker.atlassian.net/browse/BIT-1431 > One thing I initially wondered was whether this was going to tie our hands > in the future if we want to introduce C++-style overloading. However, it > looks like you've implemented this based on matching the names in the > declaration rather than the types, so that should be okay. Also this change only effects events and hooks, not functions. The semantics are different enough that maybe we would only want overloading for functions anyway. That is, functions have a single, fixed implementation/body that is defined by the *author*, so you may want to re-use the same name for something implemented in different ways. The *user* is the one that calls the function. Hooks and events have multiple implementations/bodies that are defined by the *user*. The *author* is generally the one the generates (calls) the event/hook. So if the event/hook name were overloaded, it's a bit confusing -- the user now has to decide between different signatures to handle, each containing different data sets and maybe neither contains the set they want (so now they handle two events of the same name instead of one). Really, an event is a unique name with some amount of data (parameters) associated with it and may always be generated with the full data set -- the user then chooses which data they are interested in by defining that explicitly in the handler's parameter list. - Jon From robin at corelight.com Fri Feb 1 08:24:22 2019 From: robin at corelight.com (Robin Sommer) Date: Fri, 1 Feb 2019 08:24:22 -0800 Subject: [Zeek-Dev] support for event handlers using a subset of parameters In-Reply-To: <20190201002907.0699B74B264@rock.ICSI.Berkeley.EDU> References: <20190201002907.0699B74B264@rock.ICSI.Berkeley.EDU> Message-ID: <20190201162422.GE20868@corelight.com> On Thu, Jan 31, 2019 at 16:29 -0800, Vern Paxson wrote: > > global my_event: event(a: count, b: string); > > event my_event(b: string) > > { print "my_event", b; } > Is there a compelling use-case that's motivating this change? I'm sure the main use case is changing an existing event's parameters without breaking existing scripts -- someting we've been increasingly running into as a major challenge. It's a nice a idea to relax parameter passing to work by name, and allow subsets. However, I can't quite get myself to really like it in this form, because it *looks* like an error to not have matching argument lists. Is there some syntax that would make it more clear what's going on? Robin -- Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com From jsiwek at corelight.com Fri Feb 1 09:17:09 2019 From: jsiwek at corelight.com (Jon Siwek) Date: Fri, 1 Feb 2019 11:17:09 -0600 Subject: [Zeek-Dev] support for event handlers using a subset of parameters In-Reply-To: <20190201162422.GE20868@corelight.com> References: <20190201002907.0699B74B264@rock.ICSI.Berkeley.EDU> <20190201162422.GE20868@corelight.com> Message-ID: On Fri, Feb 1, 2019 at 10:24 AM Robin Sommer wrote: > On Thu, Jan 31, 2019 at 16:29 -0800, Vern Paxson wrote: > > > > global my_event: event(a: count, b: string); > > > event my_event(b: string) > > > { print "my_event", b; } > > it *looks* like an error to not have matching > argument lists. Is there some syntax that would make it more clear > what's going on? Not sure. If the syntax were different, that introduces a "one more thing to remember" issue, so I might prefer consistency with other function-like constructs. Any other language we know that has multi-body functions we can reference for ideas? Did it look like an error in the sense of the user making a mistake or in the sense of traditional way functions in other languages like C/C++ require matching signatures? In the former, I think the semantics/intentions are actually clearer than before: the user didn't list a parameter because they don't care about it, so why make them. I know what event they want because they use unique names and the parameters they listed do map in a valid way. On the traditional side of things, overloading seems it's maybe a legit reason for requiring matching signatures, but I also explained why I think overloading wouldn't make sense in the context of events. - Jon From seth at corelight.com Fri Feb 1 09:30:00 2019 From: seth at corelight.com (Seth Hall) Date: Fri, 01 Feb 2019 12:30:00 -0500 Subject: [Zeek-Dev] support for event handlers using a subset of parameters In-Reply-To: <20190201162422.GE20868@corelight.com> References: <20190201002907.0699B74B264@rock.ICSI.Berkeley.EDU> <20190201162422.GE20868@corelight.com> Message-ID: <0C2EC874-355E-4FA2-85B8-4B74D0E46ED1@corelight.com> On 1 Feb 2019, at 11:24, Robin Sommer wrote: > It's a nice a idea to relax parameter passing to work by name, and > allow subsets. However, I can't quite get myself to really like it in > this form, because it *looks* like an error to not have matching > argument lists. Is there some syntax that would make it more clear > what's going on? I think the change to using names does make things a bit more confusing for users, but it opens the door for us to greatly improve reliability of scripts in the long term and generally it feels like a nice way for analyzer authors to deprecate functionality without needing to create all new events. In my opinion even though there are hairy side effects to this I think it's a net positive. It would be great to get case sensitive versions of dns events and the http header event. That has been a very long standing deficit. I guess if there is some more obvious way to do it could make sense, but I haven't been able to come up with anything after thinking about this for quite a while. .Seth -- Seth Hall * Corelight, Inc * www.corelight.com From jsiwek at corelight.com Fri Feb 1 09:45:14 2019 From: jsiwek at corelight.com (Jon Siwek) Date: Fri, 1 Feb 2019 11:45:14 -0600 Subject: [Zeek-Dev] support for event handlers using a subset of parameters In-Reply-To: <0C2EC874-355E-4FA2-85B8-4B74D0E46ED1@corelight.com> References: <20190201002907.0699B74B264@rock.ICSI.Berkeley.EDU> <20190201162422.GE20868@corelight.com> <0C2EC874-355E-4FA2-85B8-4B74D0E46ED1@corelight.com> Message-ID: On Fri, Feb 1, 2019 at 11:36 AM Seth Hall wrote: > On 1 Feb 2019, at 11:24, Robin Sommer wrote: > I think the change to using names does make things a bit more confusing > for users Note that the patch makes matching based on parameter name optional. One can still pick their own parameter names as long as they strictly follow the declaration's parameter order. i.e. there's no impact to people that aren't paying attention to this change, they see no difference in behavior. If a user finds this change confusing it's because they read the docs/example/release-notes and didn't understand them, so it's on us to improve them. - Jon From vern at corelight.com Fri Feb 1 10:59:01 2019 From: vern at corelight.com (Vern Paxson) Date: Fri, 01 Feb 2019 10:59:01 -0800 Subject: [Zeek-Dev] support for event handlers using a subset of parameters In-Reply-To: (Fri, 01 Feb 2019 10:16:23 CST). Message-ID: <20190201185901.8BFB0741279@rock.ICSI.Berkeley.EDU> > The compelling use-case I'd say is the ability to change/deprecate > event parameters without suddenly breaking people's code since that > has come up many times already. I see how it allows adding new parameters. I don't see how it helps with deprecating existing parameters (which seems would be better served with some sort of &deprecated attribute), and I don't see how it helps with changing the semantics of existing event parameters. > Also this change only effects events and hooks, not functions. The > semantics are different enough that maybe we would only want > overloading for functions anyway. It actually makes sense to me to support overloading for events. Then for example you could have two event signatures depending on what information the handler was going to leverage, which would allow the event engine to offload work if there isn't a handler for a signature that requires extra computation. > Hooks and events have multiple implementations/bodies that are defined > by the *user*. The *author* is generally the one the generates > (calls) the event/hook. The big exception being the event engine (if I follow what you mean by user/author). > So if the event/hook name were overloaded, it's a bit confusing -- the > user now has to decide between different signatures to handle, each > containing different data sets and maybe neither contains the set they > want (so now they handle two events of the same name instead of one). Not really seeing this. I'm picturing that a common idiom will be a lightweight version of an event and a heavyweight version, or maybe a spectrum from light-to-heavy. > Really, an event is a unique name with some amount of data > (parameters) associated with it and may always be generated with the > full data set -- the user then chooses which data they are interested > in by defining that explicitly in the handler's parameter list. Yeah, I agree that this too would allow (most of) the sort of offloading I sketch above. Vern From jsiwek at corelight.com Fri Feb 1 11:47:12 2019 From: jsiwek at corelight.com (Jon Siwek) Date: Fri, 1 Feb 2019 13:47:12 -0600 Subject: [Zeek-Dev] support for event handlers using a subset of parameters In-Reply-To: <20190201185901.8BFB0741279@rock.ICSI.Berkeley.EDU> References: <20190201185901.8BFB0741279@rock.ICSI.Berkeley.EDU> Message-ID: On Fri, Feb 1, 2019 at 12:59 PM Vern Paxson wrote: > I don't see how it helps with > deprecating existing parameters (which seems would be better served with > some sort of &deprecated attribute), Support for &deprecated in parameters is part of the changes. But if we don't allow the user to immediately remove the field, they are then stuck doing 2 changes: Step 1: we mark a field &deprecated Step 2: the user sees that, so they remove uses of that parameter from their body Step 3: we actually remove the &deprecated field Step 4: if the user was forced to still have the &deprecated param in their handler's param list they now have to do a second change to remove it instead of just removing it right away With the proposed patch, we get rid of the need for Step 4 and decrease burden on users. > and I don't see how it helps with > changing the semantics of existing event parameters. Step 1: we mark old field &deprecated and introduce a new parameter Step 2: the users sees that.. etc, etc. same as above. > It actually makes sense to me to support overloading for events. Then for > example you could have two event signatures depending on what information > the handler was going to leverage, which would allow the event engine to > offload work if there isn't a handler for a signature that requires extra > computation. I think the same kind of offloading is possible with the "parameter subset" approach. We know exactly what parameters are being consumed, so we might have optimizations that don't produce a parameter if no one consumes it. And if no one consumes any parameters we also don't generate the call. If you have two different event signatures, we just get the same type of optimization we currently do, which only optimize out the entire call if there's no handlers, but doesn't know if individual parameters are being consumed or not. e.g: http_request(a, b, c) http_request(d, e, f) If someone only consumes 'a' and 'e', you still have to produce both function calls in their entirety (and also the other unused params), but: http_request(a, b, c, d, e, f) You can potentially not do any work generating the unused parameters and only have to do the one function call with 'a' and 'e'. Technically, we can still require a matching signature and do such an optimization by walking the AST and finding local parameter usages. I guess you have to do that ultimately, but it's an easy head start at implementing such optimizations as a test/idea if we can simply see someone isn't using a parameter because it's not in their handler param list. - Jon From robin at corelight.com Tue Feb 5 17:40:10 2019 From: robin at corelight.com (Robin Sommer) Date: Tue, 5 Feb 2019 17:40:10 -0800 Subject: [Zeek-Dev] support for event handlers using a subset of parameters In-Reply-To: References: <20190201002907.0699B74B264@rock.ICSI.Berkeley.EDU> <20190201162422.GE20868@corelight.com> Message-ID: <20190206014010.GH92446@corelight.com> On Fri, Feb 01, 2019 at 11:17 -0600, Jonathan Siwek wrote: > > > > global my_event: event(a: count, b: string); > > > > event my_event(b: string) > > > > { print "my_event", b; } > Did it look like an error in the sense of the user making a mistake or > in the sense of traditional way functions in other languages like > C/C++ require matching signatures? It's probably the latter, but I'm not sure that helps: when I see the above code, I wonder: "Heck, why is that working ... Or, wait, maybe it isn't?". It makes it confusing to me to read the code. If at least the prototype told me somehow "it's ok if parameters are left off", then I'd have a clue that everything is fine. The following would be even worst in terms of confusion: global my_event: event(a: string, b: string); event my_event(b: string) Now I need to know if the language goes by order of parameters or by parameter name. I do see the appeal of making things just work when event handlers change, but is there really no different way to support that? (I don't have an opinion on the event overloading discussion yet, need to digest that more) Robin -- Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com From ghada.elbez at kit.edu Wed Feb 6 01:23:56 2019 From: ghada.elbez at kit.edu (Elbez, Ghada (IAI)) Date: Wed, 6 Feb 2019 09:23:56 +0000 Subject: [Zeek-Dev] Hi + LL Analyzer Message-ID: <31f3a79767fa47febc4a32c945593fb9@kit-msx-23.kit.edu> Hi everyone, I am researcher in KIT where I?m also doing my PhD. I had the chance to meet some of you in the last (well first) Bro Europe workshop. My research work is focused on the cyber-physical security of communication networks of substations based on IEC 61850. Thus, I was wondering if I could use Zeek for network monitoring of some layer 2 protocols (GOOSE and SV). I already quickly discussed the topic with Jan. However, I still would like to know more details what has been already done and what the current state is. To start also gathering some ideas about the topic. Best, Ghada -------------------------------------------------------- Karlsruhe Institute of Technology (KIT) Institute for Automation and Applied Informatics (IAI) M.Sc. Ghada Elbez PhD Student Hermann-von-Helmholtz-Platz 1 Building 445, Room 218 76344 Eggenstein-Leopoldshafen, Germany Phone: +49 721 608-2 8429 E-mail: ghada.elbez?kit.edu -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.icsi.berkeley.edu/pipermail/zeek-dev/attachments/20190206/82a51d17/attachment.html From jsiwek at corelight.com Wed Feb 6 07:48:27 2019 From: jsiwek at corelight.com (Jon Siwek) Date: Wed, 6 Feb 2019 09:48:27 -0600 Subject: [Zeek-Dev] support for event handlers using a subset of parameters In-Reply-To: <20190206014010.GH92446@corelight.com> References: <20190201002907.0699B74B264@rock.ICSI.Berkeley.EDU> <20190201162422.GE20868@corelight.com> <20190206014010.GH92446@corelight.com> Message-ID: On Tue, Feb 5, 2019 at 7:40 PM Robin Sommer wrote: > The following would be even worst in terms of confusion: > > global my_event: event(a: string, b: string); > event my_event(b: string) > > Now I need to know if the language goes by order of parameters or by > parameter name. The issue may be exaggerated because we're using contrived parameter names. If we make it a bit more real (still "shortened" for sake of example): http_request: event(method: string, version: string); event http_request(version: string) { ... } The "parameter order vs. name" issue doesn't even cross my mind here because the intent is clear -- I understand the meaning of the handler's parameter because the original author of the event chose a meaningful and useful name for it (which will be the common case). That's all the reader cares about -- understanding the meaning of any given parameter. > I do see the appeal of making things just work when event handlers > change, but is there really no different way to support that? If the goal is to avoid breaking user-code, I don't think we're in a position to do anything else. For example, we have: global my_event: event(a: string); And we want to add a parameter: global my_event: event(a: string, b: string); The user has already written their handler as: event my_event(a: string) { ... } We can't magically change that user's code. We have to, somehow, let that work as it is, without user intervention. - Jon From seth at corelight.com Wed Feb 6 08:51:40 2019 From: seth at corelight.com (Seth Hall) Date: Wed, 06 Feb 2019 11:51:40 -0500 Subject: [Zeek-Dev] support for event handlers using a subset of parameters In-Reply-To: References: <20190201002907.0699B74B264@rock.ICSI.Berkeley.EDU> <20190201162422.GE20868@corelight.com> <20190206014010.GH92446@corelight.com> Message-ID: <60566DF7-B5D1-485B-9260-A36A37A88911@corelight.com> On 6 Feb 2019, at 10:48, Jon Siwek wrote: > We can't magically change that user's code. We have to, somehow, let > that work as it is, without user intervention. Yeah, I think that's right. I was trying to come up with some proposed model for indicating that you're specifying named parameters but that would still force existing code to be updated so that it could keep using the existing model. One thing I've been thinking about a little bit is how much we're concerning ourselves with maintaining perfect backward compatibility and if there is some benefit to breaking a bit of backward compatibility for something truly nicer? Like, should we have some specialized syntax for specifying named parameter use? Should we have something like anonymous records where you specify a variant of record syntax for named parameters? (ruby comes to mind for this, but I've seen people do similar things in c too) I'm just wondering if we should do a one-time source compatibility break to could get some tangible benefit in usability or understanding. I think Robin's concern is about backing us into a corner with a syntax that makes code confusing. Is that correct Robin? .Seth -- Seth Hall * Corelight, Inc * www.corelight.com From vlad at es.net Wed Feb 6 11:29:37 2019 From: vlad at es.net (Vlad Grigorescu) Date: Wed, 6 Feb 2019 19:29:37 +0000 Subject: [Zeek-Dev] support for event handlers using a subset of parameters In-Reply-To: <60566DF7-B5D1-485B-9260-A36A37A88911@corelight.com> References: <20190201002907.0699B74B264@rock.ICSI.Berkeley.EDU> <20190201162422.GE20868@corelight.com> <20190206014010.GH92446@corelight.com> <60566DF7-B5D1-485B-9260-A36A37A88911@corelight.com> Message-ID: I think compatibility is a growing issue with scripts being released as plugins. I'm already seeing some code shift to: > @if (Version ...) > new event > @else > old event I _think_ I like Seth's idea of records, but I'm still thinking it through. It would formalize a growing trend towards moving more parameters into records anyway. If we're worried about backwards compatibility, then maybe we have a built in version number in each record. Whenever fields are added/removed, or there are more subtle contextual changes, the version number could increase. As a concrete example, the is_server parameter of the ssh_capabilities event was being set backwards (effectively as "is_client.") I introduced a change[1] where the is_server parameter was corrected, but now how to interpret the value depended on the Bro version. This is a very subtle case, where no field was added or deleted, but users were still expected to change their code. --Vlad [1] - On Wed, Feb 6, 2019 at 5:10 PM Seth Hall wrote: > > > On 6 Feb 2019, at 10:48, Jon Siwek wrote: > > > We can't magically change that user's code. We have to, somehow, let > > that work as it is, without user intervention. > > Yeah, I think that's right. I was trying to come up with some proposed > model for indicating that you're specifying named parameters but that > would still force existing code to be updated so that it could keep > using the existing model. > > One thing I've been thinking about a little bit is how much we're > concerning ourselves with maintaining perfect backward compatibility and > if there is some benefit to breaking a bit of backward compatibility for > something truly nicer? Like, should we have some specialized syntax for > specifying named parameter use? Should we have something like anonymous > records where you specify a variant of record syntax for named > parameters? (ruby comes to mind for this, but I've seen people do > similar things in c too) > > I'm just wondering if we should do a one-time source compatibility break > to could get some tangible benefit in usability or understanding. I > think Robin's concern is about backing us into a corner with a syntax > that makes code confusing. Is that correct Robin? > > .Seth > > -- > Seth Hall * Corelight, Inc * www.corelight.com > _______________________________________________ > zeek-dev mailing list > zeek-dev at zeek.org > http://mailman.icsi.berkeley.edu/mailman/listinfo/zeek-dev > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.icsi.berkeley.edu/pipermail/zeek-dev/attachments/20190206/c3897d9e/attachment-0001.html From jsiwek at corelight.com Wed Feb 6 11:52:26 2019 From: jsiwek at corelight.com (Jon Siwek) Date: Wed, 6 Feb 2019 13:52:26 -0600 Subject: [Zeek-Dev] support for event handlers using a subset of parameters In-Reply-To: <60566DF7-B5D1-485B-9260-A36A37A88911@corelight.com> References: <20190201002907.0699B74B264@rock.ICSI.Berkeley.EDU> <20190201162422.GE20868@corelight.com> <20190206014010.GH92446@corelight.com> <60566DF7-B5D1-485B-9260-A36A37A88911@corelight.com> Message-ID: On Wed, Feb 6, 2019 at 10:51 AM Seth Hall wrote: > One thing I've been thinking about a little bit is how much we're > concerning ourselves with maintaining perfect backward compatibility and > if there is some benefit to breaking a bit of backward compatibility for > something truly nicer? Sure, I wouldn't argue against taking that approach if there's a good suggestion. > Like, should we have some specialized syntax for > specifying named parameter use? Should we have something like anonymous > records where you specify a variant of record syntax for named > parameters? I think a user would prefer to always do whatever is most robust and won't break in the face of upstream changes. If I'm going to use the alternative syntax everywhere because it's better, why have an alternative in the first place. - Jon From jsiwek at corelight.com Wed Feb 6 12:00:38 2019 From: jsiwek at corelight.com (Jon Siwek) Date: Wed, 6 Feb 2019 14:00:38 -0600 Subject: [Zeek-Dev] support for event handlers using a subset of parameters In-Reply-To: References: <20190201002907.0699B74B264@rock.ICSI.Berkeley.EDU> <20190201162422.GE20868@corelight.com> <20190206014010.GH92446@corelight.com> <60566DF7-B5D1-485B-9260-A36A37A88911@corelight.com> Message-ID: On Wed, Feb 6, 2019 at 1:30 PM Vlad Grigorescu wrote: > I _think_ I like Seth's idea of records, but I'm still thinking it through. It would formalize a growing trend towards moving more parameters into records anyway. If we're worried about backwards compatibility, then maybe we have a built in version number in each record. Whenever fields are added/removed, or there are more subtle contextual changes, the version number could increase. Explicit versioning is a neat idea to maybe try expanding on. I'm not quite sure how it would look for the user to, when they write their code, make it explicit that they expect the semantics of the record to match version XYZ. But yeah, on our end, we could ensure that if we see someone wanted XYZ, we send them that version of the record, which matches their semantic expectations. > As a concrete example, the is_server parameter of the ssh_capabilities event was being set backwards (effectively as "is_client.") In this particular example, the way I would have wanted to solve it with my proposed patch would be: deprecate "is_server", document it as actually meaning "is client" (don't change semantics of that parameter to avoid breaking user code that is already doing the inversion), but do introduce a new "is_client" which actually means "is client", then later remove "is_server". - Jon From jan.grashoefer at gmail.com Thu Feb 7 02:26:52 2019 From: jan.grashoefer at gmail.com (=?UTF-8?Q?Jan_Grash=c3=b6fer?=) Date: Thu, 7 Feb 2019 11:26:52 +0100 Subject: [Zeek-Dev] support for event handlers using a subset of parameters In-Reply-To: References: <20190201002907.0699B74B264@rock.ICSI.Berkeley.EDU> <20190201162422.GE20868@corelight.com> <20190206014010.GH92446@corelight.com> <60566DF7-B5D1-485B-9260-A36A37A88911@corelight.com> Message-ID: <9f2b44c5-5071-83c7-8027-477cecc868a6@gmail.com> On 06/02/2019 21:00, Jon Siwek wrote: > On Wed, Feb 6, 2019 at 1:30 PM Vlad Grigorescu wrote: > >> I _think_ I like Seth's idea of records, but I'm still thinking it through. It would formalize a growing trend towards moving more parameters into records anyway. If we're worried about backwards compatibility, then maybe we have a built in version number in each record. Whenever fields are added/removed, or there are more subtle contextual changes, the version number could increase. > > Explicit versioning is a neat idea to maybe try expanding on. I'm not > quite sure how it would look for the user to, when they write their > code, make it explicit that they expect the semantics of the record to > match version XYZ. If I get that right, this would mean that every record type comes with an individual version number, independent from the package or zeek version. While I can see that this allows to achieve backwards compatibility I am quite sure it will end in a complete mess. I expect the resulting code to be harder to write and to read, assuming that most of the records come with versioning constraints. Actually, I like Jon's solution most. From a coder's perspective, the only thing I have to keep in mind is that parameters are matched by name. If I want to rename I can fallback to matching by order. That said, I would guess renaming a parameter is not a critical use case. I cannot imagine a situation where I rely on that feature. For me, renaming parameters makes code harder to read if I am familiar with the original definition. One could even think about introducing syntactically explicit renaming of parameters (which would probably break backwards compatibility again). On 06/02/2019 02:40, Robin Sommer wrote: > The following would be even worst in terms of confusion: > > global my_event: event(a: string, b: string); > event my_event(b: string) > > Now I need to know if the language goes by order of parameters or by > parameter name. I can see that this might be confusing at first but I would argue that this is a one-time confusion. Versioned records in contrast would require me to think about their history every time I try to use a type. Jan From jan.grashoefer at gmail.com Thu Feb 7 02:32:08 2019 From: jan.grashoefer at gmail.com (=?UTF-8?Q?Jan_Grash=c3=b6fer?=) Date: Thu, 7 Feb 2019 11:32:08 +0100 Subject: [Zeek-Dev] Hi + LL Analyzer In-Reply-To: <31f3a79767fa47febc4a32c945593fb9@kit-msx-23.kit.edu> References: <31f3a79767fa47febc4a32c945593fb9@kit-msx-23.kit.edu> Message-ID: <32013523-6256-8153-0cc3-8be9d32312db@gmail.com> To add a bit more context: The idea is to implement a plugin interface for low-level analyzers (see https://github.com/zeek/zeek/issues/248) and collect requirements on the list. Some first thoughts and questions: - What would be the lowest layer to built up on or should everything be pluggable down to the packet source? - What about the concept of connections? For some LL protocols the concept might be counterintuitive. - The interface should support to pass payload to other analyzers. Does it make sense to come up with a generalized DPD-mechanism? Jan From vern at corelight.com Mon Feb 18 16:32:29 2019 From: vern at corelight.com (Vern Paxson) Date: Mon, 18 Feb 2019 16:32:29 -0800 Subject: [Zeek-Dev] support for event handlers using a subset of parameters In-Reply-To: <9f2b44c5-5071-83c7-8027-477cecc868a6@gmail.com> (Thu, 07 Feb 2019 11:26:52 +0100). Message-ID: <20190219003229.878505C8C@rock.ICSI.Berkeley.EDU> This thread got backburnered for me due to other $dayjob stuff, but returning to it now, one thing I wondered would be what if instead of just allowing missing parameters as a way to support changing event interfaces, we introduced an explicit notion of deprecation, as follows. Suppose "event foo(a: string)" is an event to which we want to add a second parameter, b: string. We could express this in event.bif as: event foo%(a: string%) &deprecated; event foo%(a: string, b: string%); This would (1) allow the developer who's changing the API to decide up front whether the change can be backward-compatible by leaving out parameters, (2) put the maintenance burden just on the developer, (3) allow users to (automatically) look for the uses of the deprecated API if they wish, (4) avoid certain forms of user errors, and (5) allow us to consider other API changes such as removing parameters. It's not as easy to implement ... but strikes me as cleaner than where this discussion has wound up going. Vern From jsiwek at corelight.com Tue Feb 19 06:28:26 2019 From: jsiwek at corelight.com (Jon Siwek) Date: Tue, 19 Feb 2019 08:28:26 -0600 Subject: [Zeek-Dev] support for event handlers using a subset of parameters In-Reply-To: <20190219003229.878505C8C@rock.ICSI.Berkeley.EDU> References: <9f2b44c5-5071-83c7-8027-477cecc868a6@gmail.com> <20190219003229.878505C8C@rock.ICSI.Berkeley.EDU> Message-ID: On Mon, Feb 18, 2019 at 6:46 PM Vern Paxson wrote: > Suppose "event foo(a: string)" is an event to which we want to add > a second parameter, b: string. We could express this in event.bif as: > > event foo%(a: string%) &deprecated; > event foo%(a: string, b: string%); Nice idea. I also think that would work in this "want to add a new param" case. > (4) avoid certain forms of user errors Which particular user errors were a concern? And you mean in contrast w/ my initial proposal? I was still thinking name-based matching most clearly expressed the users intent, so I doubt that results in increased errors. > and (5) allow us to consider other > API changes such as removing parameters. Yeah, removing should work following this scheme. What about changing semantics of a parameter though? Take earlier example Vlad gave, I think the process is: Step 1: event foo%(is_server: bool%) &deprecated; event foo%(is_server: bool, is_client: bool%); This is ok so far: no code will break and users can update to the new handler with extra parameter. It is a bit odd to have the "is_server" parameter available for use when the actual intent is to later get rid of it. Step 2: event foo%(is_server: bool, is_client: bool%) &deprecated; event foo%(is_client: bool%); Now this is really awkward: users have to update their code once again (in my proposal, they update their code just once). The other problem is that if a user is skipping a version, they may end up with a handler that treats "is_server" in the original way, but the meaning has been changed and we only match events based on type + number of parameters. With name-based parameter matching, we can catch this scenario as well. > It's not as easy to implement ... but strikes me as cleaner than where > this discussion has wound up going. I still prefer my original proposal from everything I've seen so far, I think it best fit the notion of what event handlers actually are (but maybe not how they've traditionally been thought about): we pick points in time as an "event" and associate various bits of data with them, then users declare their interest in some subset of that data. Most natural way for them to do that seems to be by naming each bit of data. Can you maybe break down what you mean by "clean" further so we can better compare/contrast solutions ? I don't recall seeing anything that was a showstopper for my original patch (but been a while). - Jon From vern at corelight.com Tue Feb 19 11:27:32 2019 From: vern at corelight.com (Vern Paxson) Date: Tue, 19 Feb 2019 11:27:32 -0800 Subject: [Zeek-Dev] support for event handlers using a subset of parameters In-Reply-To: (Tue, 19 Feb 2019 08:28:26 CST). Message-ID: <20190219192732.DC65975ADA8@rock.ICSI.Berkeley.EDU> > > (4) avoid certain forms of user errors > > Which particular user errors were a concern? The main one being when there's an API change that's *not* backward compatible, and the user doesn't update their scripts to be in conformance with it as is required. Clearly something we'll in general strive to avoid. > I was still thinking > name-based matching most clearly expressed the users intent, so I > doubt that results in increased errors. Agreed for those instances where it's compatible. > Yeah, removing should work following this scheme. What about changing > semantics of a parameter though? That's indeed trickier. Here's a thought for the example you provide: event foo%(is_server: bool%) &deprecated; event foo%(is_client: bool%); This would mean "if the handler is defined using the name is_server as the parameter, that's the deprecated version". Any other declaration (that's for a parameter of type bool, of course) would refer to the new version. By the way, I didn't follow whether in Vlad's example, the semantics of the parameter changed, or if his fix was just to give it the correct name to match its actual semantics. For the latter, existing handlers are presumably flat-out broken, and there's no benefit in trying to continue to support them. One concern I have with leaning on is-the-name-a-match is not-too-implausible user coding along the lines of: event foo(is_server_orig: bool) { ... local is_server = my_twisty_logic(is_server_orig, x, y, z); ... } i.e., the parameter being renamed by the user anyway because they want to use the original name for a local variable. These users will be bitten by changes in parameter semantics regardless of which approach we use; name-based matching isn't a panacea. > The other problem is that if a user is skipping a version, they may > end up with a handler that treats "is_server" in the original way, but > the meaning has been changed and we only match events based on type + > number of parameters. With name-based parameter matching, we can > catch this scenario as well. With the tweak I outline above, this would only bite them once the &deprecated version is removed. That could span several releases, depending on the particular situation. I don't have a lot of sympathy for users who upgrade across a number of releases, for which the release notes flag backward compatibility issues, and they don't take heed to address those. > Can you maybe break down what you mean by "clean" further so we can > better compare/contrast solutions ? I don't recall seeing anything > that was a showstopper for my original patch (but been a while). I find an approach that changes the fundamental type-checking used for event handlers to be more heavy-handed than one that provides control to the developer to explicitly decide when they've made a change for which it makes sense to support a deprecated version. If Zeek had originally used name-based parameters, then I'd be fine with this being the solution. However, switching from positional to name-based strikes me as a conceptually deep change for a relatively modest problem. Vern From robin at corelight.com Tue Feb 19 20:51:17 2019 From: robin at corelight.com (Robin Sommer) Date: Tue, 19 Feb 2019 20:51:17 -0800 Subject: [Zeek-Dev] support for event handlers using a subset of parameters In-Reply-To: <20190219003229.878505C8C@rock.ICSI.Berkeley.EDU> References: <9f2b44c5-5071-83c7-8027-477cecc868a6@gmail.com> <20190219003229.878505C8C@rock.ICSI.Berkeley.EDU> Message-ID: <20190220045117.GE87493@corelight.com> On Mon, Feb 18, 2019 at 16:32 -0800, Vern Paxson wrote: > event foo%(a: string%) &deprecated; > event foo%(a: string, b: string%); I like this. It addresses my main concern of making it clear what's happening. If I see an implementation of the event and look up the prototype, I'll find both and can understand what's going on. Robin -- Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com From robin at corelight.com Tue Feb 19 20:54:58 2019 From: robin at corelight.com (Robin Sommer) Date: Tue, 19 Feb 2019 20:54:58 -0800 Subject: [Zeek-Dev] support for event handlers using a subset of parameters In-Reply-To: References: <9f2b44c5-5071-83c7-8027-477cecc868a6@gmail.com> <20190219003229.878505C8C@rock.ICSI.Berkeley.EDU> Message-ID: <20190220045458.GF87493@corelight.com> On Tue, Feb 19, 2019 at 08:28 -0600, Jonathan Siwek wrote: > we pick points in time as an "event" and associate various bits of > data with them, then users declare their interest in some subset of > that data. Most natural way for them to do that seems to be by naming > each bit of data. I agree with all of that in principle, but for me it would need a different syntax to become practical. The current syntax for events implies certain semantics just by being similar to so many other languages that don't do named-based matching for parameters. Robin -- Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com From jsiwek at corelight.com Wed Feb 20 09:56:20 2019 From: jsiwek at corelight.com (Jon Siwek) Date: Wed, 20 Feb 2019 11:56:20 -0600 Subject: [Zeek-Dev] support for event handlers using a subset of parameters In-Reply-To: <20190219192732.DC65975ADA8@rock.ICSI.Berkeley.EDU> References: <20190219192732.DC65975ADA8@rock.ICSI.Berkeley.EDU> Message-ID: On Tue, Feb 19, 2019 at 1:27 PM Vern Paxson wrote: > > Which particular user errors were a concern? > > The main one being when there's an API change that's *not* backward > compatible, and the user doesn't update their scripts to be in conformance > with it as is required. Clearly something we'll in general strive to avoid. That's more of a developer error than a user error -- we can make changes that are not backward compatible regardless of what mechanism we go with, and if we do, that's on us (and we do want to always try to make changes that are backward compatible for at least one release cycle). > > What about changing > > semantics of a parameter though? > > That's indeed trickier. Here's a thought for the example you provide: > > event foo%(is_server: bool%) &deprecated; > event foo%(is_client: bool%); > > This would mean "if the handler is defined using the name is_server as the > parameter, that's the deprecated version". Any other declaration (that's > for a parameter of type bool, of course) would refer to the new version. Now we are back to matching parameters on name, so again feels more natural to me to instead just support that generally in the first place. Take another concrete example we've done in the past: change parameter order. event foo(a: bool, b: string); Changes to: event foo(b: string, a: bool); If we generally support matching params by name, then all existing event handlers continue working, no user action needed. Otherwise we need to &deprecate and force users to do work to update their code. > By the way, I didn't follow whether in Vlad's example, the semantics of > the parameter changed, or if his fix was just to give it the correct name > to match its actual semantics. For the latter, existing handlers are > presumably flat-out broken, and there's no benefit in trying to continue > to support them. In Vlad's example, the fix was to make "is_server" actually mean "is server", but people already wrote code based on the real meaning being "is client", so we broke code. (It's a useful example both to learn from and consider how we might do things differently if we had better deprecation procedures/mechanisms). > One concern I have with leaning on is-the-name-a-match is not-too-implausible > user coding along the lines of: > > event foo(is_server_orig: bool) > { > ... > local is_server = my_twisty_logic(is_server_orig, x, y, z); > ... > } > > i.e., the parameter being renamed by the user anyway because they want to > use the original name for a local variable. These users will be bitten > by changes in parameter semantics regardless of which approach we use; > name-based matching isn't a panacea. They wouldn't be bitten if we just don't ever change parameter semantics like that, but use a different deprecation process like I mentioned. E.g. we start with: event foo(is_server: bool); They use: event foo(is_server_orig: bool); We change to: event foo(is_server: bool &deprecated, is_client: bool); Now they get a deprecation warning because we still fall back to matching params by order+type if there's no valid name mappings. The case where they *could* get silently bitten by changes in parameter semantics is if we are actually swapping parameter order, but they happen to swap params of the same type and the user defined custom parameter names. The fix there is either we just don't ever do such ambiguous parameter order swapping or we say that the user is opting into that potential pain for the purely cosmetic reason of custom param names. (Or we rely purely on name-based param matching without falling back to traditional order+type matching. I added the fallback just because it seems generally worthwhile to still support that if people *really* want that, or if they are using custom names already). > > The other problem is that if a user is skipping a version, they may > > end up with a handler that treats "is_server" in the original way, but > > the meaning has been changed and we only match events based on type + > > number of parameters. With name-based parameter matching, we can > > catch this scenario as well. > > With the tweak I outline above, this would only bite them once the &deprecated > version is removed. That could span several releases, depending on the > particular situation. I don't have a lot of sympathy for users who upgrade > across a number of releases, for which the release notes flag backward > compatibility issues, and they don't take heed to address those. I don't sympathize much either, but still though -- if we've got a solution that additionally makes this a non-issue, then that's a Good. > I find an approach that changes the fundamental type-checking used for > event handlers to be more heavy-handed than one that provides control to > the developer to explicitly decide when they've made a change for which > it makes sense to support a deprecated version. There's two points here that I don't think were necessarily related, so taking them separately: * My proposed patch has a more fundamental change to how event handler parameters get type-checked. True, but not sure that is implicitly a Bad. It's maybe a Good if it fits the requirements better (and I still think it does :)) * The two proposals (yours vs. mine) have a difference in the amount of control they grant a developer. Need some clarification here. In my proposal, the dev has just as much control as they always had, plus they get to choose how to deprecate individual event parameters. So that's maybe more control than deprecating entire event signatures? Or what specifically was the difference in developer control that's a concern? Generally I was thinking the mission statement should first emphasize "don't annoy users", then "give developers enough options to make decisions that will annoy users the least". > If Zeek had originally used name-based parameters, then I'd be fine with > this being the solution. However, switching from positional to name-based > strikes me as a conceptually deep change for a relatively modest problem. Slight disagreement on scale of problem -- having solid deprecation procedures/mechanisms is something that's desirable for any software expecting a large user-base. Optimizing for least time-wastage and deprecation periods that give users flexibility in choosing when to address problems is a Good. - Jon From jsiwek at corelight.com Wed Feb 20 10:08:17 2019 From: jsiwek at corelight.com (Jon Siwek) Date: Wed, 20 Feb 2019 12:08:17 -0600 Subject: [Zeek-Dev] support for event handlers using a subset of parameters In-Reply-To: <20190220045458.GF87493@corelight.com> References: <9f2b44c5-5071-83c7-8027-477cecc868a6@gmail.com> <20190219003229.878505C8C@rock.ICSI.Berkeley.EDU> <20190220045458.GF87493@corelight.com> Message-ID: On Tue, Feb 19, 2019 at 10:55 PM Robin Sommer wrote: > > On Tue, Feb 19, 2019 at 08:28 -0600, Jonathan Siwek wrote: > > > we pick points in time as an "event" and associate various bits of > > data with them, then users declare their interest in some subset of > > that data. Most natural way for them to do that seems to be by naming > > each bit of data. > > I agree with all of that in principle, but for me it would need a > different syntax to become practical. The current syntax for events > implies certain semantics just by being similar to so many other > languages that don't do named-based matching for parameters. Granted, but not sure how problematic that is in reality. E.g. you see this: event foo(version: string, method: string); event foo(method: string) { ... } Seems intuitive that one understands the "method" parameter means the same thing in both the definition and handler. This is a fundamental reason why, in human's general use of language, we name things in the first place: things that share a common name share a common meaning. Or what else is there to be confused about with name-based param matching ? A side point is that maybe some other languages are just following the crowd and actually would benefit from name-based param matching in some cases. Maybe they just haven't thought about it (I have, that's why I'm still trying to hang on to this :)). Doesn't mean we shouldn't evaluate Pros/Cons of breaking apart from the crowd. I haven't seen anything so far that was a convincing Con to me, but always possible I've missed some way name-based param matching could conflict with an existing or future feature of the language -- those would be particularly convincing Cons for me. - Jon From robin at corelight.com Mon Feb 25 17:36:46 2019 From: robin at corelight.com (Robin Sommer) Date: Mon, 25 Feb 2019 17:36:46 -0800 Subject: [Zeek-Dev] Hi + LL Analyzer In-Reply-To: <32013523-6256-8153-0cc3-8be9d32312db@gmail.com> References: <31f3a79767fa47febc4a32c945593fb9@kit-msx-23.kit.edu> <32013523-6256-8153-0cc3-8be9d32312db@gmail.com> Message-ID: <20190226013646.GB33501@corelight.com> (I realized this slipped through the cracks, sorry for the late feedback, hope it still helps) On Thu, Feb 07, 2019 at 11:32 +0100, Jan Grash?fer wrote: > - What would be the lowest layer to built up on or should everything be > pluggable down to the packet source? I see three pieces here overall that I think can be tackled independently: (1) Link-layer: Currently hardcoded in Packet::ProcessLayer2() (2) IP-Layer: Currently hardcoded in NetSessions::NextPacket() (3) Transport-layer: Currently hardcoded in NetSessions::DoNextPacket(). Case (1) is all about skipping the header to get to IP. There's some redundancy across cases, though, and MPLS makes it all more messy. With (2), a plugin would be able to add support for non-IP protocols. However, due to Bro generally assuming that it is analyzing IP, the plugin would either need to take care of such packets completely (like ARP does), or eventually get to an IP packet that it can then feed back for further analysis (like if it some kind of a tunnel). Similar for (3): A plugin would be able to add support for further transport layer protocols, but it'd be mostly about stripping additional headers to eventually get to TCP/UDP/ICMP. There's also a more general version of (2) and (3) where we'd remove Bro's assumption of analyzing TCP/IP protocols. But that's a separate, large effort by itself. On a technical level, plugging in such low-level analyzers needs to be very efficient, in particular if we move the currently hardcoded cases into the plugins as well (as I think we should; similar to how application-layer analyzers have all moved into internal plugins). Then the lookup-the-analyzer-and-dispatch operation will happen multiple times for every packet. > - What about the concept of connections? For some LL protocols the > concept might be counterintuitive. Couple cases there: - If there's really no sense of a connection, then the plugin will need to take complete care of the packets, as the rest of Bro assumes connection-semantics. - If it's just the definition of what defines a connection that is different, then I think we could make that more flexible. I've been hoping for a while that we can make Bro's notion of connection IDs dynamic, so that it's not necessarily just the 5-tuple. There are use cases outside of new protocols for this, too. For example, one could include the VLAN ID to deal with overlapping IP ranges in independent VLANs. > - The interface should support to pass payload to other analyzers. Does > it make sense to come up with a generalized DPD-mechanism? Not quite sure what you're thinking here, but I believe that fully solving this would require addressing Bro's overall assumption of analyzing TCP/IP. For now, maybe the best way would be just having the analyzer call back into entry points corresponding to the various layers where analysis would then proceed as normal. I.e., some variation of: ProcessLinkLayer(...), ProcessIP(...), ProcessTransport(data), ProcessAppLayer(...). The caller would be responsible for providing all the right (meta-)data, like IP headers. Were you thinking something different / more general? Robin -- Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com From jan.grashoefer at gmail.com Wed Feb 27 07:07:36 2019 From: jan.grashoefer at gmail.com (=?UTF-8?Q?Jan_Grash=c3=b6fer?=) Date: Wed, 27 Feb 2019 16:07:36 +0100 Subject: [Zeek-Dev] Hi + LL Analyzer In-Reply-To: <20190226013646.GB33501@corelight.com> References: <31f3a79767fa47febc4a32c945593fb9@kit-msx-23.kit.edu> <32013523-6256-8153-0cc3-8be9d32312db@gmail.com> <20190226013646.GB33501@corelight.com> Message-ID: On 26/02/2019 02:36, Robin Sommer wrote: > I see three pieces here overall that I think can be tackled > independently: > > (1) Link-layer: Currently hardcoded in Packet::ProcessLayer2() > > (2) IP-Layer: Currently hardcoded in NetSessions::NextPacket() > > (3) Transport-layer: Currently hardcoded in NetSessions::DoNextPacket(). At first glance it looks like IP-layer multiplexing is done in NetSessions::{NextPacket, DoNextPacket} and the Transport-layer is tackled in Manager::BuildInitialAnalyzerTree in context of initializing a connection. > Case (1) is all about skipping the header to get to IP. There's some > redundancy across cases, though, and MPLS makes it all more messy. One thing that comes to my mind here is whether it might be possible to pass information such as VLAN tags, MPLS labels or link layer addresses to upper layers in a generic way without hardcoding. However, that might be out of scope for now. > With (2), a plugin would be able to add support for non-IP protocols. > However, due to Bro generally assuming that it is analyzing IP, the > plugin would either need to take care of such packets completely (like > ARP does), or eventually get to an IP packet that it can then feed > back for further analysis (like if it some kind of a tunnel). The non-IP packet might also contain a Transport-layer PDU. I guess it should be possible to pass these on as well. > There's also a more general version of (2) and (3) where we'd remove > Bro's assumption of analyzing TCP/IP protocols. But that's a separate, > large effort by itself. That is the central point. So a first step would be to rely on TCP/IP in the "middle" of the stack but allow pluggable Link-layer protocols. Those might feed their data to the TCP/IP pipeline or handle them on their own. The next step would be the IP-layer. > On a technical level, plugging in such low-level analyzers needs to be > very efficient, in particular if we move the currently hardcoded cases > into the plugins as well (as I think we should; similar to how > application-layer analyzers have all moved into internal plugins). > Then the lookup-the-analyzer-and-dispatch operation will happen > multiple times for every packet. One question here would be whether it makes sense to assume that the set of LL-analyzers tash should be available is known at compile-time? >> - What about the concept of connections? For some LL protocols the >> concept might be counterintuitive. > > Couple cases there: > > - If there's really no sense of a connection, then the plugin will > need to take complete care of the packets, as the rest of Bro > assumes connection-semantics. Maybe there is another general abstraction that is worth to be supported as well. I was thinking of request-reply-pairs that can be correlated. However, I haven't put much thought into this, yet. > - If it's just the definition of what defines a connection that is > different, then I think we could make that more flexible. I've been > hoping for a while that we can make Bro's notion of connection IDs > dynamic, so that it's not necessarily just the 5-tuple. There are > use cases outside of new protocols for this, too. For example, one > could include the VLAN ID to deal with overlapping IP ranges in > independent VLANs. I think this would be part of the larger effort to re-think Zeek's notion of connections. This could be addressed together with implementing a flexible mechanism to make meta data like LL-addresses available in context of a connection. >> - The interface should support to pass payload to other analyzers. Does >> it make sense to come up with a generalized DPD-mechanism? > > Not quite sure what you're thinking here, but I believe that fully > solving this would require addressing Bro's overall assumption of > analyzing TCP/IP. For now, maybe the best way would be just having the > analyzer call back into entry points corresponding to the various > layers where analysis would then proceed as normal. I.e., some > variation of: ProcessLinkLayer(...), ProcessIP(...), > ProcessTransport(data), ProcessAppLayer(...). The caller would be > responsible for providing all the right (meta-)data, like IP headers. > Were you thinking something different / more general? While I haven't looked into it, I noticed that there are distinct PIA implementations for TCP and UDP. In case we allow to plug in new transport protocols, they might need their own PIA to support the analysis of known protocols like HTTP etc. However, if we keep a focus on TCP/IP as suggested that would be out of scope for now. Jan From robin at corelight.com Wed Feb 27 11:40:28 2019 From: robin at corelight.com (Robin Sommer) Date: Wed, 27 Feb 2019 11:40:28 -0800 Subject: [Zeek-Dev] Hi + LL Analyzer In-Reply-To: References: <31f3a79767fa47febc4a32c945593fb9@kit-msx-23.kit.edu> <32013523-6256-8153-0cc3-8be9d32312db@gmail.com> <20190226013646.GB33501@corelight.com> Message-ID: <20190227194028.GD54882@corelight.com> On Wed, Feb 27, 2019 at 16:07 +0100, Jan Grash?fer wrote: > At first glance it looks like IP-layer multiplexing is done in > NetSessions::{NextPacket, DoNextPacket} and the Transport-layer is tackled > in Manager::BuildInitialAnalyzerTree in context of initializing a > connection. Well, there, too. :) That's indeed doing the packet dispatching, while DoNextPacket() sets up state mgmt. It's all not quite clear cut, which is part of the problem. > That is the central point. So a first step would be to rely on TCP/IP in the > "middle" of the stack but allow pluggable Link-layer protocols. Those might > feed their data to the TCP/IP pipeline or handle them on their own. The next > step would be the IP-layer. Yeah, that sounds good to me. > One question here would be whether it makes sense to assume that the set of > LL-analyzers tash should be available is known at compile-time? The built-in ones can be known, but any added through dynamic plugins can't really. We'll know only at runtime what the final set is. But we could precompute a lookup table in advance at startup that maps link types to analyzers. > I think this would be part of the larger effort to re-think Zeek's notion of > connections. This could be addressed together with implementing a flexible > mechanism to make meta data like LL-addresses available in context of a > connection. Yep. > In case we allow to plug in new transport protocols, they might need > their own PIA to support the analysis of known protocols like HTTP > etc. Yeah, or a more generic PIA that provides its own hook for plugins. The main difference between TCP/UDP PIAs is packet vs stream semantics, iirc. That might generalize sufficiently, but not sure. Robin -- Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com From jan.grashoefer at gmail.com Thu Feb 28 02:35:01 2019 From: jan.grashoefer at gmail.com (=?UTF-8?Q?Jan_Grash=c3=b6fer?=) Date: Thu, 28 Feb 2019 11:35:01 +0100 Subject: [Zeek-Dev] Hi + LL Analyzer In-Reply-To: <20190227194028.GD54882@corelight.com> References: <31f3a79767fa47febc4a32c945593fb9@kit-msx-23.kit.edu> <32013523-6256-8153-0cc3-8be9d32312db@gmail.com> <20190226013646.GB33501@corelight.com> <20190227194028.GD54882@corelight.com> Message-ID: <55064c9c-552d-8825-854a-1a4f11abd401@gmail.com> On 27/02/2019 20:40, Robin Sommer wrote: >> One question here would be whether it makes sense to assume that the set of >> LL-analyzers tash should be available is known at compile-time? > > The built-in ones can be known, but any added through dynamic plugins > can't really. We'll know only at runtime what the final set is. But we > could precompute a lookup table in advance at startup that maps link > types to analyzers. The question here would be whether LL-analyzers have to be linked dynamically. Another option would be to require users to build Zeek if they need additional LL-analyzers. The analyzers would still be modular but using some meta programming one might be able to generate efficient dispatching code at compile-time. If the focus is on performance we could benchmark both approaches and decide based on the results. Jan From robin at corelight.com Thu Feb 28 07:55:47 2019 From: robin at corelight.com (Robin Sommer) Date: Thu, 28 Feb 2019 07:55:47 -0800 Subject: [Zeek-Dev] Hi + LL Analyzer In-Reply-To: <55064c9c-552d-8825-854a-1a4f11abd401@gmail.com> References: <31f3a79767fa47febc4a32c945593fb9@kit-msx-23.kit.edu> <32013523-6256-8153-0cc3-8be9d32312db@gmail.com> <20190226013646.GB33501@corelight.com> <20190227194028.GD54882@corelight.com> <55064c9c-552d-8825-854a-1a4f11abd401@gmail.com> Message-ID: <20190228155547.GG83625@corelight.com> On Thu, Feb 28, 2019 at 11:35 +0100, Jan Grash?fer wrote: > The question here would be whether LL-analyzers have to be linked > dynamically. Well, the point of the plugin API is being able to add new functionality externally through an independently compiled shared library. Excluding link-layer analyzers from that would feel like a gap to me. That said, we definitely need to benchmark performance to make sure it's feasible. My hunch is that a lookup table should be good enough, but we'll see. Robin -- Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com