From robin at corelight.com  Thu Nov  1 07:43:36 2018
From: robin at corelight.com (Robin Sommer)
Date: Thu, 1 Nov 2018 07:43:36 -0700
Subject: [Bro-Dev] Config Framework Feedback
In-Reply-To: <69986ffb-6abc-33fe-70a0-a523eeba7958@corelight.com>
References: <db64e61f5de340b28beb7eb91018493a@battelle.org>
	<69986ffb-6abc-33fe-70a0-a523eeba7958@corelight.com>
Message-ID: <20181101144336.GB39013@corelight.com>

The observations / thoughts in this thread seem worth a ticket, I'd
say. We can refine this over time if the current semantics aren't
quite ideal yet.

Robin

On Tue, Oct 30, 2018 at 13:17 -0700, Christian Kreibich wrote:

> Hi folks,
> 
> I would agree that it takes a bit of experimentation to figure out 
> exactly when a change handler fires and how to reliably initialize or 
> update things based on an option's value.
> 
> Consider this:
> 
>    module Foo;
> 
>    export { option foo = F; }
> 
>    function foo_handler(ID: string, foo_new: bool): bool
>    {
>            print fmt("New foo: %s", foo_new);
> 
>            # Update stuff here based on foo's value
>            # ...
> 
>            return foo_new;
>    }
> 
>    event bro_init() {
>            Option::set_change_handler("Foo::foo", foo_handler);
>    }
> 
> ... foo_handler doesn't get called when you simply run the script 
> without redef'ing Config::config_files. When you do redef it, the handler 
> fires both when the config file sets foo to T, and when it sets it to F.
> 
> So you have to make sure that your initialization happens even when the 
> handler doesn't get called, and you cannot write your handler assuming 
> that the new value is actually different from the old one.
> 
> These arguably aren't bugs, but imo they do take getting used to.
> 
> Best,
> -C.
> _______________________________________________
> bro-dev mailing list
> bro-dev at bro.org
> http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev


-- 
Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com

From christian at corelight.com  Thu Nov  1 11:58:47 2018
From: christian at corelight.com (Christian Kreibich)
Date: Thu, 1 Nov 2018 11:58:47 -0700
Subject: [Bro-Dev] Config Framework Feedback
In-Reply-To: <20181101144336.GB39013@corelight.com>
References: <db64e61f5de340b28beb7eb91018493a@battelle.org>
	<69986ffb-6abc-33fe-70a0-a523eeba7958@corelight.com>
	<20181101144336.GB39013@corelight.com>
Message-ID: <a7b77580-b98b-14b8-f5e6-9faf2bce7799@corelight.com>

On 11/1/18 7:43 AM, Robin Sommer wrote:
> The observations / thoughts in this thread seem worth a ticket, I'd
> say. We can refine this over time if the current semantics aren't
> quite ideal yet.

Okay Robin, I've created https://github.com/bro/bro/issues/201 for this.
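
In the meantime, here is a minimal sketch of one way to cope with the two
quirks from my earlier mail (the handler not firing at startup, and firing
even when the value is unchanged). apply_foo is a made-up helper name, and
the sketch assumes the handler still sees the old value of foo when it runs:

```bro
module Foo;

export {
	option foo = F;
}

# Hypothetical helper: everything that depends on foo's value lives here,
# so it can be called from both bro_init() and the change handler.
function apply_foo(val: bool)
	{
	print fmt("Applying foo: %s", val);
	}

function foo_handler(ID: string, foo_new: bool): bool
	{
	# The handler may fire with a value equal to the current one, so
	# only act on real changes. Assumption: foo still holds the old
	# value at this point.
	if ( foo_new != foo )
		apply_foo(foo_new);

	return foo_new;
	}

event bro_init()
	{
	Option::set_change_handler("Foo::foo", foo_handler);

	# The handler is not guaranteed to run at startup, so initialize
	# explicitly from the option's current value.
	apply_foo(foo);
	}
```

Keeping the real work in one helper means the startup path and the
change-handler path cannot drift apart.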

Thanks,
-C.

From vern at corelight.com  Sat Nov  3 12:27:13 2018
From: vern at corelight.com (Vern Paxson)
Date: Sat, 03 Nov 2018 12:27:13 -0700
Subject: [Bro-Dev] attributes & named types
In-Reply-To: <CAMzgZ0JnV+3FuFxP1FE2SVwNdTWfPiyVd=N_CCptLR-B9gUTdg@mail.gmail.com>
	(Mon, 10 Sep 2018 17:01:00 CDT).
Message-ID: <201811031927.wA3JRDfI009508@fruitcake.ICSI.Berkeley.EDU>

Hmmmm I've looked into this and there are some subtle issues.

First, I tried to make this work using TypeType's like I had sketched, and
it turns out to be a mess.  Too many points where a decision has to be
made whether to access the base type (what the named type points to) rather
than the TypeType itself.

I then had an Aha and realized it can instead be done in the grammar, by
associating different semantics with resolving type names depending on the
context in which they appear.  I have this working.  It's pretty simple, too.

HOWEVER: running on the test suite points up an issue I hadn't anticipated.
We have attributes associated with named types that currently aren't expected
to propagate.  One example is from share/bro/base/init-bare.bro:

	## A connection's identifying 4-tuple of endpoints and ports.
	##    
	## .. note:: It's actually a 5-tuple: the transport-layer protocol is stored as
	##    part of the port values, `orig_p` and `resp_p`, and can be extracted from
	##    them with :bro:id:`get_port_transport_proto`.
	type conn_id: record {
		orig_h: addr;   ##< The originator's IP address.
		orig_p: port;   ##< The originator's port number.
		resp_h: addr;   ##< The responder's IP address.
		resp_p: port;   ##< The responder's port number.
	} &log;

So conn_id's have &log associated with them.  I'm not sure why this was
done (maybe a question for @Seth), since previously this was a no-op.
However, with my change/fix, this now means that any use of a conn_id
automatically inherits &log.  In principle, that's consistent with the
on-the-face-of-it semantics ... but it will likely lead to significant
unwanted effects if left unaddressed.
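
To make the effect concrete, here is a small sketch (conn_summary and its
fields are made up for illustration):

```bro
# Hypothetical record used as a log's column type.
type conn_summary: record {
	ts: time &log;      # explicitly logged
	id: conn_id;        # no &log requested on this field
	note: string;       # not logged
};
```

Up to now, the &log on conn_id's declaration had no effect on the id field
here.  With my change, id picks up &log from conn_id, so the four conn_id
columns would show up in the log without the script author asking for them.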

I have a couple of thoughts regarding this:

	(1) I can go through the existing scripts and remove such attributes
	    where they currently appear.  I believe that this shouldn't have
	    any effect because previously those weren't propagated anyway;
	    their presence seems to me more a bug than anything else, but
	    maybe I'm missing something.

	(2) This makes me wonder about adding an operator to *remove* an
	    attribute if present.  For example, you could imagine wanting
	    to now do something like:

		type my_conn_info: record {
			id: conn_id -&log;
			...
		};

	    as a way of specifying "if conn_id's have a &log attribute,
	    I don't want to inherit it".

Comments?

		Vern

From vlad at es.net  Sat Nov  3 14:00:36 2018
From: vlad at es.net (Vlad Grigorescu)
Date: Sat, 3 Nov 2018 21:00:36 +0000
Subject: [Bro-Dev] attributes & named types
In-Reply-To: <201811031927.wA3JRDfI009508@fruitcake.ICSI.Berkeley.EDU>
References: <CAMzgZ0JnV+3FuFxP1FE2SVwNdTWfPiyVd=N_CCptLR-B9gUTdg@mail.gmail.com>
	<201811031927.wA3JRDfI009508@fruitcake.ICSI.Berkeley.EDU>
Message-ID: <CAPqbkwv75qjuWx1otEg0schcUvE2C6BX_Ko6MrBDhkymZwo9yA@mail.gmail.com>

To better understand the existing behavior, here's the commit that
introduced this (specifically with regards to conn_id):
https://github.com/bro/bro/commit/38a1aa5a346d10de32f9b40e0869cdb48a98974b

> The &log keyword now operates as discussed:
>
>     - When associated with individual record fields, it defines them
>       as being logged.
>
>     - When associated with a complete record type, it defines all fields
>       to be logged.
>
>     - When associated with a record extension, it defines all added
>       fields to be logged.
>
>     Note that for nested record types, the inner fields must likewise
>     be declared with &log. Consequently, conn_id is now declared with
>     &log in bro.init.
>
I think the discussion this is referring to is here:
http://mailman.icsi.berkeley.edu/pipermail/bro-dev/2011-March/001107.html

On Sat, Nov 3, 2018 at 7:34 PM Vern Paxson <vern at corelight.com> wrote:

>         (2) This makes me wonder about adding an operator to *remove* an
>             attribute if present.  For example, you could imagine wanting
>             to now do something like:
>
>                 type my_conn_info: record {
>                         id: conn_id -&log;
>                         ...
>                 };
>
>             as a way of specifying "if conn_id's have a &log attribute,
>             I don't want to inherit it".
>

I've found myself wishing to remove an attribute recently, so this train of
thought is relevant. I had imagined something slightly different, which was
to maintain &log as it currently exists, but to also be able to explicitly
set it to T or F, e.g.:

> id: conn_id &log=F;

That would allow me to also be able to use redefs to configure whether or
not I want to log something:

> const log_conn = T &redef;
> ...
> id: conn_id &log=log_conn;

I think that if we add something like this for &log, it might make sense to
add it for other keywords too.

  --Vlad

From vern at corelight.com  Sat Nov  3 14:14:16 2018
From: vern at corelight.com (Vern Paxson)
Date: Sat, 03 Nov 2018 14:14:16 -0700
Subject: [Bro-Dev] attributes & named types
In-Reply-To: <CAPqbkwv75qjuWx1otEg0schcUvE2C6BX_Ko6MrBDhkymZwo9yA@mail.gmail.com>
	(Sat, 03 Nov 2018 21:00:36 -0000).
Message-ID: <201811032114.wA3LEGEm019479@fruitcake.ICSI.Berkeley.EDU>

Thanks for the pointers & thoughts!  A quick question, more in a bit:

> To better understand the existing behavior, here's the commit that
> introduced this (specifically with regards to conn_id):
> https://github.com/bro/bro/commit/38a1aa5a346d10de32f9b40e0869cdb48a98974b
> ...
> >     Note that for nested record types, the inner fields must likewise
> >     be declared with &log. Consequently, conn_id is now declared with
> >     &log in bro.init.

Does your understanding of this accord with the current behavior when
running on testing/btest/scripts/base/frameworks/logging/attr.bro ?
The test suite result has it not logging Log$id, even though it's of
type conn_id, which has &log.  (For my new version, it does log it.)

		Vern

From vlad at es.net  Sat Nov  3 14:58:34 2018
From: vlad at es.net (Vlad Grigorescu)
Date: Sat, 3 Nov 2018 21:58:34 +0000
Subject: [Bro-Dev] attributes & named types
In-Reply-To: <201811032114.wA3LEGEm019479@fruitcake.ICSI.Berkeley.EDU>
References: <CAPqbkwv75qjuWx1otEg0schcUvE2C6BX_Ko6MrBDhkymZwo9yA@mail.gmail.com>
	<201811032114.wA3LEGEm019479@fruitcake.ICSI.Berkeley.EDU>
Message-ID: <CAPqbkws7P1kfca1GDz6u9ueuCYUZ7ouQUyjpFiXarRKwih7R-Q@mail.gmail.com>

On Sat, Nov 3, 2018 at 9:14 PM Vern Paxson <vern at corelight.com> wrote:

> Thanks for the pointers & thoughts!  A quick question, more in a bit:
>
> > To better understand the existing behavior, here's the commit that
> > introduced this (specifically with regards to conn_id):
> >
> https://github.com/bro/bro/commit/38a1aa5a346d10de32f9b40e0869cdb48a98974b
> > ...
> > >     Note that for nested record types, the inner fields must likewise
> > >     be declared with &log. Consequently, conn_id is now declared with
> > >     &log in bro.init.
>
> Does your understanding of this accord with the current behavior when
> running on testing/btest/scripts/base/frameworks/logging/attr.bro ?
> The test suite result has it not logging Log$id, even though it's of
> type conn_id, which has &log.  (For my new version, it does log it.)
>

Hmm. I had to think about this for a bit, and I think it does agree with
the commit message. It's rather subtle, but because the message talks about
how the fields "must likewise be declared with &log," I can see how the
expectation would be that *both* the conn_id declaration in init-bare and
the usage in your record need to have the &log keyword to be logged.
However, before reading that commit message, that was not my expectation
for how Bro would behave.

I've been playing around with this a bit more, and I think that what's
described in the commit message is not the current behavior. Specifically,
the following seem to behave the same:

	type conn_id: record {
		orig_h: addr;
		orig_p: port;
		resp_h: addr;
		resp_p: port;
	} &log;

	type conn_id: record {
		orig_h: addr &log;
		orig_p: port &log;
		resp_h: addr &log;
		resp_p: port &log;
	};

This example demonstrates that all fields are still logged:
http://try.bro.org/#/trybro/saved/275829

In my mind, if the keyword is applied to a record, I would expect any new
fields added to that record to also be logged. However, if I use conn_id as
defined in init-bare (with &log applied to the record), and I redef conn_id
as follows, it will not log the new field:

	redef record conn_id += {
		nolog: bool &optional;
	};

I believe that applying &log to a record is just shorthand to applying it
individually to all fields on that record, whenever you define or redef
that record.

Simply put, I think the current behavior is not correct, and that we should
take this opportunity to determine what the behavior *should* be.

  --Vlad

From robin at corelight.com  Mon Nov  5 08:40:14 2018
From: robin at corelight.com (Robin Sommer)
Date: Mon, 5 Nov 2018 08:40:14 -0800
Subject: [Bro-Dev] attributes & named types
In-Reply-To: <CAPqbkws7P1kfca1GDz6u9ueuCYUZ7ouQUyjpFiXarRKwih7R-Q@mail.gmail.com>
References: <CAPqbkwv75qjuWx1otEg0schcUvE2C6BX_Ko6MrBDhkymZwo9yA@mail.gmail.com>
	<201811032114.wA3LEGEm019479@fruitcake.ICSI.Berkeley.EDU>
	<CAPqbkws7P1kfca1GDz6u9ueuCYUZ7ouQUyjpFiXarRKwih7R-Q@mail.gmail.com>
Message-ID: <20181105164014.GC80620@corelight.com>


On Sat, Nov 03, 2018 at 21:58 +0000, Vlad Grigorescu wrote:

> In my mind, if the keyword is applied to a record, I would expect any new
> fields added to that record to also be logged.

I believe the reason for not doing that is that then one couldn't add
a field that's *not* being logged (because currently we don't have
remove-an-attribute support).

I like the "&log=T|F" syntax to control this more directly, as long as
"&log" remains equivalent to "&log=T".

Generally we need to be very careful if we want to change any current
semantics here, as it will impact custom log files that people create
in their own scripts.

Robin

-- 
Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com

From vlad at es.net  Mon Nov  5 09:20:00 2018
From: vlad at es.net (Vlad Grigorescu)
Date: Mon, 5 Nov 2018 17:20:00 +0000
Subject: [Bro-Dev] attributes & named types
In-Reply-To: <20181105164014.GC80620@corelight.com>
References: <CAPqbkwv75qjuWx1otEg0schcUvE2C6BX_Ko6MrBDhkymZwo9yA@mail.gmail.com>
	<201811032114.wA3LEGEm019479@fruitcake.ICSI.Berkeley.EDU>
	<CAPqbkws7P1kfca1GDz6u9ueuCYUZ7ouQUyjpFiXarRKwih7R-Q@mail.gmail.com>
	<20181105164014.GC80620@corelight.com>
Message-ID: <CAPqbkwtmxN1d8KVOB+fTaG1WKS-VVL2Pkdb2Kf64T-vbxSSmJQ@mail.gmail.com>

On Mon, Nov 5, 2018 at 4:40 PM Robin Sommer <robin at corelight.com> wrote:

>
>
> On Sat, Nov 03, 2018 at 21:58 +0000, Vlad Grigorescu wrote:
>
> > In my mind, if the keyword is applied to a record, I would expect any new
> > fields added to that record to also be logged.
>
> I believe the reason for not doing that is that then one couldn't add
> a field that's *not* being logged (because currently we don't have
> remove-an-attribute support).
>

Yeah, I think the reasoning makes sense, and that seemed to be the
consensus from the discussion on bro-dev in 2011. My point is simply that
with the current behavior, it's not clear (or, AFAICT, documented) that
adding &log to a record is just a shorthand for adding &log to each
field, and that it really has no meaning for the record as a whole.

  --Vlad

From vern at corelight.com  Tue Nov  6 13:00:06 2018
From: vern at corelight.com (Vern Paxson)
Date: Tue, 06 Nov 2018 13:00:06 -0800
Subject: [Bro-Dev] attributes & named types
In-Reply-To: <CAPqbkwv75qjuWx1otEg0schcUvE2C6BX_Ko6MrBDhkymZwo9yA@mail.gmail.com>
	(Sat, 03 Nov 2018 21:00:36 -0000).
Message-ID: <201811062100.wA6L067h019736@fruitcake.ICSI.Berkeley.EDU>

Thanks a bunch for the further context & discussion.

The more I've delved into this, the more convinced I've become that we have
a basic architectural problem with attributes: they're associated with
identifiers and values, but not types ... *except* for hacky ways for
records and record fields.

My alternative implementation for type names fares a bit better with the
example you gave at http://try.bro.org/#/trybro/saved/275829 ... but still
gives counterintuitive behavior when I introduce a minor variant (I'll
spare you the details), with the problem being that a subsequent use of
&log (rather than the use when a record is declared) isn't propagated to
the record's individual fields.

I could, I guess, add code to do that propagation ... but testing further,
none of this fixes the original problem that I cared about, which is
to be able to declare types with &default values for tables, a la BIT-248:

	type str_tbl: table[string] of string &default="";

Here the problem is that the only opportunity to associate a &default
attribute with a table is when instantiating a table value.  It doesn't
work if str_tbl is instead used to define a record field, similar to the
lack of propagation for &log.
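
An untested sketch of the contrast, with made-up identifiers (names, Info,
tags):

```bro
type str_tbl: table[string] of string &default="";

# Attribute given where the table itself is declared: lookups of
# missing indices yield "".
global names: table[string] of string &default="";

type Info: record {
	# Field declared via the named type: the &default written on
	# str_tbl is not carried over to this field's table.
	tags: str_tbl;
};
```

Note that writing &default directly on the tags field would not help either,
since there it would be read as the default for the field rather than for
the table's elements.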

I think what we need to do is rethink the basic architecture/structure of
attributes.  In particular, types in general (not just named types) should
be able to have attributes associated with them.  The attributes associated
with an identifier are those associated with its type plus those directly
associated with the identifier (like &redef).

While doing this, we can also think about mechanisms for removing attributes.
I don't think the "&attr=F" approach mentioned earlier on this thread will do
the trick, since it's syntactically/semantically quite weird for attributes
that already take expressions as values, such as &default or &read_expire.

		Vern

From jsiwek at corelight.com  Wed Nov  7 08:23:29 2018
From: jsiwek at corelight.com (Jon Siwek)
Date: Wed, 7 Nov 2018 10:23:29 -0600
Subject: [Bro-Dev] attributes & named types
In-Reply-To: <201811062100.wA6L067h019736@fruitcake.ICSI.Berkeley.EDU>
References: <CAPqbkwv75qjuWx1otEg0schcUvE2C6BX_Ko6MrBDhkymZwo9yA@mail.gmail.com>
	<201811062100.wA6L067h019736@fruitcake.ICSI.Berkeley.EDU>
Message-ID: <CAMzgZ0+tewpXVX0+A4hrP+0hqcMundMSEMJm=HVCNV6aThBG3Q@mail.gmail.com>

On Tue, Nov 6, 2018 at 3:00 PM Vern Paxson <vern at corelight.com> wrote:

> I think what we need to do is rethink the basic architecture/structure of
> attributes.  In particular, types in general (not just named types) should
> be able to have attributes associated with them.  The attributes associated
> with an identifier are those associated with its type plus those directly
> associated with the identifier (like &redef).

Sounds worth pursuing.  I think this was also one of the routes
originally offered, but not sure if it actually got attempted or there
were other complications.  Hard to remember all the twists this issue
has taken, but you probably have the freshest view of things at the
moment to decide which way to try going.

> While doing this, we can also think about mechanisms for removing attributes.
> I don't think the "&attr=F" approach mentioned earlier on this thread will do
> the trick, since it's syntactically/semantically quite weird for attributes
> that already take expressions as values, such as &default or &read_expire.

Yeah, attr removal seems to warrant its own unique syntax.  But might
help to just review which attrs one may actually want to remove (or
even make sense to remove) -- seems like it's only &log in the first
place so maybe doesn't warrant a generalized mechanism ?

A related idea I haven't thought through: how about providing a BIF
that does attr removal/modification?  Actually seems more powerful to
be able to change attributes at runtime rather than just parse-time.

Another thought/worry that may or may not be valid for generalized
attr removal/modification: there seems to be an opportunity to create
nonsensical states.  E.g. the sequence: (1) create a value of the
type "foo" which initially has attr &bar, (2) later remove &bar from
type foo, (3) are the existing values of type foo still coherent now
that they lack &bar?  Obviously I made up the type/attr, but we probably
have to think that sequence through for each existing attribute to
make sure behavior is well-defined for each.

- Jon

From karl.pietrzak at twosixlabs.com  Thu Nov  8 14:29:42 2018
From: karl.pietrzak at twosixlabs.com (Karl Pietrzak)
Date: Thu, 8 Nov 2018 17:29:42 -0500
Subject: [Bro-Dev] best way to apply NLP to syslog entries?
Message-ID: <CAMxM4r+8kpbVKY0fX19bOQ=6a+ih4Eo9N8MyybkSnh_PzLdJKw@mail.gmail.com>

Hey everyone!

We're working on analyzing semi-structured logs (such as syslog, Windows
events, etc.), and I'm trying to figure out if Bro/Zeek is the right tool
for the job.

Bro/Zeek has great support for parsing syslog messages into their parts [1],
but we wanna take it one step further, applying some NLP to the message
part of the syslog entry, such as named entity extraction.

What's the best way to integrate something like this?

   1. Fork the syslog script from bro/scripts/base/protocols/syslog [2],
   and use Zeek's FFI to integrate some C/C++ code?
   2. Use whatever NLP tools I prefer, and integrate the Broccoli Client
   Communications Library [3] to send events to Bro/Zeek?

Maybe there are other, better ways to do this.  Any advice on this matter
would be appreciated!

Thank you!

[1]: https://www.bro.org/sphinx/scripts/base/protocols/syslog/main.bro.html
[2]: https://github.com/bro/bro/tree/master/scripts/base/protocols/syslog
[3]: https://www.bro.org/sphinx/components/broccoli/broccoli-manual.html

From jan.grashoefer at gmail.com  Fri Nov  9 02:56:09 2018
From: jan.grashoefer at gmail.com (=?UTF-8?Q?Jan_Grash=c3=b6fer?=)
Date: Fri, 9 Nov 2018 11:56:09 +0100
Subject: [Bro-Dev] best way to apply NLP to syslog entries?
In-Reply-To: <CAMxM4r+8kpbVKY0fX19bOQ=6a+ih4Eo9N8MyybkSnh_PzLdJKw@mail.gmail.com>
References: <CAMxM4r+8kpbVKY0fX19bOQ=6a+ih4Eo9N8MyybkSnh_PzLdJKw@mail.gmail.com>
Message-ID: <e9f4c875-3206-dc82-85cc-9471a0f8ff40@gmail.com>

Hi Karl,

On 08/11/2018 23:29, Karl Pietrzak wrote:
> We're working on analyzing semi-structured logs (such as syslog, Windows
> events, etc.), and I'm trying to figure out if Bro/Zeek is the right tool
> for the job.
> 
> ...
> 
> Maybe there are other, better ways to do this.  Any advice on this matter
> would be appreciated!

you might want to have a look at https://github.com/J-Gras/bro-lognorm. 
It integrates liblognorm into Bro to parse for example syslog messages. 
The only thing you need is an appropriate rulebase (so no NLP here).

Jan

From jsiwek at corelight.com  Mon Nov 12 10:27:13 2018
From: jsiwek at corelight.com (Jon Siwek)
Date: Mon, 12 Nov 2018 12:27:13 -0600
Subject: [Bro-Dev] Consistent error handling for scripting mistakes
Message-ID: <CAMzgZ0LK7S2JG3a-2psNoGGZ=aRNPfgWHGUQz=XqsOTwpSm6og@mail.gmail.com>

Trying to broaden the scope of:

https://github.com/bro/bro/issues/208

I recently noticed there's a range of behaviors in how various
scripting mistakes are treated.  They may (1) abort, as in case of bad
subnet mask or incompatible vector element assignment (2) skip over
evaluating (sub)expression(s), but otherwise continue current function
body, as in case of non-existing table index access or (3) exit the
current function body, as in the classic case of uninitialized record
field access.

1st question: should these be made more consistent? I'd say yes.

2nd question: what is the expected way for these to be handled?  I'd
argue that (3) is close to expected behavior, but it's still weird
that it's only the *current function body* (yes, *function*, not
event) that exits early -- hard to reason about what sort of arbitrary
code was depending on that function to be fully evaluated and what
other sort of inconsistent state is caused by exiting early.

I propose, for 2.7, to aim for consistent error handling for scripting
mistakes and that the expected behavior is to unwind all the way to
exiting the current event handler (all its function bodies).  That
makes it easier to explain how to write event handlers such that they
won't enter too wild/inconsistent of a state should a scripting error
occur: "always write an event handler such that it makes no
assumptions about order/priority of other event handlers".  That's
already close to current suggestions/approaches.
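
To illustrate with case (3), here is a small sketch (R, f, and my_event are
made-up names; the comments describe the current behavior as outlined
above):

```bro
type R: record {
	a: string &optional;
};

global my_event: event();

function f()
	{
	local r = R();

	# Access to an unset field: an error is reported and f() exits here.
	print r$a;

	# Not reached.
	print "f done";
	}

event my_event()
	{
	f();

	# Reached today, because only f()'s body was exited.
	print "my_event done";
	}

event bro_init()
	{
	event my_event();
	}
```

Under the proposal, the error in f() would also end my_event(), so
"my_event done" would no longer be printed.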

One exception may be within bro_init(), if an error happens at that
time, I'd say it's fine to completely abort -- it's unlikely or hard
to say whether Bro would operate well if it proceeded after an error
that early in initialization.

Thoughts?

- Jon

From jmellander at lbl.gov  Mon Nov 12 11:44:40 2018
From: jmellander at lbl.gov (Jim Mellander)
Date: Mon, 12 Nov 2018 11:44:40 -0800
Subject: [Bro-Dev] Consistent error handling for scripting mistakes
In-Reply-To: <CAMzgZ0LK7S2JG3a-2psNoGGZ=aRNPfgWHGUQz=XqsOTwpSm6og@mail.gmail.com>
References: <CAMzgZ0LK7S2JG3a-2psNoGGZ=aRNPfgWHGUQz=XqsOTwpSm6og@mail.gmail.com>
Message-ID: <CADju=b5PF8x=t=KbN1U91nk3dbgVxfqN4gFEacpBTreSHZcO_A@mail.gmail.com>

In the same vein of sensible Bro script error handling, I'm resending an
issue I found in January:

I was tinkering with the sumstats code, and inadvertently deleted the final
"}" closing out the last function.  When running the code, I received this
misleading error message:

error in bro/share/bro/base/frameworks/tunnels/./main.bro, line 8: syntax
error, at or near "module"

presumably due to the function still being open when the next policy script
is loaded.  Wouldn't it be more reasonable to check at the end of each
script when loaded that there are no dangling functions, expressions, etc.?

==========================

There are also silent fails which probably should give a warning, such as
failing to include the fully-qualified event name silently preventing the
event from being triggered.


==========================

The above are more in the area of parsing vs runtime.

My idea on runtime scripting errors would be to apply a sensible default to
the offending expression (null or 0, as the case may be, might be
sufficient), log the error, and continue....


Jim


On Mon, Nov 12, 2018 at 10:27 AM, Jon Siwek <jsiwek at corelight.com> wrote:

> Trying to broaden the scope of:
>
> https://github.com/bro/bro/issues/208
>
> I recently noticed there's a range of behaviors in how various
> scripting mistakes are treated.  They may (1) abort, as in case of bad
> subnet mask or incompatible vector element assignment (2) skip over
> evaluating (sub)expression(s), but otherwise continue current function
> body, as in case of non-existing table index access or (3) exit the
> current function body, as in the classic case of uninitialized record
> field access.
>
> 1st question: should these be made more consistent? I'd say yes.
>
> 2nd question: what is the expected way for these to be handled?  I'd
> argue that (3) is close to expected behavior, but it's still weird
> that it's only the *current function body* (yes, *function*, not
> event) that exits early -- hard to reason about what sort of arbitrary
> code was depending on that function to be fully evaluated and what
> other sort of inconsistent state is caused by exiting early.
>
> I propose, for 2.7, to aim for consistent error handling for scripting
> mistakes and that the expected behavior is to unwind all the way to
> exiting the current event handler (all its function bodies).  That
> makes it easier to explain how to write event handlers such that they
> won't enter too wild/inconsistent of a state should a scripting error
> occur: "always write an event handler such that it makes no
> assumptions about order/priority of other event handlers".  That's
> already close to current suggestions/approaches.
>
> One exception may be within bro_init(), if an error happens at that
> time, I'd say it's fine to completely abort -- it's unlikely or hard
> to say whether Bro would operate well if it proceeded after an error
> that early in initialization.
>
> Thoughts?
>
> - Jon

From jsiwek at corelight.com  Mon Nov 12 12:24:47 2018
From: jsiwek at corelight.com (Jon Siwek)
Date: Mon, 12 Nov 2018 14:24:47 -0600
Subject: [Bro-Dev] Consistent error handling for scripting mistakes
In-Reply-To: <CADju=b5PF8x=t=KbN1U91nk3dbgVxfqN4gFEacpBTreSHZcO_A@mail.gmail.com>
References: <CAMzgZ0LK7S2JG3a-2psNoGGZ=aRNPfgWHGUQz=XqsOTwpSm6og@mail.gmail.com>
	<CADju=b5PF8x=t=KbN1U91nk3dbgVxfqN4gFEacpBTreSHZcO_A@mail.gmail.com>
Message-ID: <CAMzgZ0JJ5NKNHUQwvKFYThomq5+Tapss11F76G51RUNW3sd_aQ@mail.gmail.com>

On Mon, Nov 12, 2018 at 1:44 PM Jim Mellander <jmellander at lbl.gov> wrote:

> I was tinkering with the sumstats code, and inadvertently deleted the final "}" closing out the last function.  When running the code, the misleading error message is received:

Yes, that's a bit of a different topic, but still tracked (at
low-normal priority):

https://github.com/bro/bro/issues/167

> There are also silent fails which probably should give a warning, such as failing to include the fully-qualified event name silently preventing the event from being triggered.

Also a bit different than what I was talking about, but also tracked
(at higher priority since it's a common mistake):

https://github.com/bro/bro/issues/163

> My idea on runtime scripting errors would be to apply a sensible default to the offending expression (null or 0, as the case may be, might be sufficient), log the error, and continue....

In the following example (comments reflect current behavior) you'd
expect the "false" branch in foo() to be taken?

#################################
function foo()
    {
    local t: table[string] of string = table();

    # Non-existing index access: (sub)expressions are not evaluated
    if ( t["nope"] == "nope" )
        # Unreachable
        print "yes";
    else
        # Unreachable
        print "no";

    # Reachable
    print "foo done";
    }

event bro_init()
    {
    foo();
    # Reachable
    print "bro_init done";
    }
#################################

My thought was that should behave more like a "key error" run-time
exception (e.g. like Python).  Bro scripting doesn't have exception
support, but internally we can use an exception to unwind the call
stack (additionally I was thinking that the unwind needs to proceed
further than what it does already in some cases, which is just the
current function body).  In any case, logging of the error would also
occur (as it already does).

- Jon

From robin at corelight.com  Mon Nov 12 20:39:49 2018
From: robin at corelight.com (Robin Sommer)
Date: Mon, 12 Nov 2018 20:39:49 -0800
Subject: [Bro-Dev] Consistent error handling for scripting mistakes
In-Reply-To: <CAMzgZ0LK7S2JG3a-2psNoGGZ=aRNPfgWHGUQz=XqsOTwpSm6og@mail.gmail.com>
References: <CAMzgZ0LK7S2JG3a-2psNoGGZ=aRNPfgWHGUQz=XqsOTwpSm6og@mail.gmail.com>
Message-ID: <20181113043949.GH45250@corelight.com>


On Mon, Nov 12, 2018 at 12:27 -0600, Jonathan Siwek wrote:

> I recently noticed there's a range of behaviors in how various
> scripting mistakes are treated.

There's a 4th: InterpreterException.

> 1st question: should these be made more consistent? I'd say yes.

Yes, definitely.

> that it's only the *current function body* (yes, *function*, not
> event) that exits early -- hard to reason about what sort of arbitrary
> code was depending on that function to be fully evaluated and what
> other sort of inconsistent state is caused by exiting early.

... and what happens if the function is supposed to return a value,
but now doesn't?

> I propose, for 2.7, to aim for consistent error handling for scripting
> mistakes and that the expected behavior is to unwind all the way to
> exiting the current event handler (all its function bodies).

Agree with that. Can we do that cleanly though? The problem with
InterpreterException is that it may leak memory. We'd need to do the
unwinding manually throughout the interpreter code, but that means
touching a number of places to pass the error information back.

> One exception may be within bro_init(), if an error happens at that
> time, I'd say it's fine to completely abort -- it's unlikely or hard
> to say whether Bro would operate well if it proceeded after an error
> that early in initialization.

Yeah, that makes sense.

Robin

-- 
Robin Sommer * Corelight, Inc. * robin at corelight.com * www.corelight.com

From jsiwek at corelight.com  Tue Nov 13 08:23:06 2018
From: jsiwek at corelight.com (Jon Siwek)
Date: Tue, 13 Nov 2018 10:23:06 -0600
Subject: [Bro-Dev] Consistent error handling for scripting mistakes
In-Reply-To: <20181113043949.GH45250@corelight.com>
References: <CAMzgZ0LK7S2JG3a-2psNoGGZ=aRNPfgWHGUQz=XqsOTwpSm6og@mail.gmail.com>
	<20181113043949.GH45250@corelight.com>
Message-ID: <CAMzgZ0KhkbZ6A3f1TWbKb8x5fW101V6ib3NySENzn-k29LhrTg@mail.gmail.com>

On Mon, Nov 12, 2018 at 10:39 PM Robin Sommer <robin at corelight.com> wrote:

> > that it's only the *current function body* (yes, *function*, not
> > event) that exits early -- hard to reason about what sort of arbitrary
> > code was depending on that function to be fully evaluated and what
> > other sort of inconsistent state is caused by exiting early.
>
> ... and what happens if the function is supposed to return a value,
> but now doesn't?

Accesses to it emit a "value used but not set" error, but subsequent
statements within the same function/event still get evaluated (unless
they are themselves now incoherent and trigger various cascading error
behaviors).

> > I propose, for 2.7, to aim for consistent error handling for scripting
> > mistakes and that the expected behavior is to unwind all the way to
> > exiting the current event handler (all its function bodies).
>
> Agree with that. Can we do that cleanly though? The problem with
> InterpreterException is that it may leak memory. We'd need to do the
> unwinding manually throughout the interpreter code, but that means
> touching a number of places to pass the error information back.

Should be possible, just a question of effort/difficulty.  An
alternative to manually passing error information back via return
values is migrating from explicit reference counting to shared_ptr.
Either approach requires touching similar code locations, but the
latter may be easier to proceed with after the old serialization system
gets removed from the BroObj class hierarchy.

Since we already have a class of errors that may induce leaks, we
could still move forward with applying consistent error handling
behavior via InterpreterException, but then later expect to resolve
the leakage issue independently via implementation detail
improvements.
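
A minimal C++ sketch of why exception-based unwinding can leak under
explicit reference counting, and why shared_ptr avoids it. This is an
illustration under assumptions, not Bro code: Val, Unref, failing_step
and leaked_after are hypothetical names, and live_count exists only to
make the leak observable.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>

// Hypothetical value type; live_count tracks how many instances exist.
struct Val {
    static int live_count;
    int refs = 1;
    Val() { ++live_count; }
    ~Val() { --live_count; }
};
int Val::live_count = 0;

// Illustrative explicit reference counting: the caller must remember to Unref.
void Unref(Val* v) {
    if (--v->refs == 0)
        delete v;
}

// Stands in for any evaluation step that hits a scripting mistake.
void failing_step() { throw std::runtime_error("scripting mistake"); }

void eval_manual() {
    Val* v = new Val();
    failing_step();  // throws, so the Unref below never runs
    Unref(v);
}

void eval_shared() {
    auto v = std::make_shared<Val>();
    failing_step();  // v is destroyed while the stack unwinds
}

// Runs eval, swallows the error, and reports how many Vals were left behind.
int leaked_after(void (*eval)()) {
    int before = Val::live_count;
    try {
        eval();
    } catch (const std::runtime_error&) {
        // Error handled at the boundary; execution continues.
    }
    return Val::live_count - before;
}
```

Here leaked_after(eval_manual) reports one leaked object per error,
while leaked_after(eval_shared) reports none.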

The final resolution is still for people to correct underlying
scripting mistakes, it's just that having more consistent and improved
error handling makes it easier to reason about the subsequent
operational state of Bro with more confidence.

- Jon

From vern at corelight.com  Tue Nov 13 14:32:30 2018
From: vern at corelight.com (Vern Paxson)
Date: Tue, 13 Nov 2018 14:32:30 -0800
Subject: [Bro-Dev] Consistent error handling for scripting mistakes
In-Reply-To: <CAMzgZ0KhkbZ6A3f1TWbKb8x5fW101V6ib3NySENzn-k29LhrTg@mail.gmail.com>
	(Tue, 13 Nov 2018 10:23:06 CST).
Message-ID: <201811132232.wADMWUrP012613@fruitcake.ICSI.Berkeley.EDU>

I like what you & Robin sketch.  FWIW, it's hard for me to get excited
over the issue of leaks-in-the-face-of-error-recovery.  Presumably it would
take in practice a lot of error recovery before this actually hoses the
execution due to running out of memory.  At that point, it's not unreasonable
for things to keel over anyway.

An alternative/additional approach would be to introduce a notion of
"failure" as a first-class object.  In another life I used a language that
did that, and it worked remarkably well.  But clearly that's a bigger
undertaking than the valuable near-term notion of regularizing how Zeek
deals with errors.
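
One possible reading of "failure as a first-class object", sketched in
C++ under assumptions: an operation returns a value that is either a
result or a failure, and the caller inspects it like any other value.
The Lookup type and the index_table and describe functions are
hypothetical and not taken from any existing language or from Bro.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical result type: holds either a value or a failure reason.
struct Lookup {
    bool failed;
    std::string value;   // meaningful only when !failed
    std::string reason;  // meaningful only when failed
};

using Table = std::unordered_map<std::string, std::string>;

// A missing index produces a failure object instead of aborting evaluation.
Lookup index_table(const Table& t, const std::string& key) {
    auto it = t.find(key);
    if (it == t.end())
        return {true, "", "no such index: " + key};
    return {false, it->second, ""};
}

// Failures can be passed around and examined like ordinary values.
std::string describe(const Lookup& r) {
    return r.failed ? "failure(" + r.reason + ")" : "value(" + r.value + ")";
}
```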

		Vern

From jsiwek at corelight.com  Wed Nov 14 13:34:15 2018
From: jsiwek at corelight.com (Jon Siwek)
Date: Wed, 14 Nov 2018 15:34:15 -0600
Subject: [Bro-Dev] Consistent error handling for scripting mistakes
In-Reply-To: <201811132232.wADMWUrP012613@fruitcake.ICSI.Berkeley.EDU>
References: <CAMzgZ0KhkbZ6A3f1TWbKb8x5fW101V6ib3NySENzn-k29LhrTg@mail.gmail.com>
	<201811132232.wADMWUrP012613@fruitcake.ICSI.Berkeley.EDU>
Message-ID: <CAMzgZ0JsSy8ZZTzZZnCREQyx0aCwcA_Q+KCAubEbaC_qToxUvQ@mail.gmail.com>

On Tue, Nov 13, 2018 at 4:32 PM Vern Paxson <vern at corelight.com> wrote:
>
> I like what you & Robin sketch.  FWIW, it's hard for me to get excited
> over the issue of leaks-in-the-face-of-error-recovery.

Yeah, it's not great, but the intention would still be to eventually
fix the leakage problem, too.

Anyway, made a new GH issue to track the broader error handling enhancement:

https://github.com/bro/bro/issues/211

- Jon

From oldpopsong at qq.com  Thu Nov 29 01:08:08 2018
From: oldpopsong at qq.com (=?ISO-8859-1?B?U29uZw==?=)
Date: Thu, 29 Nov 2018 17:08:08 +0800
Subject: [Bro-Dev] BinPac: is there a way to get the length of decoded field?
Message-ID: <tencent_47892BE1694FD7326D68B04D@qq.com>

Hi,

I'm trying to write an analyzer for a protocol which uses Google Protocol Buffers for serialization. The request message MyProto_Req is like:

    <4 bytes indicating the length of the rest of the message>
    <Protobuf varint indicating the length of the REQUEST_HEADER>
    <REQUEST_HEADER data>
    <Protobuf varint indicating the length of the REQUEST_PARAMETER>
    <REQUEST_PARAMETER data>
    <optional data>

( You can find the Protobuf varint encoding here: https://developers.google.com/protocol-buffers/docs/encoding#varints )

Obviously the length of <optional data> must be calculated using previous length fields.

Below is my code:

type PBVarint = record {
        val_bytes : uint8[] &until($element < 0x80);
} &let {
        val    : uint64 = varint_to_int64(val_bytes);
        my_len : uint8 = varint_len(val_bytes);    # the length of this varint
};

function varint_to_int64(val_bytes: uint8[]) : uint64
        %{
        uint64 v = 0;

        for ( unsigned int i = 0; i < val_bytes->size(); ++i )
                {
                // Each varint byte carries 7 payload bits, least-significant group first.
                uint64 byte = ((*val_bytes)[i] & 0x7f);
                v |= byte << (7 * i);
                }

        return v;
        %}

function varint_len(val_bytes: uint8[]) : uint8
        %{
        return val_bytes->size();
        %}

type MyProto_Req = record {
        length        : uint32;
        len_reqHeader : PBVarint;
        reqHeader     : bytestring &length = len_reqHeader.val;
        len_reqPara   : PBVarint;
        reqPara       : bytestring &length = len_reqPara.val;
        optionalData  : bytestring &length = (length - len_reqHeader.val - len_reqHeader.my_len - len_reqPara.val - len_reqPara.my_len);
};

It works. But I wonder if there is a better way to calculate the length of optionalData (to kill the function varint_len()). I've tried:
        optionalData    : bytestring &length = (length - len_reqHeader.val - lenHeader.val_bytes->size() - len_reqPara.val - len_reqPara.val_bytes->size())
but failed.

Any hints?
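
For reference, a standalone C++ sketch of the arithmetic in question,
outside of BinPAC: decode a Protobuf varint (7 payload bits per byte,
least-significant group first), keep track of how many bytes it
occupied, and derive the length of the optional data from the outer
length field. The Varint struct and the decode_varint and
optional_data_length functions are hypothetical names chosen for this
illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decoded varint plus the number of bytes it occupied on the wire.
struct Varint {
    uint64_t value;
    size_t length;  // 0 signals a truncated or over-long encoding
};

// Decodes a varint starting at buf[offset]; a 64-bit varint uses at most 10 bytes.
Varint decode_varint(const std::vector<uint8_t>& buf, size_t offset) {
    uint64_t v = 0;
    for (size_t i = 0; offset + i < buf.size() && i < 10; ++i) {
        uint8_t byte = buf[offset + i];
        v |= static_cast<uint64_t>(byte & 0x7f) << (7 * i);
        if ((byte & 0x80) == 0)  // high bit clear marks the final byte
            return {v, i + 1};
    }
    return {0, 0};
}

// Bytes left for the optional data, or -1 if the length fields are inconsistent.
// Assumes the decoded values are small enough that the sum does not overflow.
int64_t optional_data_length(uint32_t length, const Varint& hdr, const Varint& para) {
    uint64_t used = hdr.length + hdr.value + para.length + para.value;
    if (used > length)
        return -1;
    return static_cast<int64_t>(length - used);
}
```

For example, the two bytes 0xAC 0x02 decode to 300 with a length of 2,
which is the example used in the Protobuf encoding documentation.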