From jdopheid at illinois.edu Mon Oct 2 07:13:33 2017
From: jdopheid at illinois.edu (Dopheide, Jeannette M)
Date: Mon, 2 Oct 2017 14:13:33 +0000
Subject: [Bro-Dev] Renaming the Bro Project: seeking proposed names from the community
Message-ID:

This year at BroCon we announced that the Bro Project will be changing its name. While "Bro" was originally meant as an Orwellian reminder of the risk that any monitoring fundamentally entails, it has more recently gained a very different, and quite offensive, reputation ("Bro culture"). To avoid instant negative impressions among new users who aren't aware of the history, the Leadership Team has decided to seek a name change.

We are accepting proposed names from the community for two months (due Monday, December 4th). The Leadership Team will review the list of possible names and narrow it down to 5 finalists. We will announce the finalists and take a second round of feedback from the community before making the final selection. We hope to announce the new name with the next major release.

To submit a proposed name, fill out the form here:

https://goo.gl/forms/qwR8s6Yd4H0Bu8Ca2

------
Jeannette Dopheide
Sr. Education, Outreach, and Training Coordinator
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign

From scampbell at lbl.gov Mon Oct 2 13:04:01 2017
From: scampbell at lbl.gov (Scott Campbell)
Date: Mon, 2 Oct 2017 13:04:01 -0700
Subject: [Bro-Dev] Renaming the Bro Project: seeking proposed names from the community
In-Reply-To:
References:
Message-ID: <2ce29438-52bc-390e-e628-96fbf98a9e3d@lbl.gov>

I wanted to express my unconditional support for this. The culture of non-inclusive, hostile behavior embodied by "bro culture" stands opposite to the remarkable environment that has surrounded this project for as long as I have been a member.

It is a shame that the state of R&D is so toxic that this change has to be discussed, but as a stakeholder in the community I see it as our responsibility to take /active/ measures to make sure everyone feels welcome and to serve as an example of what a project can and should be.

Also, for the record, I have always loved the original name, with its Orwellian reminder that what we do requires constant responsibility and thoughtfulness.

thank you,
scott

On 10/2/17 7:13 AM, Dopheide, Jeannette M wrote:
> This year at BroCon we announced that the Bro Project will be changing its name. [...]

From slagell at illinois.edu Wed Oct 4 11:57:00 2017
From: slagell at illinois.edu (Slagell, Adam J)
Date: Wed, 4 Oct 2017 18:57:00 +0000
Subject: [Bro-Dev] Bro working well on Mac OS High Sierra, just a couple test failures
Message-ID: <8097F517-687F-4E2E-9272-732853185F64@illinois.edu>

I had no problems after the upgrade to High Sierra on my "production" box, and I had no troubles compiling Bro 2.5.1 on my laptop.

I did, however, get two errors in the test suite.

core.truncation ... failed
% 'btest-diff output' failed unexpectedly (exit code 1)
% cat .diag
== File ===============================
#separator \x09
#set_separator ,
#empty_field (empty)
#unset_field -
#path weird
#open 2017-10-04-18-48-40
#fields ts uid id.orig_h id.orig_p id.resp_h id.resp_p name addl notice peer
#types time string addr port addr port string string bool string
1334160095.895421 - - - - - truncated_IP bro
#close 2017-10-04-18-48-40
#separator \x09
#set_separator ,
#empty_field (empty)
#unset_field -
#path weird
#open 2017-10-04-18-48-41
#fields ts uid id.orig_h id.orig_p id.resp_h id.resp_p name addl notice peer
#types time string addr port addr port string string bool string
1334156241.519125 - - - - - truncated_IP bro
#close 2017-10-04-18-48-41
#separator \x09
#set_separator ,
#empty_field (empty)
#unset_field -
#path weird
#open 2017-10-04-18-48-41
#fields ts uid id.orig_h id.orig_p id.resp_h id.resp_p name addl notice peer
#types time string addr port addr port string string bool string
1334094648.590126 - - - - - truncated_IP bro
#close 2017-10-04-18-48-41
#separator \x09
#set_separator ,
#empty_field (empty)
#unset_field -
#path weird
#open 2017-10-04-18-48-43
#fields ts uid id.orig_h id.orig_p id.resp_h id.resp_p name addl notice peer
#types time string addr port addr port string string bool string
1338328954.078361 - - - - - internally_truncated_header - F bro
#close 2017-10-04-18-48-43
#separator \x09
#set_separator ,
#empty_field (empty)
#unset_field -
#path weird
#open 2017-10-04-18-48-43
#fields ts uid id.orig_h id.orig_p id.resp_h id.resp_p name addl notice peer
#types time string addr port addr port string string bool string
1404148886.981015 - - - - - bad_IP_checksum bro
1404148887.011158 CHhAvVGS1DHFjwGM9 192.168.4.149 51293 72.21.91.29 443 bad_TCP_checksum - F bro
#close 2017-10-04-18-48-43
== Diff ===============================
--- /tmp/test-diff.62112.output.baseline.tmp 2017-10-04 18:48:43.000000000 +0000
+++ /tmp/test-diff.62112.output.tmp 2017-10-04 18:48:43.000000000 +0000
@@ -46,5 +46,6 @@
#open XXXX-XX-XX-XX-XX-XX
#fields ts uid id.orig_h id.orig_p id.resp_h id.resp_p name addl notice peer
#types time string addr port addr port string string bool string
-0.000000 - - - - - truncated_link_header bro
+XXXXXXXXXX.XXXXXX - - - - - bad_IP_checksum bro
+XXXXXXXXXX.XXXXXX CHhAvVGS1DHFjwGM9 192.168.4.149 51293 72.21.91.29 443 bad_TCP_checksum - F bro
#close XXXX-XX-XX-XX-XX-XX
=======================================

% cat .stderr
1404148887.011158 warning in /Users/slagell/Downloads/bro-2.5.1/scripts/base/misc/find-checksum-offloading.bro, line 54: Your trace file likely has invalid IP and TCP checksums, most likely from NIC checksum offloading. By default, packets with invalid checksums are discarded by Bro unless using the -C command-line option or toggling the 'ignore_checksums' variable. Alternatively, disable checksum offloading by the network adapter to ensure Bro analyzes the actual checksums that are transmitted.
1404148887.011158 warning in /Users/slagell/Downloads/bro-2.5.1/scripts/base/misc/find-filtered-trace.bro, line 48: The analyzed trace file was determined to contain only TCP control packets, which may indicate it's been pre-filtered. By default, Bro reports the missing segments for this type of trace, but the 'detect_filtered_trace' option may be toggled if that's not desired.

istate.bro-ipv6-socket ... failed
% 'btest-bg-wait 20' failed unexpectedly (exit code 1)
% cat .stderr
The following processes did not terminate:

bro -b ../recv.bro
bro -b ../send.bro

-----------
<<< [72978] bro -b ../recv.bro
received termination signal
>>>
<<< [72998] bro -b ../send.bro
received termination signal
>>>

------

Adam J. Slagell
Director, Cybersecurity & Networking Division
Chief Information Security Officer
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
www.slagell.info

"Under the Illinois Freedom of Information Act (FOIA), any written communication to or from University employees regarding University business is a public record and may be subject to public disclosure."

From dnthayer at illinois.edu Wed Oct 4 12:14:38 2017
From: dnthayer at illinois.edu (Daniel Thayer)
Date: Wed, 4 Oct 2017 14:14:38 -0500
Subject: [Bro-Dev] Bro working well on Mac OS High Sierra, just a couple test failures
In-Reply-To: <8097F517-687F-4E2E-9272-732853185F64@illinois.edu>
References: <8097F517-687F-4E2E-9272-732853185F64@illinois.edu>
Message-ID:

The first test failure was fixed after the release of 2.5.1.

The second failure looks like another race condition (try again a few times and it will likely pass).

On 10/4/17 1:57 PM, Slagell, Adam J wrote:
> I had no problems after the upgrade to High Sierra on my "production" box, and I had no troubles compiling Bro 2.5.1 on my laptop.
>
> I did, however, get two errors in the test suite.
> [...]

From slagell at illinois.edu Wed Oct 4 13:16:55 2017
From: slagell at illinois.edu (Slagell, Adam J)
Date: Wed, 4 Oct 2017 20:16:55 +0000
Subject: [Bro-Dev] Bro working well on Mac OS High Sierra, just a couple test failures
In-Reply-To:
References: <8097F517-687F-4E2E-9272-732853185F64@illinois.edu>
Message-ID: <1D72C4F8-F8DB-4B15-A114-D063435CEDFC@illinois.edu>

On Oct 4, 2017, at 2:14 PM, Thayer, Daniel N wrote:
> The second failure looks like another race condition (try again a few times and it will likely pass).

Right you are. 4th time's a charm. :-)

------

Adam J. Slagell
Director, Cybersecurity & Networking Division
Chief Information Security Officer
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
www.slagell.info

From jmellander at lbl.gov Thu Oct 5 12:45:21 2017
From: jmellander at lbl.gov (Jim Mellander)
Date: Thu, 5 Oct 2017 12:45:21 -0700
Subject: [Bro-Dev] Performance Enhancements
Message-ID:

Hi all:

One item of particular interest to me from BroCon was this tidbit from Packetsled's lightning talk: "Optimizing core loops (like net_run() ) with preprocessor branch prediction macros likely() and unlikely() for ~3% speedup. We optimize for maximum load."

After conversing with Leo Linsky of Packetsled, I wanted to initiate a conversation about performance improvements that may be within fairly easy reach:

1. Obviously, branch prediction, as mentioned above. 3% speedup for (almost) free is nothing to sneeze at.

2. Profiling Bro to identify other hot spots that could benefit from optimization.

3. Best practices for compiling Bro (compiler options, etc.)

4. Data structure revisit (hash functions, perhaps?)

etc.

Perhaps the Bro core team is working on some, all, or a lot more in this area. It might be nice to get the Bro community involved too. Is anyone else interested?

Jim Mellander
ESNet
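For readers unfamiliar with the technique, such likely()/unlikely() macros are typically thin wrappers over a compiler hint. A minimal sketch, assuming GCC/Clang's __builtin_expect (the macro and function names here are illustrative, not Bro's actual code):

#include <cstddef>

// The usual definition: tell the optimizer which way a branch normally goes,
// so the common path stays hot in the instruction stream.
#if defined(__GNUC__) || defined(__clang__)
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#else
#define likely(x)   (x)
#define unlikely(x) (x)
#endif

// Hypothetical hot-loop use, in the spirit of annotating net_run():
void process_packet(const unsigned char* data, size_t len)
	{
	if ( unlikely(data == nullptr || len == 0) )
		return; // rare error path, kept out of the predicted flow

	if ( likely(len >= 14) )
		{
		// common case: full link-layer header present; normal
		// dispatch would continue here
		}
	}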
From slagell at illinois.edu Thu Oct 5 13:49:15 2017
From: slagell at illinois.edu (Slagell, Adam J)
Date: Thu, 5 Oct 2017 20:49:15 +0000
Subject: [Bro-Dev] Performance Enhancements
In-Reply-To:
References:
Message-ID: <81FC33C1-55B7-4BCA-B2DB-0A222ED7C956@illinois.edu>

On Oct 5, 2017, at 2:45 PM, Jim Mellander wrote:
> 1. Obviously, branch prediction, as mentioned above. 3% speedup for (almost) free is nothing to sneeze at.
> 2. Profiling Bro to identify other hot spots that could benefit from optimization.
> 3. Best practices for compiling Bro (compiler options, etc.)
> 4. Data structure revisit (hash functions, perhaps?)

Jon Siwek was optimizing the main event loop last February, but I believe it could only go so far without the new Broker API being integrated. Also, I believe there is a need to move off of the select() function. Anyway, there is definitely a lot of optimization that could be made there.

------

Adam J. Slagell
Director, Cybersecurity & Networking Division
Chief Information Security Officer
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
www.slagell.info

From gc355804 at ohio.edu Thu Oct 5 21:10:35 2017
From: gc355804 at ohio.edu (Clark, Gilbert)
Date: Fri, 6 Oct 2017 04:10:35 +0000
Subject: [Bro-Dev] Performance Enhancements
In-Reply-To:
References:
Message-ID:

Howdy:

Not sure about the content of the BroCon talk ... but a few years back, I did a bit of work on this. There was a plugin here:

https://github.com/cubic1271/bro-plugin-instrumentation

that allowed me to profile the execution of various bro scripts and figure out what was eating the most time. It also added a pretty braindead mechanism to expose script variables through a REST interface, which I wrapped in an HTML5 UI to get some real-time statistics ... though I have no idea where that code went.

I also threw together this:

https://github.com/cubic1271/pybrig

which was intended to benchmark Bro on a specific platform, the idea being to get results that were relatively consistent. It could make some pretty pictures, which was kind of neat ... but I'd probably do things a lot differently if I had it to do over again :)

I'll note that one of the challenges with profiling is that there are the bro scripts, and then there is the bro engine. The scripting layer has a completely different set of optimizations that might make sense than the engine does: turning off / turning on / tweaking different scripts can have a huge impact on Bro's relative performance, depending on the frequency with which those script fragments are executed. Thus, one way to look at speeding things up might be to take a look at the scripts that are run most often and see about ways to accelerate core pieces of them ... possibly by moving pieces of those scripts to builtins (as C methods).

If I had to guess at one engine-related thing that would've sped things up when I was profiling this stuff back in the day, it'd probably be rebuilding the memory allocation strategy / management. From what I remember, Bro does do some malloc / free in the data path, which hurts quite a bit when one is trying to make things go fast. It also means that the selection of a memory allocator and NUMA / per-node memory management is going to be important. That's probably not going to qualify as something *small*, though ...

On a related note, a fun experiment is always to try running bro with a different allocator and seeing what happens ...

Another thing that (I found) got me a few percentage points for more-or-less free was profile-guided optimization: I ran bro first with profiling enabled against a representative data set, then rebuilt it against the profile I collected. Of course, your mileage may vary ...

Anyway, hope something in there was useful.

Cheers,
Gilbert Clark

________________________________
From: bro-dev-bounces at bro.org on behalf of Jim Mellander
Sent: Thursday, October 5, 2017 3:45:21 PM
To: bro-dev at bro.org
Subject: [Bro-Dev] Performance Enhancements

> [...]

From jazoff at illinois.edu Fri Oct 6 07:26:30 2017
From: jazoff at illinois.edu (Azoff, Justin S)
Date: Fri, 6 Oct 2017 14:26:30 +0000
Subject: [Bro-Dev] Performance Enhancements
In-Reply-To:
References:
Message-ID: <02E2EDC3-7667-4D67-8D4D-C92B0D83E8BA@illinois.edu>

> On Oct 6, 2017, at 12:10 AM, Clark, Gilbert wrote:
>
> I'll note that one of the challenges with profiling is that there are the bro scripts, and then there is the bro engine. [...] Thus, one way to look at speeding things up might be to take a look at the scripts that are run most often and see about ways to accelerate core pieces of them ... possibly by moving pieces of those scripts to builtins (as C methods).

Re: scripts, I have some code I put together to do arbitrary benchmarks of templated bro scripts. I need to clean it up and publish it, but I found some interesting things. Function calls are relatively slow, so things like

ip in Site::local_nets

are faster than calling

Site::is_local_addr(ip);

Inlining short functions could speed things up a bit.

I also found that things like

port == 22/tcp || port == 3389/tcp

are faster than checking if port in {22/tcp,3389/tcp} ... up to about 10 ports. Having the hash class fall back to a linear search when the hash only contains a few items could speed things up there. Things like 'likely_server_ports' have 1 or 2 ports in most cases.

> If I had to guess at one engine-related thing that would've sped things up when I was profiling this stuff back in the day, it'd probably be rebuilding the memory allocation strategy / management. [...]

Ah! This reminds me of something I was thinking about a few weeks ago. I'm not sure to what extent bro uses memory allocation pools/interning for common immutable data structures, like port objects or small strings. There's no reason bro should be mallocing/freeing memory to create port objects when there are only 65536 times 2 (or 3?) possible port objects ... but bro does things like

tcp_hdr->Assign(0, new PortVal(ntohs(tp->th_sport), TRANSPORT_TCP));
tcp_hdr->Assign(1, new PortVal(ntohs(tp->th_dport), TRANSPORT_TCP));

for every packet, as well as allocating a ton of TYPE_COUNT vals for things like packet sizes and header lengths, which will almost always be between 0 and 64k.

For things that can't be interned, like IPv6 addresses, having an allocation pool could speed things up: instead of freeing things like IPAddr objects, they could just be returned to a pool, and then when a new IPAddr object is needed, an already initialized object could be grabbed from the pool and 'refreshed' with the new value.

https://golang.org/pkg/sync/#Pool

talks about that sort of thing.

> On a related note, a fun experiment is always to try running bro with a different allocator and seeing what happens ...

I recently noticed our boxes were using jemalloc instead of tcmalloc. Switching that caused malloc to drop a few places down in 'perf top' output.

--
Justin Azoff
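For reference, a freelist-style pool along the lines of the sync.Pool idea above. This is only a sketch; ObjectPool is hypothetical, not an existing Bro class, and it assumes the pooled type is cheap to assign:

#include <utility>
#include <vector>

// Recycles heap objects instead of freeing them: Put() parks an object on a
// freelist, Get() refreshes a parked object in place (or allocates if empty).
template <typename T>
class ObjectPool {
public:
	~ObjectPool()
		{
		for ( T* obj : free_ )
			delete obj;
		}

	template <typename... Args>
	T* Get(Args&&... args)
		{
		if ( free_.empty() )
			return new T(std::forward<Args>(args)...);

		T* obj = free_.back();
		free_.pop_back();
		*obj = T(std::forward<Args>(args)...); // 'refresh' with new value
		return obj;
		}

	void Put(T* obj) { free_.push_back(obj); }

private:
	std::vector<T*> free_;
};

// Hypothetical per-worker usage for address-like values:
//   ObjectPool<std::string> pool;
//   std::string* ip = pool.Get("2001:db8::1");
//   ... use it, then hand it back instead of deleting ...
//   pool.Put(ip);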
From jsiwek at illinois.edu Fri Oct 6 09:53:26 2017
From: jsiwek at illinois.edu (Siwek, Jon)
Date: Fri, 6 Oct 2017 16:53:26 +0000
Subject: [Bro-Dev] design summary: porting Bro scripts to use Broker
Message-ID: <285A2806-E64A-43BD-9D12-E7B2661A7AA5@illinois.edu>

I want to check if there's any feedback on the approach I'm planning to take when porting over Bro's scripts to use Broker. There are two major areas to consider: (1) how users specify network topology, e.g. either for traditional cluster configuration or manually connecting Bro instances, and (2) replacing &synchronized with Broker's distributed data storage features.

Broker-Based Topology
=====================

It's again useful to decompose topology specification into two main use-cases:

Creating Clusters, e.g. w/ BroControl
------------------------------------

This use-case should look familiar once ported to use Broker: the existing "cluster" framework will be used for specifying the topology of the cluster and for automatically setting up the connections between nodes. The one thing that will differ is the event subscription mechanism, which needs to change since Broker itself handles that differently, but I think the general idea can remain similar.

The current mechanism for handling event subscription/publication:

const Cluster::manager2worker_events = /Drop::.*/ &redef;
# similar patterns follow for all node combinations...

And a script author that is making their script usable on clusters writes:

redef Cluster::manager2worker_events += /^Intel::(cluster_new_item|purge_item)$/;

The new mechanism:

# contains topic prefixes
const Cluster::manager_subscriptions: set[string] &redef;

# contains (topic string, event name) pairs
const Cluster::manager_publications: set[string, string] &redef;

# similar sets follow for all node types...

And a script author writes:

# topic naming convention relates to file hierarchy/organization of scripts
redef Cluster::manager_subscriptions += {
    "bro/event/framework/control",
    "bro/event/framework/intel",
};

# not sure how to get around referencing events via strings: can't use 'any'
# to stick event values directly into the set, maybe that's ok since we can
# at least detect lookup failures at bro_init time and emit errors/abort.
redef Cluster::manager_publications += {
    ["bro/event/framework/control/configuration_update_request",
     "Control::configuration_update_request"],
    ["bro/event/framework/intel/cluster_new_item",
     "Intel::cluster_new_item"],
};

Then subscriptions and auto-publications still get automatically set up by the cluster framework in bro_init().

Other Manual/Custom Topologies
------------------------------

I don't see anything to do here, as the Broker API already has enough to set up peerings and subscriptions in arbitrary ways. The old "communication" framework scripts can just go away, as most of its functions have direct corollaries in the new "broker" framework.

The one thing that is missing is the "Communication::nodes" table, which acts as both a state-tracking structure and an API that users may use to have the comm. framework automatically set up connections between the nodes in the table. I find this redundant: there are two APIs to accomplish the same thing, with the table being an additional layer of indirection to the actual connect/listen functions a user can just as easily use themselves. I also think it's not useful for state-tracking, as a user operating at the level of this use-case can easily track nodes themselves, or has some other notion of the state structures they need to track that is more intuitive for the particular problem they're solving. Unless there are arguments or I find it's actually needed, I don't plan to port this to Broker.

Broker-Based Data Distribution
==============================

Replacing &synchronized requires completely new APIs that script authors can easily use to work for both cluster and non-cluster use-cases, independently of a user's choice of persistent storage backend.

Broker Framework API
--------------------

const Broker::default_master_node = "manager" &redef;

const Broker::default_backend = MEMORY &redef;

# Setting a default dir will, for persistent backends that have not
# been given an explicit file path, automatically create a path within this
# dir that is based on the name of the data store.
const Broker::default_store_dir = "" &redef;

type Broker::StoreInfo: record {
    name: string &optional;
    store: opaque of Broker::Store &optional;
    master_node: string &default=Broker::default_master_node;
    master: bool &default=F;
    backend: Broker::BackendType &default=default_backend;
    options: Broker::BackendOptions &default=Broker::BackendOptions();
};

# Primarily used by users to set up master store location and backend
# configuration, but also possible to look up an existing/open store by name.
global Broker::stores: table[string] of StoreInfo &default=StoreInfo() &redef;

# Set up data stores to properly function regardless of whether user is
# operating a cluster. This also automatically sets up the store to
# be a clone or a master as is appropriate for the local node type.
# It does this by inspecting the state of the "Broker::stores" table,
# which a user configures in advance via redef.
# (I have pseudo-code written, let me know if you want to see it all.)
global Broker::InitStore: function(name: string): opaque of Broker::Store;

Script-Author Example Usage
---------------------------

# Script author that wants to utilize data stores doesn't have to be aware of
# whether user is running a cluster or if they want to use persistent storage
# backends.
const Software::tracked_store_name = "bro/framework/software/tracked" &redef;

global Software::tracked_store: opaque of Broker::Store;

event bro_init() &priority = +10
    {
    Software::tracked_store = Broker::InitStore(Software::tracked_store_name);
    }

Bro-User Example Usage
----------------------

# User needs to be able to choose data store backends and which cluster node
# the master store lives on. They can either do this manually, or BroControl
# will autogenerate the following in cluster-layout.bro:

# Explicitly configure an individual store.
redef Broker::stores += {
    ["bro/framework/software/tracked"] = [$master_node = "some_node",
        $backend=Broker::SQLITE,
        $options=Broker::BackendOptions(
            $sqlite=Broker::SQLiteOptions(
                $path="/home/jon/tracked_software.sqlite"))];
};

# Or set new default configurations for stores.
redef Broker::default_master_node = "manager";
redef Broker::default_backend = Broker::MEMORY;
redef Broker::default_store_dir = "/home/jon/stores";

# Then Broker::InitStore() will end up creating the right type of store.

BroControl Example Usage
------------------------

BroControl users will have a new "datastore.cfg" file they may customize:

# The default file will contain just a basic [default] section
# and would set up all data stores on the manager node, using the default
# backend (in-memory). If a user wants to globally change to persistent
# storage and also give a canonical storage node, they can do that here.
[default]
master = manager
backend = MEMORY

# When using persistent backends as default, need to specify a directory to
# store databases in. Files will be auto-named based on the store's name.
dir = /home/jon/stores

# If a user has special needs regarding persistence/residence, they can
# further customize individual stores:
[bro/framework/software/tracked]
master = some_node
backend = SQLITE
path = /home/jon/tracked_software.sqlite

From jazoff at illinois.edu Fri Oct 6 10:40:11 2017
From: jazoff at illinois.edu (Azoff, Justin S)
Date: Fri, 6 Oct 2017 17:40:11 +0000
Subject: [Bro-Dev] design summary: porting Bro scripts to use Broker
In-Reply-To: <285A2806-E64A-43BD-9D12-E7B2661A7AA5@illinois.edu>
References: <285A2806-E64A-43BD-9D12-E7B2661A7AA5@illinois.edu>
Message-ID:

> On Oct 6, 2017, at 12:53 PM, Siwek, Jon wrote:
>
> I want to check if there's any feedback on the approach I'm planning to take when porting over Bro's scripts to use Broker. [...]
>
> The one thing that is missing is the "Communication::nodes" table [...] Unless there are arguments or I find it's actually needed, I don't plan to port this to Broker.

I had some feedback related to this sort of thing earlier in the year:

http://mailman.icsi.berkeley.edu/pipermail/bro-dev/2017-February/012386.html
http://mailman.icsi.berkeley.edu/pipermail/bro-dev/2017-March/012411.html

I got send_event_hashed to work via a bit of a hack (https://github.com/JustinAzoff/broker_distributed_events/blob/master/distributed_broker.bro), but it needs support from inside broker, or at least the bro/broker integration, to work properly in the case of node failure.

My ultimate vision is a cluster with 2+ physical datanode/manager/logger boxes where one box can fail and the cluster will continue to function perfectly. The only thing this requires is a send_event_hashed function that does consistent ring hashing and is aware of node failure.

For things that don't necessarily need consistent partitioning (like maybe logs, if you were using Kafka), a way to designate that a topic should be distributed round-robin between subscribers would be useful too.

--
Justin Azoff

From jmellander at lbl.gov Fri Oct 6 14:59:51 2017
From: jmellander at lbl.gov (Jim Mellander)
Date: Fri, 6 Oct 2017 14:59:51 -0700
Subject: [Bro-Dev] Performance Enhancements
In-Reply-To: <02E2EDC3-7667-4D67-8D4D-C92B0D83E8BA@illinois.edu>
References: <02E2EDC3-7667-4D67-8D4D-C92B0D83E8BA@illinois.edu>
Message-ID:

I particularly like the idea of an allocation pool in which per-packet information can be stored and reused by the next packet.

There also are probably some optimizations of frequent operations, now that we're in a 64-bit world, that could prove useful. The one's complement checksum calculation in net_util.cc is one that comes to mind, especially since it works effectively a byte at a time (and works with even byte counts only). Seeing as this is done per-packet on all tcp payload, optimizing this seems reasonable. Here's a discussion of doing the checksum calc in 64-bit arithmetic: https://locklessinc.com/articles/tcp_checksum/ - this website also has an x64 allocator that is claimed to be faster than tcmalloc; see https://locklessinc.com/benchmarks_allocator.shtml (note: I haven't tried anything from this source, but find it interesting).

I'm guessing there are a number of such "small" optimizations that could provide significant performance gains.

Take care,

Jim

On Fri, Oct 6, 2017 at 7:26 AM, Azoff, Justin S wrote:
> Re: scripts, I have some code I put together to do arbitrary benchmarks of templated bro scripts. [...]
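For reference, a sketch of the 64-bit end-around-carry summation in the spirit of the locklessinc article. This is modeled loosely on that write-up, not on Bro's actual ones_complement_checksum(); the final complement and byte-order handling are left to the caller:

#include <cstddef>
#include <cstdint>
#include <cstring>

// One's-complement sum over a buffer, eight bytes at a time. Returns the
// folded 16-bit sum in the byte order the words were read from memory.
uint16_t onesum64(const uint8_t* data, size_t len)
	{
	uint64_t sum = 0;

	while ( len >= 8 )
		{
		uint64_t w;
		std::memcpy(&w, data, 8); // safe w.r.t. alignment
		sum += w;
		if ( sum < w )
			++sum; // end-around carry
		data += 8;
		len -= 8;
		}

	if ( len > 0 )
		{
		uint64_t w = 0;
		std::memcpy(&w, data, len); // zero-padded tail
		sum += w;
		if ( sum < w )
			++sum;
		}

	// Fold 64 -> 32 -> 16 bits, propagating carries at each step.
	sum = (sum & 0xffffffff) + (sum >> 32);
	sum = (sum & 0xffffffff) + (sum >> 32);
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);

	return static_cast<uint16_t>(sum);
	}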
From jazoff at illinois.edu Fri Oct 6 15:49:30 2017
From: jazoff at illinois.edu (Azoff, Justin S)
Date: Fri, 6 Oct 2017 22:49:30 +0000
Subject: [Bro-Dev] Performance Enhancements
In-Reply-To:
References: <02E2EDC3-7667-4D67-8D4D-C92B0D83E8BA@illinois.edu>
Message-ID:

> On Oct 6, 2017, at 5:59 PM, Jim Mellander wrote:
>
> I particularly like the idea of an allocation pool in which per-packet information can be stored and reused by the next packet.
> [...]
> I'm guessing there are a number of such "small" optimizations that could provide significant performance gains.

I've been messing around with 'perf top'; the one's complement function often shows up fairly high up ... that, PriorityQueue::BubbleDown, and BaseList::remove.

Something (on our configuration?) is doing a lot of PQ_TimerMgr::~PQ_TimerMgr ... I don't think I've come across that class before in bro. I think a script may be triggering something that is hurting performance; I can't think of what it would be though.

Running perf top on a random worker right now with -F 19999 shows:

Samples: 485K of event 'cycles', Event count (approx.): 26046568975
Overhead  Shared Object         Symbol
  34.64%  bro                   [.] BaseList::remove
   3.32%  libtcmalloc.so.4.2.6  [.] operator delete
   3.25%  bro                   [.] PriorityQueue::BubbleDown
   2.31%  bro                   [.] BaseList::remove_nth
   2.05%  libtcmalloc.so.4.2.6  [.] operator new
   1.90%  bro                   [.] Attributes::FindAttr
   1.41%  bro                   [.] Dictionary::NextEntry
   1.27%  libc-2.17.so          [.] __memcpy_ssse3_back
   0.97%  bro                   [.] StmtList::Exec
   0.87%  bro                   [.] Dictionary::Lookup
   0.85%  bro                   [.] NameExpr::Eval
   0.84%  bro                   [.] BroFunc::Call
   0.80%  libtcmalloc.so.4.2.6  [.] tc_free
   0.77%  libtcmalloc.so.4.2.6  [.] operator delete[]
   0.70%  bro                   [.] ones_complement_checksum
   0.60%  libtcmalloc.so.4.2.6  [.] tcmalloc::ThreadCache::ReleaseToCentralCache
   0.60%  bro                   [.] RecordVal::RecordVal
   0.53%  bro                   [.] UnaryExpr::Eval
   0.51%  bro                   [.] ExprStmt::Exec
   0.51%  bro                   [.] iosource::Manager::FindSoonest
   0.50%  libtcmalloc.so.4.2.6  [.] operator new[]

which sums up to 59.2%.

BaseList::remove/BaseList::remove_nth seems particularly easy to optimize. Can't that loop be replaced by a memmove? I think something may be broken if it's being called that much, though.

--
Justin Azoff
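For reference, a sketch of the memmove version (the signature here is illustrative rather than BaseList's actual one): the element-by-element shift loop becomes a single bulk move of the tail.

#include <cstring>

// Remove entry n from a pointer array, shifting the tail down in one move.
void* remove_nth(void** entries, int& num_entries, int n)
	{
	if ( n < 0 || n >= num_entries )
		return nullptr; // nothing to remove

	void* old = entries[n];
	std::memmove(&entries[n], &entries[n + 1],
	             sizeof(void*) * (num_entries - n - 1));
	--num_entries;
	return old;
	}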
From robin at icir.org Fri Oct 6 15:58:46 2017
From: robin at icir.org (Robin Sommer)
Date: Fri, 6 Oct 2017 15:58:46 -0700
Subject: [Bro-Dev] design summary: porting Bro scripts to use Broker
In-Reply-To: <285A2806-E64A-43BD-9D12-E7B2661A7AA5@illinois.edu>
References: <285A2806-E64A-43BD-9D12-E7B2661A7AA5@illinois.edu>
Message-ID: <20171006225846.GC83573@icir.org>

Nice!

On Fri, Oct 06, 2017 at 16:53 +0000, you wrote:

> # contains topic prefixes
> const Cluster::manager_subscriptions: set[string] &redef;
>
> # contains (topic string, event name) pairs
> const Cluster::manager_publications: set[string, string] &redef;

I'm wondering if we can simplify this with Broker. With the old comm system we needed the event names because that's what was subscribed to. Now that we have topics, does the cluster framework still need to know about the events at all? I'm thinking we could just go with a topic convention and then the various scripts would publish there directly.

In the most simple version of this, the cluster framework would just hard-code a subscription to "bro/cluster/". And then scripts like the Intel framework would just publish all their events to "bro/cluster/" directly through Broker.

To allow for distinguishing by node type, we can define separate topic hierarchies: "bro/cluster/{manager,worker,logger}/". Each node subscribes to the hierarchy corresponding to its type, and each script publishes according to where it wants to send events to (again directly using the Broker API).

I think we could fit in Justin's hashing here too: we add per-node topics as well ("bro/cluster/node/worker-1/", "bro/cluster/node/worker-2/", etc.) and then the cluster framework can provide a function that maps a hash key to a topic that corresponds to a currently active node:

local topic = Cluster::topic_for_key("abcdef"); # Returns, e.g., "bro/cluster/node/worker-1"
Broker::publish(topic, event);
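For reference, one way such a topic_for_key() could stay stable as nodes come and go is rendezvous (highest-random-weight) hashing: only keys owned by a failed node get remapped. A sketch, with an arbitrary FNV-1a hash and purely illustrative names:

#include <cstdint>
#include <string>
#include <vector>

// 64-bit FNV-1a; chosen only for brevity, not as a recommendation.
static uint64_t fnv1a(const std::string& s)
	{
	uint64_t h = 0xcbf29ce484222325ULL;
	for ( unsigned char c : s )
		{
		h ^= c;
		h *= 0x100000001b3ULL;
		}
	return h;
	}

// Every currently-active node gets a per-key weight; the highest weight wins.
std::string topic_for_key(const std::string& key,
                          const std::vector<std::string>& active_nodes)
	{
	std::string best;
	uint64_t best_w = 0;

	for ( const auto& node : active_nodes )
		{
		uint64_t w = fnv1a(key + "/" + node);
		if ( best.empty() || w > best_w )
			{
			best = node;
			best_w = w;
			}
		}

	return "bro/cluster/node/" + best;
	}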
And that scheme may suggest that instead of hard-coding topics on the sender side, the Cluster framework could generally provide a set of functions to retrieve the right topic:

# In SumStats framework:
local topic = Cluster::topic_for_manager(); # Returns "bro/cluster/manager".
Broker::publish(topic, event);

Bottom line: if we can find a way to steer information by setting up topics appropriately, we might not need much additional configuration at all.

> The old "communication" framework scripts can just go away, as most
> of its functions have direct corollaries in the new "broker"
> framework.

Yep, agree.

> The one thing that is missing is the "Communication::nodes" table

Agree that it doesn't look useful from an API perspective. The Broker framework may eventually need an equivalent table internally if we want to offer robustness mechanisms like Justin's hashing.

> Broker Framework API
> --------------------

I'm wondering if these store operations should become part of the Cluster framework instead. If we added them to the Broker framework, we'd have two separate store APIs there: one low-level version mapping directly to the C++ Broker API, and one higher-level one that configures things like the location of the DB files. That could be confusing.

> Software::tracked_store = Broker::InitStore(Software::tracked_store_name);

I like this. One additional idea: while I see that it's generally the user who wants to configure which backend to use, the script author may know already if it's data that should be persistent across executions; I'm guessing that's usually implied by the script's semantics. We could give InitStore() an additional boolean "persistent" to indicate that. If that's true, it'd use the "default_backend" (or now maybe "default_db_backend"); if false, it'd always use the MEMORY backend.

> # User needs to be able to choose data store backends and which cluster node
> # the master store lives on. They can either do this manually, or BroControl
> # will autogenerate the following in cluster-layout.bro:

I don't really like the idea of autogenerating this, as it's pretty complex information. Usually, the Broker::default_* values should be fine, right? For the few cases where one wants to tweak that on a per-store basis, using a manual redef on the table sounds fine to me.

Hmm, actually, what would you think about using functions instead of tables? We could model this similar to how the logging framework does filters: there's a default filter installed, but you can retrieve and update it. Here there'd be a default StoreInfo, which one can update.

> redef Broker::default_master_node = "manager";
> redef Broker::default_backend = Broker::MEMORY;
> redef Broker::default_store_dir = "/home/jon/stores";

Can the default_store_dir be set to some standard location through BroControl? Would be neat if this all just worked in the standard case without any custom configuration at all.

> BroControl Example Usage
> ------------------------

I'll skip commenting on this and wait for your response to the above first, as I'm wondering if we need this BroControl functionality at all.

Robin

--
Robin Sommer * ICSI/LBNL * robin at icir.org * www.icir.org/robin

From jmellander at lbl.gov Fri Oct 6 17:00:09 2017
From: jmellander at lbl.gov (Jim Mellander)
Date: Fri, 6 Oct 2017 17:00:09 -0700
Subject: [Bro-Dev] Performance Enhancements
In-Reply-To:
References: <02E2EDC3-7667-4D67-8D4D-C92B0D83E8BA@illinois.edu>
Message-ID:

Interesting info. The more-than-order-of-magnitude difference in time between BaseList::remove & BaseList::remove_nth suggests the possibility that the for loop in BaseList::remove is falling off the end in many cases (i.e., attempting to remove an item that doesn't exist). Maybe that's what's broken.

On Fri, Oct 6, 2017 at 3:49 PM, Azoff, Justin S wrote:
> I've been messing around with 'perf top'; the one's complement function often shows up fairly high up ... that, PriorityQueue::BubbleDown, and BaseList::remove.
> [...]

From jazoff at illinois.edu Mon Oct 9 07:19:26 2017
From: jazoff at illinois.edu (Azoff, Justin S)
Date: Mon, 9 Oct 2017 14:19:26 +0000
Subject: [Bro-Dev] Performance Enhancements
In-Reply-To:
References: <02E2EDC3-7667-4D67-8D4D-C92B0D83E8BA@illinois.edu>
Message-ID: <30CA959A-E44B-4CA3-8281-B3208DEDB47B@illinois.edu>

> On Oct 6, 2017, at 5:59 PM, Jim Mellander wrote:
>
> I particularly like the idea of an allocation pool in which per-packet information can be stored and reused by the next packet.

Turns out bro does this most of the time ... unless you use the new_packet event. Normal connections use the sessions cache, which holds connection objects, but new_packet has its own code path that creates the ip header from scratch for each packet.

I tried to pre-allocate PortVal objects, but I think I was screwing something up with 'Ref' and bro would just segfault on the 2nd connection.

> There also are probably some optimizations of frequent operations, now that we're in a 64-bit world, that could prove useful. The one's complement checksum calculation in net_util.cc is one that comes to mind [...]

I couldn't get this code to return the right checksums inside bro (some casting issue?), but if it is faster it should increase performance by a small percentage. Comparing 'bro -b' runs on a pcap with 'bro -b -C' runs (which should show what kind of performance increase we would get if that function took 0s to run) shows a decent chunk of time taken computing checksums.

> I'm guessing there are a number of such "small" optimizations that could provide significant performance gains.

I've been trying to figure out the best way to profile bro. So far attempting to use linux perf or google perftools hasn't been able to shed much light on anything. I think the approach I was using to benchmark certain operations in the bro language is the better approach: instead of running bro and trying to profile it to figure out what is causing the most load, simply compare the execution of two bro runs with slightly different scripts/settings. I think this will end up being the better approach because it answers real questions like "If I load this script or change this setting, what is the performance impact on the bro process?"

When I did this last, I used this method to compare the performance from one bro commit to the next, but I never tried comparing bro with one set of scripts loaded to bro with a different set of scripts loaded.

For example, the simplest and most dramatic test I came up with so far:

$ time bro -r 2009-M57-day11-18.trace -b

real    0m2.434s
user    0m2.236s
sys     0m0.200s

$ cat np.bro
event new_packet(c: connection, p: pkt_hdr)
    {
    }

$ time bro -r 2009-M57-day11-18.trace -b np.bro

real    0m10.588s
user    0m10.392s
sys     0m0.204s

We've been saying for a while that adding that event is expensive, but I don't know if it's even been quantified.

The main thing I still need to figure out is how to do this type of test in a cluster environment while replaying a long pcap.

Somewhat related, came across this presentation yesterday:

https://www.youtube.com/watch?v=NH1Tta7purM&feature=youtu.be
CppCon 2017: Carl Cook "When a Microsecond Is an Eternity: High Performance Trading Systems in C++"

Among other things, he mentions using a memory pool for objects instead of creating/deleting them.

--
Justin Azoff
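For reference, a sketch of what interning those per-packet PortVals might look like. PortVal, TRANSPORT_TCP, and Ref() are Bro's (per the snippet quoted earlier in the thread); the cache and function here are hypothetical, and handing out a Ref() on every use is presumably the piece the segfaulting experiment was missing:

#include "Val.h" // within the Bro source tree

// One interned PortVal per TCP port, created lazily and never freed.
// Assumes the single-threaded packet path of Bro's main loop.
static PortVal* tcp_port_cache[65536];

PortVal* InternedTcpPort(uint32_t port) // host byte order, < 65536 assumed
	{
	PortVal*& pv = tcp_port_cache[port & 0xffff];

	if ( ! pv )
		pv = new PortVal(port, TRANSPORT_TCP);

	::Ref(pv); // each caller gets its own reference to Unref() later
	return pv;
	}

// The per-packet code could then become something like:
//   tcp_hdr->Assign(0, InternedTcpPort(ntohs(tp->th_sport)));
//   tcp_hdr->Assign(1, InternedTcpPort(ntohs(tp->th_dport)));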
- Jon From jazoff at illinois.edu Mon Oct 9 11:46:30 2017 From: jazoff at illinois.edu (Azoff, Justin S) Date: Mon, 9 Oct 2017 18:46:30 +0000 Subject: [Bro-Dev] design summary: porting Bro scripts to use Broker In-Reply-To: References: <285A2806-E64A-43BD-9D12-E7B2661A7AA5@illinois.edu> Message-ID: > On Oct 9, 2017, at 2:08 PM, Siwek, Jon wrote: > > >> I got send_event_hashed to work via a bit of a hack (https://github.com/JustinAzoff/broker_distributed_events/blob/master/distributed_broker.bro), >> but it needs support from inside broker or at least the bro/broker integration to work properly in the case of node failure. >> >> My ultimate vision is a cluster with 2+ physical datanode/manager/logger boxes where one box can fail and the cluster will continue to function perfectly. >> The only thing this requires is a send_event_hashed function that does consistent ring hashing and is aware of node failure. > > Yeah, that sounds like a good idea that I can try to work into the design. What is a ?data node? though? We don?t currently have that? We did at one point, see topic/seth/broker-merge / topic/mfischer/broker-integration The data node replaced the proxies and did stuff related to broker data stores. I think the idea was that a data node process would own the broker data store. My usage of data nodes was for scaling out data aggregation, I never did anything with the data stores. The data nodes were just a place to stream scan attempts to for aggregation. > More broadly, it sounds like a user needs a way to specify which nodes they want to belong to a worker pool, do you still imagine that is done like you had in the example broctl.cfg from the earlier thread? Do you need to be able to specify more than one type of pool? People have asked for this now as solution for fixing an overloaded manager process, but if we get load balancing/failover working as well as QoS/priorities there may not be a point into statically configuring things like that.. like someone might want to do # a node for tracking spam [spam] type = data/spam # a node for sumstats [sumstats] type = data/sumstats # a node for known hosts/certs/etc tracking [known] Type = data/known But I think just having the ability to do [data] type = data lb_procs = 6 This would work better for everyone. Sending one type of data to one type of data node is still going to eventually overload a single process. >> For things that don't need necessarily need consistent partitioning - like maybe logs if you were using Kafka, a way to designate that a topic should be distributed round-robin between subscribers would be useful too. > > Yeah, that seems like it would require pretty much the same set of functionality to get working and then user can just specify a different function to use for distributing events (e.g. hash vs. round-robin). > > - Jon Great! Right now broctl configures this in a 'round-robin' type way by assigning every other worker to a different logger node. With support for this in broker it could just connect every worker to every logger process and broker could handle the load balancing/failover. ? 
Justin Azoff From jsiwek at illinois.edu Mon Oct 9 12:33:54 2017 From: jsiwek at illinois.edu (Siwek, Jon) Date: Mon, 9 Oct 2017 19:33:54 +0000 Subject: [Bro-Dev] design summary: porting Bro scripts to use Broker In-Reply-To: <20171006225846.GC83573@icir.org> References: <285A2806-E64A-43BD-9D12-E7B2661A7AA5@illinois.edu> <20171006225846.GC83573@icir.org> Message-ID: <9F565953-D1F7-492D-8E68-137021633C4C@illinois.edu> > On Oct 6, 2017, at 5:58 PM, Robin Sommer wrote: > > In the most simple version of this, the cluster framework would just > hard-code a subscription to "bro/cluster/". And then scripts like the > Intel framework would just publish all their events to "bro/cluster/" > directly through Broker. > > To allow for distinguishing by node type we can define separate topic > hierarchies: "bro/cluster/{manager,worker,logger}/". Each node > subscribes to the hierarchy corresponding to its type, and each script > publishes according to where it wants to send events to (again > directly using the Broker API). Yeah, that could be a better way to approach it, thanks. I?ll try to go back and rework the design around that topic hierarchy/naming convention (that was the part I was most unsure about). > >> Broker Framework API >> -------------------- > > I'm wondering if these store operations should become part of the > Cluster framework instead. If we added them to the Broker framework, > we'd have two separate store APIs there: one low-level version mapping > directly to the C++ Broker API, and one higher-level that configures > things like location of the DB files. That could be confusing. Yeah could be. I?ll try moving more stuff into Cluster and see if it still makes sense to me. > I like this. One additional idea: while I see that it's generally the > user who wants to configure which backend to use, the script author > may know already if it's data that should be persistent across > execution; I'm guessing that's usually implied by the script's > semantics. We could give InitStore() an additional boolean > "persistent" to indicate that. Ack. >> # User needs to be able to choose data store backends and which cluster node the >> # the master store lives on. They can either do this manually, or BroControl >> # will autogenerate the following in cluster-layout.bro: > > I don't really like the idea of autogenerating this, as it's pretty > complex information. Usually, the Broker::default_* values should be > fine, right? For the few cases where one wants to tweak that on a > per-store bassis, using a manual redef on the table sounds fine to me. It?s just a matter of where you expect most users to feel comfortable making customizations: in Bro scripts or in a broctl config file. I think it?s fine to first assume it won?t be needed often and so only provide the customization via Bro scripts directly. If we learn later that it?s a pain point for users, it?s easy add the "simpler" config file interface via broctl to help autogenerate it. > Hmm, actually, what would you think about using functions instead of > tables? We could model this similar to how the logging framework does > filters: there's a default filter installed, but you can retrieve and > update it. Here there'd be a default StoreInfo, which one can update. I think I went with the ?redef? interface first because it?s impossible for a user to screw up order of operations there, where with functions you can (technically) have some &priority mishaps on bro_init() since the InitStore() function is also going to be running in bro_init(). 
Maybe the key point is that these customizations only make sense to happen once before init time? i.e. a function would imply calling it anytime at runtime could yield a useful result, but at the moment, we?re not allowing changing a store?s backend or master node dynamically at runtime, just once before bro_init(). So if you think that?s something to anticipate in the future, I?d agree that just using functions from the start would be better. >> redef Broker::default_store_dir = "/home/jon/stores"; > > Can the default_store_dir be set to some standard location through > BroControl? Would be neat if this all just worked in the standard case > without any custom configuration at all. Yeah, should be possible, I think I had just given a random example in the above. - Jon From jmellander at lbl.gov Mon Oct 9 13:57:54 2017 From: jmellander at lbl.gov (Jim Mellander) Date: Mon, 9 Oct 2017 13:57:54 -0700 Subject: [Bro-Dev] Performance Enhancements In-Reply-To: References: <02E2EDC3-7667-4D67-8D4D-C92B0D83E8BA@illinois.edu> <30CA959A-E44B-4CA3-8281-B3208DEDB47B@illinois.edu> Message-ID: Well, I found pathological behavior with BaseList::remove Instrumenting it with a printf of num_entries & i after the for loop, running against a test pcap then summarizing with awk gives: Count, num_entries, i 1 3583 3536 1 3584 3537 1 3623 3542 1 3624 3543 1 3628 3620 1 3629 3621 1 3636 3562 1 3636 3625 1 3637 3563 1 3637 3626 1 3644 3576 1 3644 3641 1 3645 3577 1 3645 3642 1 3647 3641 1 3648 3642 1 3650 3629 1 3651 3630 1 3658 3647 1 3659 3648 1 3663 3655 1 3664 3656 1 3673 3629 1 3674 3630 1 3697 3686 1 3698 3687 1 3981 3595 1 3982 3596 1 4372 3978 1 4373 3979 1 4374 3656 1 4374 4371 1 4375 3657 1 4375 4372 1 4554 4371 1 4555 4372 1 4571 4367 1 4571 4551 1 4572 4368 1 4572 4552 1 4968 4566 1 4969 4567 1 5058 4566 1 5059 4567 1 5160 4963 1 5161 4964 1 5258 5157 1 5259 5158 1 5342 4566 1 5343 4567 1 5353 5253 1 5354 5254 1 5356 3638 1 5356 5337 1 5356 5350 1 5356 5351 1 5356 5353 1 5357 3639 1 5357 5338 1 5357 5351 1 5357 5352 1 5357 5354 1 5367 5351 1 5368 5352 1 5369 4556 1 5370 4557 1 5374 3675 1 5374 5366 1 5375 3676 1 5375 5367 1 5379 3664 1 5379 5045 1 5380 3665 1 5380 5046 1 5384 3601 1 5385 3602 1 5386 5354 1 5387 5355 1 5392 5370 1 5393 5371 1 5404 5363 1 5404 5381 1 5405 5364 1 5405 5382 1 5408 5341 1 5408 5368 1 5408 5399 1 5409 5342 1 5409 5369 1 5409 5400 1 5413 5401 1 5413 5403 1 5414 5402 1 5414 5404 1 5416 5408 1 5417 5409 1 5429 5395 1 5430 5396 1 5439 5381 1 5439 5406 1 5440 5382 1 5440 5407 1 5460 5436 1 5461 5437 1 5463 5407 1 5464 5408 1 5465 5397 1 5465 5460 1 5466 5398 1 5466 5461 1 5474 5359 1 5474 5451 1 5474 5456 1 5474 5471 1 5475 5360 1 5475 5452 1 5475 5457 1 5475 5472 1 5479 5456 1 5479 5476 1 5480 5457 1 5480 5477 1 5481 5416 1 5482 5417 1 5493 5426 1 5493 5474 1 5493 5488 1 5494 5427 1 5494 5475 1 5494 5489 1 5497 5357 1 5497 5367 1 5497 5461 1 5497 5462 1 5497 5480 1 5497 5488 1 5498 5358 1 5498 5368 1 5498 5462 1 5498 5463 1 5498 5481 1 5498 5489 1 5499 3682 1 5499 5460 1 5499 5476 1 5499 5478 1 5499 5480 1 5500 3683 1 5500 5461 1 5500 5477 1 5500 5479 1 5500 5481 2 3612 3609 2 3613 3610 2 3689 3686 2 3690 3687 2 3697 3694 2 3698 3695 2 5374 5371 2 5375 5372 2 5384 5381 2 5385 5382 2 5463 5460 2 5464 5461 2 5493 5465 2 5494 5466 2 5497 5484 2 5498 5485 2 5499 5482 2 5499 5488 2 5499 5490 2 5499 5492 2 5499 5494 2 5500 5483 2 5500 5489 2 5500 5491 2 5500 5493 2 5500 5495 3 4571 4568 3 4572 4569 3 5493 5490 3 5494 5491 3 5497 5490 3 5497 5492 3 5498 5491 3 5498 5493 3 5499 5486 3 
5500 5487 4 3647 3644 4 3648 3645 5 5379 5376 5 5380 5377 7 5499 5496 7 5500 5497 10 5497 5494 10 5498 5495 26 3 2 5861 3 1 13714 4 4 14130 4 1 34914 4 0 74299 3 3 1518194 2 1 2648755 2 2 8166358 3 0 13019625 2 0 62953139 0 0 71512938 1 1 104294506 1 0 there are 286 instances where the list has over 3000 entries, and the desired entry is near the end... That linear search has got to be killing performance, even though its uncommon :-( The case of num_entries being 0 can be optimized a bit, but is relatively minor. Now, I'll see if I can identify the offending List Jim On Mon, Oct 9, 2017 at 1:03 PM, Clark, Gilbert wrote: > If you look in one of the subdirectories or another, in ages past there > was a little shell script to incrementally execute bro against a specific > trace, loading one script at a time to see the effect each of them had on > the overall runtime. I can't remember what it was called offhand, but it > was useful for quick and dirty testing. > > > And yes, function calls in bro script land are pretty horrific in terms of > overhead. Line per line, bro script in general isn't terribly efficient > just because it doesn't do any of the things a modern interpreter might > (e.g. just-in-time compilation). That's not a criticism, it's only a note > - most folks rely on horizontal scaling to deal with faster ingress, which > I think makes a whole lot of sense. > > > Just in the event it's useful, I've attached some profiling I did on the > script function overhead with the crap I wrote: these are some graphs on > which script functions call which other script functions, how many times > that happens, and how much time is spent in each function. > > > avggraph is the average time spent per-call, and resgraph is the aggregate > time spent in each function across the entire run. The numbers' formatting > needed some fixing, but never made it that far ... > > > I know Robin et. al. were working on different approaches for > next-generation scripting kind of stuff, but haven't kept up well enough to > really know where those are. > > > One thing I played around with was using plugin hooks to integrate other > scripting languages into the bro fast path (luajit was my weapon of choice) > and seeing if conversion from bro script to one of those other languages > might improve the run time. Other languages would still be less efficient > than C, and anything garbage collected would need to be *really* carefully > used, but ... it struck me as an idea that might be worth a look :) > > > And yeah, generally speaking, most of the toolkits I've played with for > software-based packet processing absolutely do use memory pools for the > fast path. They also use burst fetch tricks (to amortize the cost of > fetching packets over X packets, rather than fetching one packet at a > time), and I've also seen quite a bit of prefetch / SIMD to try to keep > things moving quickly once the packets make it to the CPU. > > > Things start to get pretty crazy as packet rates increase, though: once > you hit about 10 - 15 Gbps, even a *cache miss* on a modern system is > enough to force a drop ... > > > For what it's worth ... 
> -Gilbert
>
> ------------------------------
> *From:* Azoff, Justin S
> *Sent:* Monday, October 9, 2017 10:19:26 AM
> *To:* Jim Mellander
> *Cc:* Clark, Gilbert; bro-dev at bro.org
> *Subject:* Re: [Bro-Dev] Performance Enhancements
>
> > On Oct 6, 2017, at 5:59 PM, Jim Mellander wrote:
> >
> > I particularly like the idea of an allocation pool that per-packet
> > information can be stored, and reused by the next packet.
>
> Turns out bro does this most of the time.. unless you use the next_packet
> event. Normal connections use the sessions cache which holds connection
> objects, but new_packet has its own code path that creates the ip header
> from scratch for each packet. I tried to pre-allocate PortVal objects, but
> I think I was screwing something up with 'Ref' and bro would just segfault
> on the 2nd connection.
>
> > There also are probably some optimizations of frequent operations now
> > that we're in a 64-bit world that could prove useful - the one's complement
> > checksum calculation in net_util.cc is one that comes to mind, especially
> > since it works effectively a byte at a time (and works with even byte
> > counts only). Seeing as this is done per-packet on all tcp payload,
> > optimizing this seems reasonable. Here's a discussion of do the checksum
> > calc in 64-bit arithmetic: https://locklessinc.com/articles/tcp_checksum/
> > - this website also has an x64 allocator that is claimed to be faster than
> > tcmalloc, see: https://locklessinc.com/benchmarks_allocator.shtml (note:
> > I haven't tried anything from this source, but find it interesting).
>
> I couldn't get this code to return the right checksums inside bro (some
> casting issue?), but if it is faster it should increase performance by a
> small percentage. Comparing 'bro -b' runs on a pcap with 'bro -b -C' runs
> (which should show what kind of performance increase we would get if that
> function took 0s to run) shows a decent chunk of time taken computing
> checksums.
>
> > I'm guessing there are a number of such "small" optimizations that could
> > provide significant performance gains.
>
> I've been trying to figure out the best way to profile bro. So far
> attempting to use linux perf, or google perftools hasn't been able to shed
> much light on anything. I think the approach I was using to benchmark
> certain operations in the bro language is the better approach.
>
> Instead of running bro and trying to profile it to figure out what is
> causing the most load, simply compare the execution of two bro runs with
> slightly different scripts/settings. I think this will end up being the
> better approach because it answers real questions like "If I load this
> script or change this setting what is the performance impact on the bro
> process". When I did this last I used this method to compare the
> performance from one bro commit to the next, but I never tried comparing
> bro with one set of scripts loaded to bro with a different set of scripts
> loaded.
> For example, the simplest and most dramatic test I came up with so far:
>
> $ time bro -r 2009-M57-day11-18.trace -b
> real 0m2.434s
> user 0m2.236s
> sys 0m0.200s
>
> $ cat np.bro
> event new_packet(c: connection, p: pkt_hdr)
> {
>
> }
>
> $ time bro -r 2009-M57-day11-18.trace -b np.bro
> real 0m10.588s
> user 0m10.392s
> sys 0m0.204s
>
> We've been saying for a while that adding that event is expensive, but I
> don't know if it's even been quantified.
>
> The main thing I still need to figure out is how to do this type of test
> in a cluster environment while replaying a long pcap.
>
> Somewhat related, came across this presentation yesterday:
>
> https://www.youtube.com/watch?v=NH1Tta7purM&feature=youtu.be
>
> CppCon 2017: Carl Cook ?When a Microsecond Is an Eternity: High
> Performance Trading Systems in C++?
>
> Among other things, he mentions using a memory pool for objects instead of
> creating/deleting them.
>
> ?
> Justin Azoff

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20171009/a6673f2a/attachment-0001.html

From jazoff at illinois.edu Mon Oct 9 14:12:39 2017
From: jazoff at illinois.edu (Azoff, Justin S)
Date: Mon, 9 Oct 2017 21:12:39 +0000
Subject: [Bro-Dev] Performance Enhancements
In-Reply-To: 
References: <02E2EDC3-7667-4D67-8D4D-C92B0D83E8BA@illinois.edu> <30CA959A-E44B-4CA3-8281-B3208DEDB47B@illinois.edu>
Message-ID: <265366D0-B2B7-4F34-B5D5-28DF58EF6FBE@illinois.edu>

> On Oct 9, 2017, at 4:03 PM, Clark, Gilbert wrote:
>
> If you look in one of the subdirectories or another, in ages past there
> was a little shell script to incrementally execute bro against a specific
> trace, loading one script at a time to see the effect each of them had on
> the overall runtime. I can't remember what it was called offhand, but it
> was useful for quick and dirty testing.
>
> And yes, function calls in bro script land are pretty horrific in terms of
> overhead. Line per line, bro script in general isn't terribly efficient
> just because it doesn't do any of the things a modern interpreter might
> (e.g. just-in-time compilation). That's not a criticism, it's only a note -
> most folks rely on horizontal scaling to deal with faster ingress, which I
> think makes a whole lot of sense.
One potential issue is that if a function is called a lot, it could mean it's either something that needs to be optimized so it is faster, or it could mean it's something that needs to be refactored so it's not called so much. > I know Robin et. al. were working on different approaches for next-generation scripting kind of stuff, but haven't kept up well enough to really know where those are. http://www.icir.org/hilti/ I believe. > One thing I played around with was using plugin hooks to integrate other scripting languages into the bro fast path (luajit was my weapon of choice) and seeing if conversion from bro script to one of those other languages might improve the run time. Other languages would still be less efficient than C, and anything garbage collected would need to be *really* carefully used, but ... it struck me as an idea that might be worth a look :) I'm not sure how hard it would be to write a transpiler for bro scripts and convert them completely to something like lua. Other than maybe ip address and subnets as data types, I think they overlap fairly well. > And yeah, generally speaking, most of the toolkits I've played with for software-based packet processing absolutely do use memory pools for the fast path. They also use burst fetch tricks (to amortize the cost of fetching packets over X packets, rather than fetching one packet at a time), and I've also seen quite a bit of prefetch / SIMD to try to keep things moving quickly once the packets make it to the CPU. > > Things start to get pretty crazy as packet rates increase, though: once you hit about 10 - 15 Gbps, even a *cache miss* on a modern system is enough to force a drop ... Data rate is just one part of it.. the number of packets per second and new sessions per second has a huge impact as well. Handling 10Gbps of a few concurrent connections is easy, but 10Gbps of DNS will not work so well. DNS is one analyzer/script that I think could benefit from being ported to C++. The script doesn't do much and there aren't many scripts out there that want the lower level dns events.. Having the analyzer merge the request/reply directly and bypass the scripts entirely could boost performance by quite a bit. ? Justin Azoff From jazoff at illinois.edu Mon Oct 9 14:23:08 2017 From: jazoff at illinois.edu (Azoff, Justin S) Date: Mon, 9 Oct 2017 21:23:08 +0000 Subject: [Bro-Dev] Performance Enhancements In-Reply-To: References: <02E2EDC3-7667-4D67-8D4D-C92B0D83E8BA@illinois.edu> <30CA959A-E44B-4CA3-8281-B3208DEDB47B@illinois.edu> Message-ID: <47C8DB65-E92B-4995-AA04-57DBFC57D9E0@illinois.edu> > On Oct 9, 2017, at 4:57 PM, Jim Mellander wrote: > > Well, I found pathological behavior with BaseList::remove > > Instrumenting it with a printf of num_entries & i after the for loop, running against a test pcap then summarizing with awk gives: > > Count, num_entries, i > > 1 3583 3536 > 1 3584 3537 > 1 3623 3542 > 1 3624 3543 > 1 3628 3620 > ... > 5 5379 5376 > 5 5380 5377 > 7 5499 5496 > 7 5500 5497 > 10 5497 5494 > 10 5498 5495 > 26 3 2 > 5861 3 1 > 13714 4 4 > 14130 4 1 > 34914 4 0 > 74299 3 3 > 1518194 2 1 > 2648755 2 2 > 8166358 3 0 > 13019625 2 0 > 62953139 0 0 > 71512938 1 1 > 104294506 1 0 > > there are 286 instances where the list has over 3000 entries, and the desired entry is near the end... That linear search has got to be killing performance, even though its uncommon :-( > > The case of num_entries being 0 can be optimized a bit, but is relatively minor. 
> > Now, I'll see if I can identify the offending List > > Jim > A for loop over an empty table/set causes the "0 0" entries. Something related to the "cookies" it uses for iteration. Not sure what causes the "1 1" cases. What I also find interesting are the "1 0" entries. I wonder how many of those cases are followed up by the list itself being destroyed. Something allocating and then destroying 104,294,506 lists that only ever have a single item in them. That's a lot of work for nothing. ? Justin Azoff From johanna at icir.org Tue Oct 10 12:05:14 2017 From: johanna at icir.org (Johanna Amann) Date: Tue, 10 Oct 2017 12:05:14 -0700 Subject: [Bro-Dev] design summary: porting Bro scripts to use Broker In-Reply-To: <20171006225846.GC83573@icir.org> References: <285A2806-E64A-43BD-9D12-E7B2661A7AA5@illinois.edu> <20171006225846.GC83573@icir.org> Message-ID: <20171010190514.3gtdd5dgci635xno@Beezling.local> > On Fri, Oct 06, 2017 at 16:53 +0000, you wrote: > > > # contains topic prefixes > > const Cluster::manager_subscriptions: set[string] &redef; > > > > # contains (topic string, event name) pairs > > const Cluster::manager_publications: set[string, string] &redef; > > I'm wondering if we can simplify this with Broker. With the old comm > system we needed the event names because that's what was subscribed > to. Now that we have topics, does the cluster framework still need to > know about the events at all? I'm thinking we could just go with a > topic convention and then the various scripts would publish there > directly. > > In the most simple version of this, the cluster framework would just > hard-code a subscription to "bro/cluster/". And then scripts like the > Intel framework would just publish all their events to "bro/cluster/" > directly through Broker. > > To allow for distinguishing by node type we can define separate topic > hierarchies: "bro/cluster/{manager,worker,logger}/". Each node > subscribes to the hierarchy corresponding to its type, and each script > publishes according to where it wants to send events to (again > directly using the Broker API). > > I think we could fit in Justin's hashing here too: We add per node > topics as well ("bro/cluster/node/worker-1/", > "bro/cluster/node/worker-2/", etc.) and then the cluster framework can > provide a function that maps a hash key to a topic that corresponds to > currently active node: > > local topic = Cluster:topic_for_key("abcdef"); # Returns, e.g., "bro/cluster/node/worker-1" > Broker::publish(topic, event); > > And that scheme may suggest that instead of hard-coding topics on the > sender side, the Cluster framework could generally provide a set of > functions to retrieve the right topic: > > # In SumStats framework: > local topic = Cluster::topic_for_manager() # Returns "bro/cluster/manager". > Broker::public(topic, event); > > Bottom-line: If we can find a way to steer information by setting up > topics appropriately, we might not need much additional configuration > at all. Just to add my two cents here - I like this a whole lot better and agree that mostly steering events through topics seems like a neat choice. 
Johanna From johanna at icir.org Tue Oct 10 12:10:55 2017 From: johanna at icir.org (Johanna Amann) Date: Tue, 10 Oct 2017 12:10:55 -0700 Subject: [Bro-Dev] design summary: porting Bro scripts to use Broker In-Reply-To: <285A2806-E64A-43BD-9D12-E7B2661A7AA5@illinois.edu> References: <285A2806-E64A-43BD-9D12-E7B2661A7AA5@illinois.edu> Message-ID: <20171010191055.uafzuanb2hntzjj4@Beezling.local> > Script-Author Example Usage > --------------------------- > > # Script author that wants to utilize data stores doesn't have to be aware of > # whether user is running a cluster or if they want to use persistent storage > # backends. > > const Software::tracked_store_name = "bro/framework/software/tracked" &redef; > > global Software::tracked_store: opaque of Broker::Store; > > event bro_init() &priority = +10 > { > Software::tracked_store = Broker::InitStore(Software::tracked_store_name); > } I hope that this was not already answered somewhere else and I just missed it - after you set up a store with Broker::InitStore, how do you interact with Software::tracked_store? I am especially curious how this handles the strong typing of Bro. Johanna From robin at icir.org Wed Oct 11 07:57:50 2017 From: robin at icir.org (Robin Sommer) Date: Wed, 11 Oct 2017 07:57:50 -0700 Subject: [Bro-Dev] design summary: porting Bro scripts to use Broker In-Reply-To: <9F565953-D1F7-492D-8E68-137021633C4C@illinois.edu> References: <285A2806-E64A-43BD-9D12-E7B2661A7AA5@illinois.edu> <20171006225846.GC83573@icir.org> <9F565953-D1F7-492D-8E68-137021633C4C@illinois.edu> Message-ID: <20171011145750.GC81035@icir.org> On Mon, Oct 09, 2017 at 19:33 +0000, you wrote: > It?s just a matter of where you expect most users to feel comfortable > making customizations: in Bro scripts or in a broctl config file. True, though I think that applies to much of Bro's configuration, like the logging for example. Either way, starting with with script-only customization and then reevaluate later sounds good. > Maybe the key point is that these customizations only make sense to > happen once before init time? Yeah, that's right, changing store attributes afterwards seems unlikely. From that perspective I get the redef approach. I was more thinking about consistency with other script APIs. We use redef for simple tuning (single-value options, timeouts, etc), but less these days for more complex setups (see logging and input frameworks). I'd be interested to hear what other people prefer. Robin -- Robin Sommer * ICSI/LBNL * robin at icir.org * www.icir.org/robin From jsiwek at illinois.edu Wed Oct 11 10:34:23 2017 From: jsiwek at illinois.edu (Siwek, Jon) Date: Wed, 11 Oct 2017 17:34:23 +0000 Subject: [Bro-Dev] design summary: porting Bro scripts to use Broker In-Reply-To: <20171010191055.uafzuanb2hntzjj4@Beezling.local> References: <285A2806-E64A-43BD-9D12-E7B2661A7AA5@illinois.edu> <20171010191055.uafzuanb2hntzjj4@Beezling.local> Message-ID: > On Oct 10, 2017, at 2:10 PM, Johanna Amann wrote: > > it - after you set up a store with Broker::InitStore, how do you interact > with Software::tracked_store? Probably best to look at this Broker script API: https://github.com/bro/bro/blob/topic/actor-system/scripts/base/frameworks/broker/store.bro e.g. you have get/push/etc. operations you can do on it, like this example: https://github.com/bro/bro/blob/topic/actor-system/testing/btest/broker/store/ops.bro > I am especially curious how this handles the strong typing of Bro. All data in stores are an opaque Data type and the store operations (e.g. 
from API in link above) implicitly convert Bro types into that type. Then when retrieving data from a store, to convert Data to Bro values, you can use new ?is? or ?as? operators or a new type-based-switch-statement. Example: https://github.com/bro/bro/blob/topic/actor-system/testing/btest/broker/store/type-conversion.bro - Jon From aaron.eppert at packetsled.com Thu Oct 12 14:21:26 2017 From: aaron.eppert at packetsled.com (Aaron Eppert) Date: Thu, 12 Oct 2017 14:21:26 -0700 Subject: [Bro-Dev] File Analysis Inconsistencies Message-ID: I crafted a custom file analysis plugin that attaches to specific MIME types via file_sniff and fires an appropriate event once processing has been completed. I had to jump through a few hoops to make a file analysis plugin, first, but those were cleared and everything runs and loads appropriately there (bro -NN verified.) My test regime is very straight forward, I have several PCAPs cooked up that contain simple HTTP file GETs (that extract otherwise properly and do not exhibit missing_bytes) and I am running them via `bro -C -r <>.pcap`. My issue comes with utter and complete inconsistency with execution - it is, effectively, a coin flip, with zero changes. When I have dumped the buffers being processed, as my file analysis plugin has a secondary verification to make sure the data passed is appropriate - which is confusing, as the mime type fires correct, which seems to indicate a bug somewhere in the data path - the correct execution, clearly has the proper data in it. The invalid executions, again changing nothing other than a subsequent execution, shows a buffer of what appears to be completely random data. I currently cannot supply the file analysis plugin for inspection, but would very much appreciate insight in how to find the root cause. It very much seems to be upstream. If I run the analysis portion of the plugin as a free standing executable outside of Bro against the data transferred via HTTP, everything works perfect and the structures are filled accordingly. I saw BIT-1832, and there could be similar root causes in there, but I have not had time to investigate otherwise. The issues I am raising, again, are command line replay via command line, not even ?live? network traffic or tcpreplay over a NIC/dummy interface. Aaron -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20171012/3153479c/attachment.html From jazoff at illinois.edu Thu Oct 12 14:35:06 2017 From: jazoff at illinois.edu (Azoff, Justin S) Date: Thu, 12 Oct 2017 21:35:06 +0000 Subject: [Bro-Dev] File Analysis Inconsistencies In-Reply-To: References: Message-ID: <42A432BE-2325-4305-885F-1B3197BD7EBF@illinois.edu> > On Oct 12, 2017, at 5:21 PM, Aaron Eppert wrote: > > I crafted a custom file analysis plugin that attaches to specific MIME types via file_sniff and fires an appropriate event once processing has been completed. > > I had to jump through a few hoops to make a file analysis plugin, first, but those were cleared and everything runs and loads appropriately there (bro -NN verified.) My test regime is very straight forward, I have several PCAPs cooked up that contain simple HTTP file GETs (that extract otherwise properly and do not exhibit missing_bytes) and I am running them via `bro -C -r <>.pcap`. My issue comes with utter and complete inconsistency with execution - it is, effectively, a coin flip, with zero changes. 
> > When I have dumped the buffers being processed, as my file analysis plugin has a secondary verification to make sure the data passed is appropriate - which is confusing, as the mime type fires correct, which seems to indicate a bug somewhere in the data path - the correct execution, clearly has the proper data in it. The invalid executions, again changing nothing other than a subsequent execution, shows a buffer of what appears to be completely random data. > > That sounds a lot like an uninitialized buffer somewhere. I wonder if you compile bro and your plugin with -fsanitize=address if you will trigger something with that. > I currently cannot supply the file analysis plugin for inspection, but would very much appreciate insight in how to find the root cause. It very much seems to be upstream. If I run the analysis portion of the plugin as a free standing executable outside of Bro against the data transferred via HTTP, everything works perfect and the structures are filled accordingly. If you are seeing what looks like random data in your plugin you should be able to reproduce this behavior by having a file analysis plugin that just dumps out the buffers to stdout (as hex?). Can you rip out all the custom logic in your plugin leaving something that just dumps the buffers as-is? That should leave you with just the hello world of file analysis plugins. If that shows the problem we should be able to figure out where it is coming from. I don't think file analysis is inherently broken somewhere, otherwise the bro test suite would fail. I think this would have to point to something unique about your plugin. I think you are the first person to build an out of tree file analysis plugin, so there may be an issue with the bro<->plugin interface for file analsys itself. If that is the case, extracting something like the built in md5 analysis plugin to an external plugin and calling it 'mymd5' would show the same problems. > I saw BIT-1832, and there could be similar root causes in there, but I have not had time to investigate otherwise. The issues I am raising, again, are command line replay via command line, not even ?live? network traffic or tcpreplay over a NIC/dummy interface. That does sound similar, but I'm not sure if they were seeing different results on the same pcap on different runs. ? Justin Azoff From jazoff at illinois.edu Thu Oct 12 15:08:04 2017 From: jazoff at illinois.edu (Azoff, Justin S) Date: Thu, 12 Oct 2017 22:08:04 +0000 Subject: [Bro-Dev] Performance Enhancements In-Reply-To: References: <02E2EDC3-7667-4D67-8D4D-C92B0D83E8BA@illinois.edu> Message-ID: <8E412ABA-4E5E-4B98-BF56-7B532B67556D@illinois.edu> > On Oct 6, 2017, at 5:59 PM, Jim Mellander wrote: > > I particularly like the idea of an allocation pool that per-packet information can be stored, and reused by the next packet. > > There also are probably some optimizations of frequent operations now that we're in a 64-bit world that could prove useful - the one's complement checksum calculation in net_util.cc is one that comes to mind, especially since it works effectively a byte at a time (and works with even byte counts only). Seeing as this is done per-packet on all tcp payload, optimizing this seems reasonable. Here's a discussion of do the checksum calc in 64-bit arithmetic: https://locklessinc.com/articles/tcp_checksum/ - So I still haven't gotten this to work, but I did some more tests that I think show it is worthwhile to look into replacing this function. 
I generated a large pcap of a 3 minute iperf run: $ du -hs iperf.pcap 9.6G iperf.pcap $ tcpdump -n -r iperf.pcap |wc -l reading from file iperf.pcap, link-type EN10MB (Ethernet) 7497698 Then ran either `bro -Cbr` or `bro -br` on it 5 times and track runtime as well as cpu instructions reported by `perf`: $ python2 bench.py 5 bro -Cbr iperf.pcap 15.19 49947664388 15.66 49947827678 15.74 49947853306 15.66 49949603644 15.42 49951191958 elapsed Min 15.18678689 Max 15.7425909042 Avg 15.5343231678 instructions Min 49947664388 Max 49951191958 Avg 49948828194 $ python2 bench.py 5 bro -br iperf.pcap 20.82 95502327077 21.31 95489729078 20.52 95483242217 21.45 95499193001 21.32 95498830971 elapsed Min 20.5184400082 Max 21.4452238083 Avg 21.083449173 instructions Min 95483242217 Max 95502327077 Avg 95494664468 So this shows that for every ~7,500,000 packets bro processes, almost 5 seconds is spent computing checksums. According to https://locklessinc.com/articles/tcp_checksum/, they run their benchmark 2^24 times (16,777,216) which is about 2.2 times as many packets. Their runtime starts out at about 11s, which puts it in line with the current implementation in bro. The other implementations they show are between 7 and 10x faster depending on packet size. A 90% drop in time spent computing checksums would be a noticeable improvement. Unfortunately I couldn't get their implementation to work inside of bro and get the right result, and even if I could, it's not clear what the license for the code is. ? Justin Azoff From aaron.eppert at packetsled.com Fri Oct 13 08:01:36 2017 From: aaron.eppert at packetsled.com (Aaron Eppert) Date: Fri, 13 Oct 2017 08:01:36 -0700 Subject: [Bro-Dev] File Analysis Inconsistencies In-Reply-To: <42A432BE-2325-4305-885F-1B3197BD7EBF@illinois.edu> References: <42A432BE-2325-4305-885F-1B3197BD7EBF@illinois.edu> Message-ID: Justin, Indeed, cutting new territory is always interesting. As for the code, https://github.com/aeppert/test_file_analyzer File I am using for this case: https://www.bro.org/static/exchange-2013/faf-exercise.pcap `bro -C -r faf-exercise.pcap` after building and installing the plugin. My suspicion is it?s either unbelievably trivial and I keep missing it because I am the only one staring at it, or it?s a rather deep rabbit hole. Aaron On October 12, 2017 at 5:35:15 PM, Azoff, Justin S (jazoff at illinois.edu) wrote: > On Oct 12, 2017, at 5:21 PM, Aaron Eppert wrote: > > I crafted a custom file analysis plugin that attaches to specific MIME types via file_sniff and fires an appropriate event once processing has been completed. > > I had to jump through a few hoops to make a file analysis plugin, first, but those were cleared and everything runs and loads appropriately there (bro -NN verified.) My test regime is very straight forward, I have several PCAPs cooked up that contain simple HTTP file GETs (that extract otherwise properly and do not exhibit missing_bytes) and I am running them via `bro -C -r <>.pcap`. My issue comes with utter and complete inconsistency with execution - it is, effectively, a coin flip, with zero changes. > > When I have dumped the buffers being processed, as my file analysis plugin has a secondary verification to make sure the data passed is appropriate - which is confusing, as the mime type fires correct, which seems to indicate a bug somewhere in the data path - the correct execution, clearly has the proper data in it. 
The invalid executions, again changing nothing other than a subsequent execution, shows a buffer of what appears to be completely random data. > > That sounds a lot like an uninitialized buffer somewhere. I wonder if you compile bro and your plugin with -fsanitize=address if you will trigger something with that. > I currently cannot supply the file analysis plugin for inspection, but would very much appreciate insight in how to find the root cause. It very much seems to be upstream. If I run the analysis portion of the plugin as a free standing executable outside of Bro against the data transferred via HTTP, everything works perfect and the structures are filled accordingly. If you are seeing what looks like random data in your plugin you should be able to reproduce this behavior by having a file analysis plugin that just dumps out the buffers to stdout (as hex?). Can you rip out all the custom logic in your plugin leaving something that just dumps the buffers as-is? That should leave you with just the hello world of file analysis plugins. If that shows the problem we should be able to figure out where it is coming from. I don't think file analysis is inherently broken somewhere, otherwise the bro test suite would fail. I think this would have to point to something unique about your plugin. I think you are the first person to build an out of tree file analysis plugin, so there may be an issue with the bro<->plugin interface for file analsys itself. If that is the case, extracting something like the built in md5 analysis plugin to an external plugin and calling it 'mymd5' would show the same problems. > I saw BIT-1832, and there could be similar root causes in there, but I have not had time to investigate otherwise. The issues I am raising, again, are command line replay via command line, not even ?live? network traffic or tcpreplay over a NIC/dummy interface. That does sound similar, but I'm not sure if they were seeing different results on the same pcap on different runs. ? Justin Azoff -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20171013/a7b382cf/attachment-0001.html From jsiwek at illinois.edu Fri Oct 13 08:10:55 2017 From: jsiwek at illinois.edu (Siwek, Jon) Date: Fri, 13 Oct 2017 15:10:55 +0000 Subject: [Bro-Dev] File Analysis Inconsistencies In-Reply-To: <42A432BE-2325-4305-885F-1B3197BD7EBF@illinois.edu> References: <42A432BE-2325-4305-885F-1B3197BD7EBF@illinois.edu> Message-ID: > On Oct 12, 2017, at 4:35 PM, Azoff, Justin S wrote: > > That sounds a lot like an uninitialized buffer somewhere. I wonder if you compile bro and your plugin with -fsanitize=address if you will trigger something with that. Yeah, sounds worth checking for memory errors with a profiler/analyzer that can do that. I think there?s also the ?memory? sanitizer to detect uninitialized reads? Valgrind?s memcheck tool has also helped me a lot with such things, IIRC something like `valgrind --leak-check=full --track-origins=yes ...` - Jon From jazoff at illinois.edu Fri Oct 13 08:32:42 2017 From: jazoff at illinois.edu (Azoff, Justin S) Date: Fri, 13 Oct 2017 15:32:42 +0000 Subject: [Bro-Dev] File Analysis Inconsistencies In-Reply-To: References: <42A432BE-2325-4305-885F-1B3197BD7EBF@illinois.edu> Message-ID: <7A9DB74F-11EC-41FD-8952-24F4843191E6@illinois.edu> > On Oct 13, 2017, at 11:01 AM, Aaron Eppert wrote: > > Justin, > > Indeed, cutting new territory is always interesting. 
As for the code, > > https://github.com/aeppert/test_file_analyzer > > > File I am using for this case: > https://www.bro.org/static/exchange-2013/faf-exercise.pcap > > `bro -C -r faf-exercise.pcap` after building and installing the plugin. > > My suspicion is it?s either unbelievably trivial and I keep missing it because I am the only one staring at it, or it?s a rather deep rabbit hole. > > Aaron Thanks for putting that together.. now I see what you mean. Building the plugin with ASAN confirms it is trying to access uninitialized memory: $ /usr/local/bro/bin/bro -C -r faf-exercise.pcap TEST::Finalize total_len = 65960 BUFFER 00 ea 09 00 50 61 00 00 80 eb 09 00 50 61 00 00 ================================================================= ==93650==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000bf5d08 at pc 0x00010b19d39b bp 0x7fff57829e10 sp 0x7fff57829e08 READ of size 1 at 0x603000bf5d08 thread T0 #0 0x10b19d39a in print_bytes(std::__1::basic_ostream >&, char const*, unsigned char const*, unsigned long, bool) TEST.cc:21 #1 0x10b19e43b in file_analysis::TEST::Finalize() TEST.cc:87 ... The problem is this line: bufv->push_back(data); That's only pushing the first char of the buffer onto the vector, not the entire buffer. If you print out bufv->size() you'll see that it is not what it should be. If you apply this change it will run without crashing and I believe give the expected output: diff --git a/src/TEST.cc b/src/TEST.cc index 8d78ef2..56d0a83 100644 --- a/src/TEST.cc +++ b/src/TEST.cc @@ -56,7 +56,7 @@ bool TEST::DeliverStream(const u_char* data, uint64 len) } if ( total_len < TEST_MAX_BUFFER) { - bufv->push_back(data); + print_bytes(std::cout, "BUFFER", data, len); total_len += len; } @@ -84,7 +84,7 @@ void TEST::Finalize() //auto pos = std::find(bufv->begin(), bufv->end(), (unsigned char *)"Exif"); //std::cout << "Offset = " << std::distance( bufv->begin(), pos ) << std::endl; - print_bytes(std::cout, "BUFFER", (const u_char *)&bufv[0], total_len); + //print_bytes(std::cout, "BUFFER", (const u_char *)&bufv[0], total_len); val_list* vl = new val_list(); vl->append(GetFile()->GetVal()->Ref()); I don't know off the top of my head the right way to extend a c++ vector by a c buffer, but doing so should fix things. ? Justin Azoff From jmellander at lbl.gov Sat Oct 14 12:41:55 2017 From: jmellander at lbl.gov (Jim Mellander) Date: Sat, 14 Oct 2017 12:41:55 -0700 Subject: [Bro-Dev] Performance Enhancements In-Reply-To: <8E412ABA-4E5E-4B98-BF56-7B532B67556D@illinois.edu> References: <02E2EDC3-7667-4D67-8D4D-C92B0D83E8BA@illinois.edu> <8E412ABA-4E5E-4B98-BF56-7B532B67556D@illinois.edu> Message-ID: Yeh, the lockless implementation has a bug: if (size) s/b if (size & 1) I ended up writing an checksum routine that sums 32 bits at a time into a 64 bit register, which avoids the need to check for overflow - it seems to be faster than the full 64 bit implementation - will test with Bro and report results. On Thu, Oct 12, 2017 at 3:08 PM, Azoff, Justin S wrote: > > > On Oct 6, 2017, at 5:59 PM, Jim Mellander wrote: > > > > I particularly like the idea of an allocation pool that per-packet > information can be stored, and reused by the next packet. > > > > There also are probably some optimizations of frequent operations now > that we're in a 64-bit world that could prove useful - the one's complement > checksum calculation in net_util.cc is one that comes to mind, especially > since it works effectively a byte at a time (and works with even byte > counts only). 
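The core loop is nothing exotic, roughly this (an as-yet-untested sketch; the whole trick is that adding 32-bit values into a 64-bit register leaves 32 bits of headroom, so there are no carry checks until the final fold):

#include <stdint.h>
#include <string.h>

static uint16_t cksum32(const uint8_t* data, size_t len)
    {
    uint64_t sum = 0;

    while ( len >= 4 )
        {
        uint32_t w;
        memcpy(&w, data, 4); // avoids unaligned loads
        sum += w; // ~2^32 of these before the 64-bit accumulator could overflow
        data += 4;
        len -= 4;
        }

    if ( len >= 2 )
        {
        uint16_t w;
        memcpy(&w, data, 2);
        sum += w;
        data += 2;
        len -= 2;
        }

    if ( len & 1 ) // the odd-byte test the lockless code got wrong
        sum += *data;

    // Fold down to 16 bits.
    while ( sum >> 16 )
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)sum;
    }

Final complement and byte order are left to the caller, to match whatever the existing net_util.cc routine expects.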
Seeing as this is done per-packet on all tcp payload, > optimizing this seems reasonable. Here's a discussion of do the checksum > calc in 64-bit arithmetic: https://locklessinc.com/articles/tcp_checksum/ > - > > So I still haven't gotten this to work, but I did some more tests that I > think show it is worthwhile to look into replacing this function. > > I generated a large pcap of a 3 minute iperf run: > > $ du -hs iperf.pcap > 9.6G iperf.pcap > $ tcpdump -n -r iperf.pcap |wc -l > reading from file iperf.pcap, link-type EN10MB (Ethernet) > 7497698 > > Then ran either `bro -Cbr` or `bro -br` on it 5 times and track runtime as > well as cpu instructions reported by `perf`: > > $ python2 bench.py 5 bro -Cbr iperf.pcap > 15.19 49947664388 > 15.66 49947827678 > 15.74 49947853306 > 15.66 49949603644 > 15.42 49951191958 > elapsed > Min 15.18678689 > Max 15.7425909042 > Avg 15.5343231678 > > instructions > Min 49947664388 > Max 49951191958 > Avg 49948828194 > > $ python2 bench.py 5 bro -br iperf.pcap > 20.82 95502327077 > 21.31 95489729078 > 20.52 95483242217 > 21.45 95499193001 > 21.32 95498830971 > elapsed > Min 20.5184400082 > Max 21.4452238083 > Avg 21.083449173 > > instructions > Min 95483242217 > Max 95502327077 > Avg 95494664468 > > > So this shows that for every ~7,500,000 packets bro processes, almost 5 > seconds is spent computing checksums. > > According to https://locklessinc.com/articles/tcp_checksum/, they run > their benchmark 2^24 times (16,777,216) which is about 2.2 times as many > packets. > > Their runtime starts out at about 11s, which puts it in line with the > current implementation in bro. The other implementations they show are > between 7 and 10x faster depending on packet size. A 90% drop in time > spent computing checksums would be a noticeable improvement. > > > Unfortunately I couldn't get their implementation to work inside of bro > and get the right result, and even if I could, it's not clear what the > license for the code is. > > > > > > ? > Justin Azoff > > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20171014/fe1a8985/attachment.html From jsiwek at illinois.edu Fri Oct 20 11:26:17 2017 From: jsiwek at illinois.edu (Siwek, Jon) Date: Fri, 20 Oct 2017 18:26:17 +0000 Subject: [Bro-Dev] design summary: porting Bro scripts to use Broker In-Reply-To: <20171006225846.GC83573@icir.org> References: <285A2806-E64A-43BD-9D12-E7B2661A7AA5@illinois.edu> <20171006225846.GC83573@icir.org> Message-ID: > On Oct 6, 2017, at 5:58 PM, Robin Sommer wrote: > > In the most simple version of this, the cluster framework would just > hard-code a subscription to "bro/cluster/". And then scripts like the > Intel framework would just publish all their events to "bro/cluster/" > directly through Broker. I just noticed that Bro configures Broker to use its new automatic multihop message forwarding which interacts poorly with a generic ?bro/cluster? topic that every node subscribes to. When configuring a simple cluster of 1 manager, 1 worker, and 1 proxy using the traditional cluster layout (worker connects to both, and proxy connects to manager), I wanted nodes to keep track of which peers are still alive. To do this I have a simple ?hello? event that is sent on seeing a new connection containing the needed information (a broker node id mapping to cluster node name). Sending that event over the ?bro/cluster? topic causes it to be routed around until the TTL kills it. 
In this particular case, maybe not so bad since it's expected to happen infrequently, but it doesn't seem like something that's desirable or intuitive in a general sense.

It's trivial to just disable automatic message forwarding via a global flag, though before going that way, I want to check if I'm missing other context/use-cases. For the current script-porting work, are there plans/expectations to use automatic message forwarding or to change the traditional cluster topology so it doesn't contain cycles?

- Jon

From aaron.eppert at packetsled.com Wed Oct 25 16:16:52 2017
From: aaron.eppert at packetsled.com (Aaron Eppert)
Date: Wed, 25 Oct 2017 16:16:52 -0700
Subject: [Bro-Dev] BinPac - Many repeated messages in the same packet
Message-ID: 

I am running into an implementation issue with BinPac and would hope to find a few pointers.

I have a protocol that loads a given TCP packet with as many publish messages as possible in a worst case scenario - often it just has a single message, but it is not guaranteed. When a publish message contains more than one subsequent message, there is not an indicator that another message follows. The packet looks, generally, like this:

+-------------------------------------+
|              Message 0              |
+-------------------------------------+
|              Message 1              |
+-------------------------------------+
|              Message 2              |
+-------------------------------------+
|                 ...                 |
+-------------------------------------+
|             Message N-2             |
+-------------------------------------+
|             Message N-1             |
+-------------------------------------+
|              Message N              |
+-------------------------------------+

The protocol definition code I have written is as follows:

type SPROTO_messages = record {
    thdr            : uint8;
    hdrlen          : uint8;
    variable_header : case msg_type of {
        SPROTO_CONNECT     -> connect_packet     : SPROTO_connect(hdrlen);
        SPROTO_SUBSCRIBE   -> subscribe_packet   : SPROTO_subscribe(hdrlen);
        SPROTO_SUBACK      -> suback_packet      : SPROTO_suback(hdrlen);
        SPROTO_PUBLISH     -> publish_packet     : SPROTO_publish(hdrlen);
        SPROTO_UNSUBSCRIBE -> unsubscribe_packet : SPROTO_unsubscribe(hdrlen);
        default            -> none               : empty;
    };
} &let {
    msg_type        : uint8 = (thdr >> 4);
};

type SPROTO_PDU(is_orig: bool) = record {
    sproto_messages : SPROTO_messages[];
} &byteorder=bigendian;

I can tell via Wireshark that I am definitely missing messages. Any advice on a better way to implement the above would be greatly appreciated.

Aaron

From mfernandez at mitre.org Thu Oct 26 03:55:00 2017
From: mfernandez at mitre.org (Fernandez, Mark I)
Date: Thu, 26 Oct 2017 10:55:00 +0000
Subject: [Bro-Dev] BinPac - Many repeated messages in the same packet
In-Reply-To: 
References: 
Message-ID: 

Aaron,

>> I have a protocol that loads a given TCP packet with as many publish
>> messages as possible in a worst case scenario - often it just has a
>> single message, but it is not guaranteed. When a publish message
>> contains more than one subsequent message, there is not an indicator
>> that another message follows.
Perhaps try something like this: type SPROTO_messages = SPROTO_message[] &until($input.length() == 0); # or some appropriate terminating condition Type SPROTO_message = record { thdr : uint8; hdrlen : uint8; variable_header : case msg_type of { SPROTO_CONNECT -> connect_packet : SPROTO_connect(hdrlen); SPROTO_SUBSCRIBE -> subscribe_packet : SPROTO_subscribe(hdrlen); SPROTO_SUBACK -> suback_packet : SPROTO_suback(hdrlen); SPROTO_PUBLISH -> publish_packet : SPROTO_publish(hdrlen); SPROTO_UNSUBSCRIBE -> unsubscribe_packet : SPROTO_unsubscribe(hdrlen); default -> none : empty; }; } &let { msg_type : uint8 = (thdr >> 4); }; Mark From robin at icir.org Tue Oct 31 11:16:07 2017 From: robin at icir.org (Robin Sommer) Date: Tue, 31 Oct 2017 11:16:07 -0700 Subject: [Bro-Dev] [Bro-Commits] [git/bro] topic/actor-system: First-pass broker-enabled Cluster scripting API + misc. (07ad06b) In-Reply-To: <201710271803.v9RI3oSQ001411@bro-ids.icir.org> References: <201710271803.v9RI3oSQ001411@bro-ids.icir.org> Message-ID: <20171031181607.GB26741@icir.org> This is coming together quite nicely. Not sure if it's stable yet, but I'll just go ahead with some feedback I noticed looking over the new cluster API: - One thing I can't quite tell is if this is still aiming to maintain compatibility with the old communication system, like by keeping the proxies and also the *_events patterns. Looking at setup-connections, it seems so. I'd say just go ahead and remove all legacy pieces. Maintain two schemes in parallel is cumbersome, and I think it's fine to just force everything over to Broker. - Is the idea for the "*_topic" constants that one just picks the apppropiate one when sending events? Like if I want to publish something to all workers, I'd publish to Cluster::worker_topic? I think that's fine, though I'm wondering if we could compress the API there somehow so that Cluster doesn't need to export all those constants indvidiually. One idea would be a function that returns a topic based on node type? - I like the Pools! How about moving Pool with its functions out of the main.bro, just for clarity. - Looks like the hello/bye events are broadcasted by all nodes. Is that on purpose, or should that be limited to just one, like just the master sending them out? Or does it not matter and this provides for more redundancy? - create_store() vs "stores": Is the idea that I'd normally use create_store() and that populates the table, but I could also redef it myself instead of using create_store() to create more custom entries? If so, maybe make that a bit more explicit in the comments that there're two ways to configure that table. Robin On Fri, Oct 27, 2017 at 12:44 -0500, Jonathan Siwek wrote: > Repository : ssh://git at bro-ids.icir.org/bro > On branch : topic/actor-system > Link : https://github.com/bro/bro/commit/07ad06b083d16f9cf1c86041cf7287335a74ebbb > > >--------------------------------------------------------------- > > commit 07ad06b083d16f9cf1c86041cf7287335a74ebbb > Author: Jon Siwek > Date: Fri Oct 27 12:44:54 2017 -0500 > > First-pass broker-enabled Cluster scripting API + misc. > > - Remove Broker::Options, Broker::configure(). This was only > half implemented (e.g. the "routable" flag wasn't used), and using > a function to set these options was awkward: the only way to > override defaults was via calling configure() in a bro_init with > *lower* priority such that the last call "wins". 
Also doesn't > really make sense for it to be a function since the underlying > broker library can't adapt to changes in these configuration > values dynamically at runtime, so instead there's just now > two options you can redef: "Broker::forward_messages" and > "Broker::log_topic". > > - Add Broker::node_id() to get a unique identifier for the Bro instance's > broker endpoint. This is used by the Cluster API to map node name > (e.g. "manager") to broker endpoint so that one can track which nodes > are still alive. > > - Fix how broker-based communication interacts with --pseudo-realtime > and reading pcaps: bro now terminates at end of reading pcap when > broker is active (this should now be equivalent to how RemoteSerializer > worked). > > - New broker-enabled Cluster framework API > - Still uses Cluster::nodes as the means of setting up cluster network > - See Cluster::stores, Cluster::StoreInfo, and Cluster::create_store > for how broker data stores are integrated into cluster operation > > - Update several unit tests to new Cluster API. Failing tests at > the moment are mostly from scripts/frameworks that haven't been > ported to the new Cluster API. > > > >--------------------------------------------------------------- > > 07ad06b083d16f9cf1c86041cf7287335a74ebbb > aux/broker | 2 +- > scripts/base/frameworks/broker/main.bro | 47 ++-- > scripts/base/frameworks/broker/store.bro | 14 +- > scripts/base/frameworks/cluster/__load__.bro | 7 - > scripts/base/frameworks/cluster/main.bro | 279 ++++++++++++++++++++- > .../base/frameworks/cluster/setup-connections.bro | 150 +++++++++++ > scripts/base/frameworks/control/main.bro | 1 + > src/broker/Manager.cc | 20 +- > src/broker/Manager.h | 15 +- > src/broker/comm.bif | 21 +- > src/iosource/PktSrc.cc | 20 +- > testing/btest/Baseline/plugins.hooks/output | 18 +- > .../manager-1..stdout | 4 + > .../manager-1.reporter.log | 11 +- > testing/btest/broker/remote_log_types.bro | 2 +- > .../base/frameworks/cluster/start-it-up-logger.bro | 47 +++- > .../base/frameworks/cluster/start-it-up.bro | 48 +++- > .../frameworks/control/configuration_update.bro | 2 +- > .../scripts/base/frameworks/control/id_value.bro | 2 +- > .../scripts/base/frameworks/control/shutdown.bro | 2 +- > .../logging/field-extension-cluster-error.bro | 7 +- > .../frameworks/logging/field-extension-cluster.bro | 7 +- > 22 files changed, 601 insertions(+), 125 deletions(-) > > diff --git a/aux/broker b/aux/broker > index 76375d0..e1d637c 160000 > --- a/aux/broker > +++ b/aux/broker > @@ -1 +1 @@ > -Subproject commit 76375d07f5bf1ffc9711e064644bf865eda7a828 > +Subproject commit e1d637c816955a451079b419f438307960109346 > diff --git a/scripts/base/frameworks/broker/main.bro b/scripts/base/frameworks/broker/main.bro > index 47533a4..3ac25ed 100644 > --- a/scripts/base/frameworks/broker/main.bro > +++ b/scripts/base/frameworks/broker/main.bro > @@ -20,6 +20,11 @@ export { > ## use already. > const default_listen_retry = 30sec &redef; > > + ## Default address on which to listen. > + ## > + ## .. bro:see:: Broker::listen > + const default_listen_address = "" &redef; > + > ## Default interval to retry connecting to a peer if it cannot be made to work > ## initially, or if it ever becomes disconnected. > const default_connect_retry = 30sec &redef; > @@ -55,14 +60,12 @@ export { > ## all peers. > const ssl_keyfile = "" &redef; > > - ## The available configuration options when enabling Broker. > - type Options: record { > - ## Whether this Broker instance relays messages not destined to itself. 
> - ## By default, routing is disabled. > - routable: bool &default = F; > - ## The topic prefix where to publish logs. > - log_topic: string &default = "bro/logs/"; > - }; > + ## Forward all received messages to subscribing peers. > + const forward_messages = F &redef; > + > + ## The topic prefix where logs will be published. The log's stream id > + ## is appended when writing to a particular stream. > + const log_topic = "bro/logs/" &redef; > > type ErrorCode: enum { > ## The unspecified default error code. > @@ -153,13 +156,6 @@ export { > val: Broker::Data; > }; > > - ## Configures the local endpoint. > - ## > - ## options: Configures the local Broker endpoint behavior. > - ## > - ## Returns: true if configuration was successfully performed.. > - global configure: function(options: Options &default = Options()): bool; > - > ## Listen for remote connections. > ## > ## a: an address string on which to accept connections, e.g. > @@ -174,7 +170,8 @@ export { > ## Returns: the bound port or 0/? on failure. > ## > ## .. bro:see:: Broker::status > - global listen: function(a: string &default = "", p: port &default=default_port, > + global listen: function(a: string &default = default_listen_address, > + p: port &default = default_port, > retry: interval &default = default_listen_retry): port; > ## Initiate a remote connection. > ## > @@ -213,6 +210,9 @@ export { > ## Returns: a list of all peer connections. > global peers: function(): vector of PeerInfo; > > + ## Returns: a unique identifier for the local broker endpoint. > + global node_id: function(): string; > + > ## Publishes an event at a given topic. > ## > ## topic: a topic associated with the event message. > @@ -278,16 +278,6 @@ export { > > module Broker; > > -event bro_init() &priority=-10 > - { > - configure(); # Configure with defaults. > - } > - > -function configure(options: Options &default = Options()): bool > - { > - return __configure(options); > - } > - > event retry_listen(a: string, p: port, retry: interval) > { > listen(a, p, retry); > @@ -318,6 +308,11 @@ function peers(): vector of PeerInfo > return __peers(); > } > > +function node_id(): string > + { > + return __node_id(); > + } > + > function publish(topic: string, ev: Event): bool > { > return __publish(topic, ev); > diff --git a/scripts/base/frameworks/broker/store.bro b/scripts/base/frameworks/broker/store.bro > index ed735e0..6b22f5d 100644 > --- a/scripts/base/frameworks/broker/store.bro > +++ b/scripts/base/frameworks/broker/store.bro > @@ -61,7 +61,7 @@ export { > options: BackendOptions &default = BackendOptions()): opaque of Broker::Store; > > ## Create a clone of a master data store which may live with a remote peer. > - ## A clone automatically synchronizes to the master by automatically > + ## A clone automatically synchronizes to the master by > ## receiving modifications and applying them locally. Direct modifications > ## are not possible, they must be sent through the master store, which then > ## automatically broadcasts the changes out to clones. But queries may be > @@ -70,18 +70,6 @@ export { > ## > ## name: the unique name which identifies the master data store. > ## > - ## b: the storage backend to use. > - ## > - ## options: tunes how some storage backends operate. > - ## > - ## resync: the interval at which to re-attempt synchronizing with the master > - ## store should the connection be lost. If the clone has not yet > - ## synchronized for the first time, updates and queries queue up > - ## until the synchronization completes. 
After, if the connection > - ## to the master store is lost, queries continue to use the clone's > - ## version, but updates will be lost until the master is once again > - ## available. > - ## > ## Returns: a handle to the data store. > global create_clone: function(name: string): opaque of Broker::Store; > > diff --git a/scripts/base/frameworks/cluster/__load__.bro b/scripts/base/frameworks/cluster/__load__.bro > index 1717b83..4f193c0 100644 > --- a/scripts/base/frameworks/cluster/__load__.bro > +++ b/scripts/base/frameworks/cluster/__load__.bro > @@ -19,13 +19,6 @@ redef peer_description = Cluster::node; > > @load ./setup-connections > > -# Don't load the listening script until we're a bit more sure that the > -# cluster framework is actually being enabled. > - at load frameworks/communication/listen > - > -## Set the port that this node is supposed to listen on. > -redef Communication::listen_port = Cluster::nodes[Cluster::node]$p; > - > @if ( Cluster::local_node_type() == Cluster::MANAGER ) > @load ./nodes/manager > # If no logger is defined, then the manager receives logs. > diff --git a/scripts/base/frameworks/cluster/main.bro b/scripts/base/frameworks/cluster/main.bro > index 261f3f1..c94ed28 100644 > --- a/scripts/base/frameworks/cluster/main.bro > +++ b/scripts/base/frameworks/cluster/main.bro > @@ -7,10 +7,96 @@ > ##! ``@load base/frameworks/cluster``. > > @load base/frameworks/control > + at load base/frameworks/broker > > module Cluster; > > export { > + ## Whether the cluster framework uses broker to perform remote communication. > + const use_broker = T &redef; > + > + ## The topic name used for exchanging general messages that are relevant to > + ## any node in a cluster. Used with broker-enabled cluster communication. > + const broadcast_topic = "bro/cluster/broadcast" &redef; > + > + ## The topic name used for exchanging messages that are relevant to > + ## logger nodes in a cluster. Used with broker-enabled cluster communication. > + const logger_topic = "bro/cluster/logger" &redef; > + > + ## The topic name used for exchanging messages that are relevant to > + ## manager nodes in a cluster. Used with broker-enabled cluster communication. > + const manager_topic = "bro/cluster/manager" &redef; > + > + ## The topic name used for exchanging messages that are relevant to > + ## proxy nodes in a cluster. Used with broker-enabled cluster communication. > + const proxy_topic = "bro/cluster/proxy" &redef; > + > + ## The topic name used for exchanging messages that are relevant to > + ## worker nodes in a cluster. Used with broker-enabled cluster communication. > + const worker_topic = "bro/cluster/worker" &redef; > + > + ## The topic name used for exchanging messages that are relevant to > + ## time machine nodes in a cluster. Used with broker-enabled cluster communication. > + const time_machine_topic = "bro/cluster/time_machine" &redef; > + > + ## The topic prefix used for exchanging messages that are relevant to > + ## a named node in a cluster. Used with broker-enabled cluster communication. > + const node_topic_prefix = "bro/cluster/node/" &redef; > + > + ## Name of the node on which master data stores will be created if no other > + ## has already been specified by the user in :bro:see:`Cluster::stores`. > + const default_master_node = "manager" &redef; > + > + ## The type of data store backend that will be used for all data stores if > + ## no other has already been specified by the user in :bro:see:`Cluster::stores`. 
> + const default_backend = Broker::MEMORY &redef; > + > + ## The type of persistent data store backend that will be used for all data > + ## stores if no other has already been specified by the user in > + ## :bro:see:`Cluster::stores`. This will be used when script authors call > + ## :bro:see:`Cluster::create_store` with the *persistent* argument set true. > + const default_persistent_backend = Broker::SQLITE &redef; > + > + ## Setting a default dir will, for persistent backends that have not > + ## been given an explicit file path via :bro:see:`Cluster::stores`, > + ## automatically create a path within this dir that is based on the name of > + ## the data store. > + const default_store_dir = "" &redef; > + > + ## Information regarding a cluster-enabled data store. > + type StoreInfo: record { > + ## The name of the data store. > + name: string &optional; > + ## The store handle. > + store: opaque of Broker::Store &optional; > + ## The name of the cluster node on which the master version of the data > + ## store resides. > + master_node: string &default=default_master_node; > + ## Whether the data store is the master version or a clone. > + master: bool &default=F; > + ## The type of backend used for storing data. > + backend: Broker::BackendType &default=default_backend; > + ## Parameters used for configuring the backend. > + options: Broker::BackendOptions &default=Broker::BackendOptions(); > + }; > + > + ## A table of cluster-enabled data stores that have been created, indexed > + ## by their name. To customize a particular data store, you may redef this, > + ## defining the :bro:see:`StoreInfo` to associate with the store's name. > + global stores: table[string] of StoreInfo &default=StoreInfo() &redef; > + > + ## Sets up a cluster-enabled data store. They will also still properly > + ## function for uses that are not operating a cluster. > + ## > + ## name: the name of the data store to create. > + ## > + ## persistent: whether the data store must be persistent. > + ## > + ## Returns: the store's information. For master stores, the store will be > + ## ready to use immediately. For clones, the store field will not > + ## be set until the node containing the master store has connected. > + global create_store: function(name: string, persistent: bool &default=F): StoreInfo; > + > ## The cluster logging stream identifier. > redef enum Log::ID += { LOG }; > > @@ -18,6 +104,8 @@ export { > type Info: record { > ## The time at which a cluster message was generated. > ts: time; > + ## The name of the node that is creating the log record. > + node: string; > ## A message indicating information about the cluster's operation. > message: string; > } &log; > @@ -92,8 +180,7 @@ export { > ## If the *ip* field is a non-global IPv6 address, this field > ## can specify a particular :rfc:`4007` ``zone_id``. > zone_id: string &default=""; > - ## The port to which this local node can connect when > - ## establishing communication. > + ## The port that this node will listen on for peer connections. > p: port; > ## Identifier for the interface a worker is sniffing. > interface: string &optional; > @@ -108,6 +195,8 @@ export { > workers: set[string] &optional; > ## Name of a time machine node with which this node connects. > time_machine: string &optional; > + ## A unique identifier assigned to the node by the broker framework. 
> + id: string &optional; > }; > > ## This function can be called at any time to determine if the cluster > @@ -134,6 +223,8 @@ export { > ## named cluster-layout.bro somewhere in the BROPATH. It will be > ## automatically loaded if the CLUSTER_NODE environment variable is set. > ## Note that BroControl handles all of this automatically. > + ## The table is typically indexed by node names/labels (e.g. "manager" > + ## or "worker-1"). > const nodes: table[string] of Node = {} &redef; > > ## Indicates whether or not the manager will act as the logger and receive > @@ -148,6 +239,15 @@ export { > > ## Interval for retrying failed connections between cluster nodes. > const retry_interval = 1min &redef; > + > + ## When using broker-enabled cluster framework, nodes use this event to > + ## exchange their user-defined name along with a string that uniquely > + ## identifies it for the duration of its lifetime (this string may change if > + ## the node dies and has to reconnect later). > + global hello: event(name: string, id: string); > + > + ## Write a message to the cluster logging stream. > + global log: function(msg: string); > } > > function is_enabled(): bool > @@ -163,13 +263,112 @@ function local_node_type(): NodeType > event remote_connection_handshake_done(p: event_peer) &priority=5 > { > if ( p$descr in nodes && nodes[p$descr]$node_type == WORKER ) > - ++worker_count; > + { > + if ( use_broker ) > + Reporter::error(fmt("broker-enabled cluster using old comms: '%s' ", node)); > + else > + ++worker_count; > + } > } > > event remote_connection_closed(p: event_peer) &priority=5 > { > if ( p$descr in nodes && nodes[p$descr]$node_type == WORKER ) > - --worker_count; > + { > + if ( use_broker ) > + Reporter::error(fmt("broker-enabled cluster using old comms: '%s' ", node)); > + else > + --worker_count; > + } > + } > + > +event Cluster::hello(name: string, id: string) &priority=10 > + { > + if ( name !in nodes ) > + { > + Reporter::error(fmt("Got Cluster::hello msg from unexpected node: %s", name)); > + return; > + } > + > + local n = nodes[name]; > + > + if ( n?$id && n$id != id ) > + Reporter::error(fmt("Got Cluster::hello msg from duplicate node: %s", name)); > + > + n$id = id; > + Cluster::log(fmt("got hello from %s (%s)", name, id)); > + > + if ( n$node_type == WORKER ) > + ++worker_count; > + > + for ( store_name in stores ) > + { > + local info = stores[store_name]; > + > + if ( info?$store ) > + next; > + > + if ( info$master ) > + next; > + > + if ( info$master_node == name ) > + { > + info$store = Broker::create_clone(info$name); > + Cluster::log(fmt("created clone store: %s", info$name)); > + } > + } > + } > + > +event Broker::peer_added(endpoint: Broker::EndpointInfo, msg: string) &priority=10 > + { > + if ( ! use_broker ) > + return; > + > + if ( ! Cluster::is_enabled() ) > + return; > + > + local e = Broker::make_event(Cluster::hello, node, Broker::node_id()); > + Broker::publish(Cluster::broadcast_topic, e); > + } > + > +event Broker::peer_lost(endpoint: Broker::EndpointInfo, msg: string) &priority=10 > + { > + if ( ! use_broker ) > + return; > + > + for ( node_name in nodes ) > + { > + local n = nodes[node_name]; > + > + if ( n?$id && n$id == endpoint$id ) > + { > + Cluster::log(fmt("node down: %s", node_name)); > + delete n$id; > + > + if ( n$node_type == WORKER ) > + --worker_count; > + > + for ( store_name in stores ) > + { > + local info = stores[store_name]; > + > + if ( ! 
info?$store ) > + next; > + > + if ( info$master ) > + next; > + > + if ( info$master_node == node_name ) > + { > + Broker::close(info$store); > + delete info$store; > + Cluster::log(fmt("clone store closed: %s", info$name)); > + } > + } > + > + break; > + } > + } > } > > event bro_init() &priority=5 > @@ -183,3 +382,75 @@ event bro_init() &priority=5 > > Log::create_stream(Cluster::LOG, [$columns=Info, $path="cluster"]); > } > + > +function create_store(name: string, persistent: bool &default=F): Cluster::StoreInfo > + { > + local info = stores[name]; > + info$name = name; > + > + if ( Cluster::default_store_dir != "" ) > + { > + local default_options = Broker::BackendOptions(); > + local path = Cluster::default_store_dir + "/" + name; > + > + if ( info$options$sqlite$path == default_options$sqlite$path ) > + info$options$sqlite$path = path + ".sqlite"; > + > + if ( info$options$rocksdb$path == default_options$rocksdb$path ) > + info$options$rocksdb$path = path + ".rocksdb"; > + } > + > + if ( persistent ) > + { > + switch ( info$backend ) { > + case Broker::MEMORY: > + info$backend = Cluster::default_persistent_backend; > + break; > + case Broker::SQLITE: > + fallthrough; > + case Broker::ROCKSDB: > + # no-op: user already asked for a specific persistent backend. > + break; > + default: > + Reporter::error(fmt("unhandled data store type: %s", info$backend)); > + break; > + } > + } > + > + if ( ! Cluster::is_enabled() ) > + { > + if ( info?$store ) > + { > + Reporter::warning(fmt("duplicate cluster store creation for %s", name)); > + return info; > + } > + > + info$store = Broker::create_master(name, info$backend, info$options); > + info$master = T; > + stores[name] = info; > + Cluster::log(fmt("created master store: %s", name)); > + return info; > + } > + > + if ( info$master_node !in Cluster::nodes ) > + Reporter::fatal(fmt("master node '%s' for cluster store '%s' does not exist", > + info$master_node, name)); > + > + if ( Cluster::node == info$master_node ) > + { > + info$store = Broker::create_master(name, info$backend, info$options); > + info$master = T; > + stores[name] = info; > + return info; > + } > + > + info$master = F; > + stores[name] = info; > + Cluster::log(fmt("pending clone store creation: %s", name)); > + return info; > + } > + > +function log(msg: string) > + { > + Log::write(Cluster::LOG, [$ts = network_time(), $node = node, $message = msg]); > + } > diff --git a/scripts/base/frameworks/cluster/setup-connections.bro b/scripts/base/frameworks/cluster/setup-connections.bro > index 971a55d..2e775da 100644 > --- a/scripts/base/frameworks/cluster/setup-connections.bro > +++ b/scripts/base/frameworks/cluster/setup-connections.bro > @@ -3,13 +3,163 @@ > > @load ./main > @load base/frameworks/communication > + at load base/frameworks/broker > > @if ( Cluster::node in Cluster::nodes ) > > module Cluster; > > +type NamedNode: record { > + name: string; > + node: Node; > +}; > + > +function nodes_with_type(node_type: NodeType): vector of NamedNode > + { > + local rval: vector of NamedNode = vector(); > + > + for ( name in Cluster::nodes ) > + { > + local n = Cluster::nodes[name]; > + > + if ( n$node_type != node_type ) > + next; > + > + rval[|rval|] = NamedNode($name=name, $node=n); > + } > + > + return rval; > + } > + > +function connect_peer(node_type: NodeType, node_name: string): bool > + { > + local nn = nodes_with_type(node_type); > + > + for ( i in nn ) > + { > + local n = nn[i]; > + > + if ( n$name != node_name ) > + next; > + > + Cluster::log(fmt("initiate peering with 
%s:%s, retry=%s", > + n$node$ip, n$node$p, Cluster::retry_interval)); > + return Broker::peer(cat(n$node$ip), n$node$p, Cluster::retry_interval); > + } > + > + return F; > + } > + > event bro_init() &priority=9 > { > + if ( ! use_broker ) > + return; > + > + local self = nodes[node]; > + > + switch ( self$node_type ) { > + case NONE: > + return; > + case CONTROL: > + break; > + case LOGGER: > + Broker::subscribe(Cluster::logger_topic); > + Broker::subscribe(Broker::log_topic); > + break; > + case MANAGER: > + Broker::subscribe(Cluster::manager_topic); > + > + if ( Cluster::manager_is_logger ) > + Broker::subscribe(Broker::log_topic); > + > + break; > + case PROXY: > + Broker::subscribe(Cluster::proxy_topic); > + break; > + case WORKER: > + Broker::subscribe(Cluster::worker_topic); > + break; > + case TIME_MACHINE: > + Broker::subscribe(Cluster::time_machine_topic); > + break; > + default: > + Reporter::error(fmt("Unhandled cluster node type: %s", self$node_type)); > + return; > + } > + > + Broker::subscribe(Cluster::broadcast_topic); > + Broker::subscribe(Cluster::node_topic_prefix + node); > + > + Broker::listen(Broker::default_listen_address, > + self$p, > + Broker::default_listen_retry); > + > + Cluster::log(fmt("listening on %s:%s", Broker::default_listen_address, self$p)); > + > + switch ( self$node_type ) { > + case MANAGER: > + if ( self?$logger ) > + connect_peer(LOGGER, self$logger); > + > + if ( self?$time_machine ) > + connect_peer(TIME_MACHINE, self$time_machine); > + > + break; > + case PROXY: > + if ( self?$logger ) > + connect_peer(LOGGER, self$logger); > + > + if ( self?$manager ) > + connect_peer(MANAGER, self$manager); > + > + local proxies = nodes_with_type(PROXY); > + > + for ( i in proxies ) > + { > + local proxy = proxies[i]; > + > + if ( proxy$node?$proxy ) > + Broker::peer(cat(proxy$node$ip), proxy$node$p, Cluster::retry_interval); > + } > + > + break; > + case WORKER: > + if ( self?$logger ) > + connect_peer(LOGGER, self$logger); > + > + if ( self?$manager ) > + connect_peer(MANAGER, self$manager); > + > + if ( self?$proxy ) > + connect_peer(PROXY, self$proxy); > + > + if ( self?$time_machine ) > + connect_peer(TIME_MACHINE, self$time_machine); > + > + break; > + } > + } > + > +event bro_init() &priority=-10 > + { > + if ( use_broker ) > + return; > + > + local lp = Cluster::nodes[Cluster::node]$p; > + enable_communication(); > + listen(Communication::listen_interface, > + lp, > + Communication::listen_ssl, > + Communication::listen_ipv6, > + Communication::listen_ipv6_zone_id, > + Communication::listen_retry); > + } > + > +event bro_init() &priority=9 > + { > + if ( use_broker ) > + return; > + > local me = nodes[node]; > > for ( i in Cluster::nodes ) > diff --git a/scripts/base/frameworks/control/main.bro b/scripts/base/frameworks/control/main.bro > index 5c68c47..e3b58ef 100644 > --- a/scripts/base/frameworks/control/main.bro > +++ b/scripts/base/frameworks/control/main.bro > @@ -5,6 +5,7 @@ > module Control; > > export { > + ## Whether the control framework uses broker to perform remote communication. > const use_broker = T &redef; > > ## The address of the host that will be controlled. 
> diff --git a/src/broker/Manager.cc b/src/broker/Manager.cc > index e9d6280..b2cbe87 100644 > --- a/src/broker/Manager.cc > +++ b/src/broker/Manager.cc > @@ -113,7 +113,6 @@ Manager::BrokerState::BrokerState(broker::broker_options options) > > Manager::Manager() > { > - routable = false; > bound_port = 0; > > next_timestamp = 1; > @@ -128,6 +127,7 @@ void Manager::InitPostScript() > { > DBG_LOG(DBG_BROKER, "Initializing"); > > + log_topic = get_option("Broker::log_topic")->AsString()->CheckString(); > log_id_type = internal_type("Log::ID")->AsEnumType(); > writer_id_type = internal_type("Log::Writer")->AsEnumType(); > > @@ -144,6 +144,7 @@ void Manager::InitPostScript() > > broker::broker_options options; > options.disable_ssl = get_option("Broker::disable_ssl")->AsBool(); > + options.forward = get_option("Broker::forward_messages")->AsBool(); > > bstate = std::make_shared(options); > } > @@ -176,18 +177,6 @@ bool Manager::Active() > return bound_port > 0 || bstate->endpoint.peers().size(); > } > > -bool Manager::Configure(bool arg_routable, std::string arg_log_topic) > - { > - DBG_LOG(DBG_BROKER, "Configuring endpoint: routable=%s log_topic=%s", > - (routable ? "yes" : "no"), arg_log_topic.c_str()); > - > - routable = arg_routable; > - log_topic = arg_log_topic; > - > - // TODO: process routable flag > - return true; > - } > - > uint16_t Manager::Listen(const string& addr, uint16_t port) > { > bound_port = bstate->endpoint.listen(addr, port); > @@ -233,6 +222,11 @@ std::vector Manager::Peers() const > return bstate->endpoint.peers(); > } > > +std::string Manager::NodeID() const > + { > + return to_string(bstate->endpoint.node_id()); > + } > + > bool Manager::PublishEvent(string topic, std::string name, broker::vector args) > { > if ( ! bstate->endpoint.peers().size() ) > diff --git a/src/broker/Manager.h b/src/broker/Manager.h > index a430c57..5af23e4 100644 > --- a/src/broker/Manager.h > +++ b/src/broker/Manager.h > @@ -74,15 +74,6 @@ public: > bool Active(); > > /** > - * Configure the local Broker endpoint. > - * @param routable Whether the context of this endpoint routes messages not > - * @param log_topic The topic prefix for logs we this endpoint published. > - * destined to itself. By default endpoints do not route. > - * @return true if configuration was successful. > - */ > - bool Configure(bool routable = false, std::string log_topic=""); > - > - /** > * Listen for remote connections. > * @param port the TCP port to listen on. > * @param addr an address string on which to accept connections, e.g. > @@ -115,6 +106,11 @@ public: > std::vector Peers() const; > > /** > + * @return a unique identifier for this broker endpoint. > + */ > + std::string NodeID() const; > + > + /** > * Send an event to any interested peers. > * @param topic a topic string associated with the message. > * Peers advertise interest by registering a subscription to some prefix > @@ -296,7 +292,6 @@ private: > broker::endpoint& Endpoint() > { assert(bstate); return bstate->endpoint; } > > - bool routable; > std::string log_topic; > uint16_t bound_port; > > diff --git a/src/broker/comm.bif b/src/broker/comm.bif > index 411e3d4..c7c94d4 100644 > --- a/src/broker/comm.bif > +++ b/src/broker/comm.bif > @@ -7,8 +7,6 @@ > > module Broker; > > -type Broker::Options: record; > - > ## Generated when something changes in the Broker sub-system. 
> event Broker::status%(endpoint: EndpointInfo, msg: string%); > > @@ -51,20 +49,6 @@ enum PeerStatus %{ > RECONNECTING, > %} > > -function Broker::__configure%(options: Broker::Options%): bool > - %{ > - auto routable = false; > - auto log_topic = ""; > - > - if ( auto routable_val = options->AsRecordVal()->Lookup(0) ) > - routable = routable_val->AsBool(); > - > - if ( auto log_topic_val = options->AsRecordVal()->Lookup(1) ) > - log_topic = log_topic_val->AsString()->CheckString(); > - > - return new Val(broker_mgr->Configure(routable, log_topic), TYPE_BOOL); > - %} > - > function Broker::__listen%(a: string, p: port%): port > %{ > if ( ! p->IsTCP() ) > @@ -140,3 +124,8 @@ function Broker::__peers%(%): PeerInfos > > return rval; > %} > + > +function Broker::__node_id%(%): string > + %{ > + return new StringVal(broker_mgr->NodeID()); > + %} > diff --git a/src/iosource/PktSrc.cc b/src/iosource/PktSrc.cc > index a9362a0..343801a 100644 > --- a/src/iosource/PktSrc.cc > +++ b/src/iosource/PktSrc.cc > @@ -10,6 +10,8 @@ > #include "Hash.h" > #include "Net.h" > #include "Sessions.h" > +#include "broker/Manager.h" > +#include "iosource/Manager.h" > > #include "pcap/pcap.bif.h" > > @@ -304,13 +306,19 @@ bool PktSrc::ExtractNextPacketInternal() > return 1; > } > > - if ( pseudo_realtime && using_communication && ! IsOpen() ) > + if ( pseudo_realtime && ! IsOpen() ) > { > - // Source has gone dry, we're done. > - if ( remote_trace_sync_interval ) > - remote_serializer->SendFinalSyncPoint(); > - else > - remote_serializer->Terminate(); > + if ( using_communication ) > + { > + // Source has gone dry, we're done. > + if ( remote_trace_sync_interval ) > + remote_serializer->SendFinalSyncPoint(); > + else > + remote_serializer->Terminate(); > + } > + > + if ( broker_mgr->Active() ) > + iosource_mgr->Terminate(); > } > > SetIdle(true); > diff --git a/testing/btest/Baseline/plugins.hooks/output b/testing/btest/Baseline/plugins.hooks/output > index f7c61c0..5eefc64 100644 > --- a/testing/btest/Baseline/plugins.hooks/output > +++ b/testing/btest/Baseline/plugins.hooks/output > @@ -148,8 +148,6 @@ > 0.000000 MetaHookPost CallFunction(Analyzer::register_for_ports, , (Analyzer::ANALYZER_SYSLOG, {514/udp})) -> > 0.000000 MetaHookPost CallFunction(Analyzer::register_for_ports, , (Analyzer::ANALYZER_TEREDO, {3544/udp})) -> > 0.000000 MetaHookPost CallFunction(Analyzer::register_for_ports, , (Analyzer::ANALYZER_XMPP, {5222<...>/tcp})) -> > -0.000000 MetaHookPost CallFunction(Broker::__configure, , ([routable=F, log_topic=bro<...>/])) -> > -0.000000 MetaHookPost CallFunction(Broker::configure, , ([routable=F, log_topic=bro<...>/])) -> > 0.000000 MetaHookPost CallFunction(Cluster::is_enabled, , ()) -> > 0.000000 MetaHookPost CallFunction(Cluster::is_enabled, , ()) -> > 0.000000 MetaHookPost CallFunction(Files::register_analyzer_add_callback, , (Files::ANALYZER_EXTRACT, FileExtract::on_add{ if (!FileExtract::args?$extract_filename) FileExtract::args$extract_filename = cat(extract-, FileExtract::f$last_active, -, FileExtract::f$source, -, FileExtract::f$id)FileExtract::f$info$extracted = FileExtract::args$extract_filenameFileExtract::args$extract_filename = build_path_compressed(FileExtract::prefix, FileExtract::args$extract_filename)FileExtract::f$info$extracted_cutoff = Fmkdir(FileExtract::prefix)})) -> > @@ -251,7 +249,7 @@ > 0.000000 MetaHookPost CallFunction(Log::__create_stream, , (Weird::LOG, [columns=, ev=Weird::log_weird, path=weird])) -> > 0.000000 MetaHookPost CallFunction(Log::__create_stream, , (X509::LOG, 
[columns=, ev=X509::log_x509, path=x509])) -> > 0.000000 MetaHookPost CallFunction(Log::__create_stream, , (mysql::LOG, [columns=, ev=MySQL::log_mysql, path=mysql])) -> > -0.000000 MetaHookPost CallFunction(Log::__write, , (PacketFilter::LOG, [ts=1502745368.796663, node=bro, filter=ip or not ip, init=T, success=T])) -> > +0.000000 MetaHookPost CallFunction(Log::__write, , (PacketFilter::LOG, [ts=1509124227.694371, node=bro, filter=ip or not ip, init=T, success=T])) -> > 0.000000 MetaHookPost CallFunction(Log::add_default_filter, , (Broker::LOG)) -> > 0.000000 MetaHookPost CallFunction(Log::add_default_filter, , (Cluster::LOG)) -> > 0.000000 MetaHookPost CallFunction(Log::add_default_filter, , (Communication::LOG)) -> > @@ -384,7 +382,7 @@ > 0.000000 MetaHookPost CallFunction(Log::create_stream, , (Weird::LOG, [columns=, ev=Weird::log_weird, path=weird])) -> > 0.000000 MetaHookPost CallFunction(Log::create_stream, , (X509::LOG, [columns=, ev=X509::log_x509, path=x509])) -> > 0.000000 MetaHookPost CallFunction(Log::create_stream, , (mysql::LOG, [columns=, ev=MySQL::log_mysql, path=mysql])) -> > -0.000000 MetaHookPost CallFunction(Log::write, , (PacketFilter::LOG, [ts=1502745368.796663, node=bro, filter=ip or not ip, init=T, success=T])) -> > +0.000000 MetaHookPost CallFunction(Log::write, , (PacketFilter::LOG, [ts=1509124227.694371, node=bro, filter=ip or not ip, init=T, success=T])) -> > 0.000000 MetaHookPost CallFunction(NetControl::check_plugins, , ()) -> > 0.000000 MetaHookPost CallFunction(NetControl::init, , ()) -> > 0.000000 MetaHookPost CallFunction(Notice::want_pp, , ()) -> > @@ -872,8 +870,6 @@ > 0.000000 MetaHookPre CallFunction(Analyzer::register_for_ports, , (Analyzer::ANALYZER_SYSLOG, {514/udp})) > 0.000000 MetaHookPre CallFunction(Analyzer::register_for_ports, , (Analyzer::ANALYZER_TEREDO, {3544/udp})) > 0.000000 MetaHookPre CallFunction(Analyzer::register_for_ports, , (Analyzer::ANALYZER_XMPP, {5222<...>/tcp})) > -0.000000 MetaHookPre CallFunction(Broker::__configure, , ([routable=F, log_topic=bro<...>/])) > -0.000000 MetaHookPre CallFunction(Broker::configure, , ([routable=F, log_topic=bro<...>/])) > 0.000000 MetaHookPre CallFunction(Cluster::is_enabled, , ()) > 0.000000 MetaHookPre CallFunction(Cluster::is_enabled, , ()) > 0.000000 MetaHookPre CallFunction(Files::register_analyzer_add_callback, , (Files::ANALYZER_EXTRACT, FileExtract::on_add{ if (!FileExtract::args?$extract_filename) FileExtract::args$extract_filename = cat(extract-, FileExtract::f$last_active, -, FileExtract::f$source, -, FileExtract::f$id)FileExtract::f$info$extracted = FileExtract::args$extract_filenameFileExtract::args$extract_filename = build_path_compressed(FileExtract::prefix, FileExtract::args$extract_filename)FileExtract::f$info$extracted_cutoff = Fmkdir(FileExtract::prefix)})) > @@ -975,7 +971,7 @@ > 0.000000 MetaHookPre CallFunction(Log::__create_stream, , (Weird::LOG, [columns=, ev=Weird::log_weird, path=weird])) > 0.000000 MetaHookPre CallFunction(Log::__create_stream, , (X509::LOG, [columns=, ev=X509::log_x509, path=x509])) > 0.000000 MetaHookPre CallFunction(Log::__create_stream, , (mysql::LOG, [columns=, ev=MySQL::log_mysql, path=mysql])) > -0.000000 MetaHookPre CallFunction(Log::__write, , (PacketFilter::LOG, [ts=1502745368.796663, node=bro, filter=ip or not ip, init=T, success=T])) > +0.000000 MetaHookPre CallFunction(Log::__write, , (PacketFilter::LOG, [ts=1509124227.694371, node=bro, filter=ip or not ip, init=T, success=T])) > 0.000000 MetaHookPre CallFunction(Log::add_default_filter, , 
(Broker::LOG)) > 0.000000 MetaHookPre CallFunction(Log::add_default_filter, , (Cluster::LOG)) > 0.000000 MetaHookPre CallFunction(Log::add_default_filter, , (Communication::LOG)) > @@ -1108,7 +1104,7 @@ > 0.000000 MetaHookPre CallFunction(Log::create_stream, , (Weird::LOG, [columns=, ev=Weird::log_weird, path=weird])) > 0.000000 MetaHookPre CallFunction(Log::create_stream, , (X509::LOG, [columns=, ev=X509::log_x509, path=x509])) > 0.000000 MetaHookPre CallFunction(Log::create_stream, , (mysql::LOG, [columns=, ev=MySQL::log_mysql, path=mysql])) > -0.000000 MetaHookPre CallFunction(Log::write, , (PacketFilter::LOG, [ts=1502745368.796663, node=bro, filter=ip or not ip, init=T, success=T])) > +0.000000 MetaHookPre CallFunction(Log::write, , (PacketFilter::LOG, [ts=1509124227.694371, node=bro, filter=ip or not ip, init=T, success=T])) > 0.000000 MetaHookPre CallFunction(NetControl::check_plugins, , ()) > 0.000000 MetaHookPre CallFunction(NetControl::init, , ()) > 0.000000 MetaHookPre CallFunction(Notice::want_pp, , ()) > @@ -1596,8 +1592,6 @@ > 0.000000 | HookCallFunction Analyzer::register_for_ports(Analyzer::ANALYZER_SYSLOG, {514/udp}) > 0.000000 | HookCallFunction Analyzer::register_for_ports(Analyzer::ANALYZER_TEREDO, {3544/udp}) > 0.000000 | HookCallFunction Analyzer::register_for_ports(Analyzer::ANALYZER_XMPP, {5222<...>/tcp}) > -0.000000 | HookCallFunction Broker::__configure([routable=F, log_topic=bro<...>/]) > -0.000000 | HookCallFunction Broker::configure([routable=F, log_topic=bro<...>/]) > 0.000000 | HookCallFunction Cluster::is_enabled() > 0.000000 | HookCallFunction Files::register_analyzer_add_callback(Files::ANALYZER_EXTRACT, FileExtract::on_add{ if (!FileExtract::args?$extract_filename) FileExtract::args$extract_filename = cat(extract-, FileExtract::f$last_active, -, FileExtract::f$source, -, FileExtract::f$id)FileExtract::f$info$extracted = FileExtract::args$extract_filenameFileExtract::args$extract_filename = build_path_compressed(FileExtract::prefix, FileExtract::args$extract_filename)FileExtract::f$info$extracted_cutoff = Fmkdir(FileExtract::prefix)}) > 0.000000 | HookCallFunction Files::register_for_mime_type(Files::ANALYZER_PE, application/x-dosexec) > @@ -1698,7 +1692,7 @@ > 0.000000 | HookCallFunction Log::__create_stream(Weird::LOG, [columns=, ev=Weird::log_weird, path=weird]) > 0.000000 | HookCallFunction Log::__create_stream(X509::LOG, [columns=, ev=X509::log_x509, path=x509]) > 0.000000 | HookCallFunction Log::__create_stream(mysql::LOG, [columns=, ev=MySQL::log_mysql, path=mysql]) > -0.000000 | HookCallFunction Log::__write(PacketFilter::LOG, [ts=1502745368.796663, node=bro, filter=ip or not ip, init=T, success=T]) > +0.000000 | HookCallFunction Log::__write(PacketFilter::LOG, [ts=1509124227.694371, node=bro, filter=ip or not ip, init=T, success=T]) > 0.000000 | HookCallFunction Log::add_default_filter(Broker::LOG) > 0.000000 | HookCallFunction Log::add_default_filter(Cluster::LOG) > 0.000000 | HookCallFunction Log::add_default_filter(Communication::LOG) > @@ -1831,7 +1825,7 @@ > 0.000000 | HookCallFunction Log::create_stream(Weird::LOG, [columns=, ev=Weird::log_weird, path=weird]) > 0.000000 | HookCallFunction Log::create_stream(X509::LOG, [columns=, ev=X509::log_x509, path=x509]) > 0.000000 | HookCallFunction Log::create_stream(mysql::LOG, [columns=, ev=MySQL::log_mysql, path=mysql]) > -0.000000 | HookCallFunction Log::write(PacketFilter::LOG, [ts=1502745368.796663, node=bro, filter=ip or not ip, init=T, success=T]) > +0.000000 | HookCallFunction 
Log::write(PacketFilter::LOG, [ts=1509124227.694371, node=bro, filter=ip or not ip, init=T, success=T]) > 0.000000 | HookCallFunction NetControl::check_plugins() > 0.000000 | HookCallFunction NetControl::init() > 0.000000 | HookCallFunction Notice::want_pp() > diff --git a/testing/btest/Baseline/scripts.base.frameworks.cluster.start-it-up/manager-1..stdout b/testing/btest/Baseline/scripts.base.frameworks.cluster.start-it-up/manager-1..stdout > index 7c8eb5e..5b10602 100644 > --- a/testing/btest/Baseline/scripts.base.frameworks.cluster.start-it-up/manager-1..stdout > +++ b/testing/btest/Baseline/scripts.base.frameworks.cluster.start-it-up/manager-1..stdout > @@ -1,4 +1,8 @@ > Connected to a peer > Connected to a peer > Connected to a peer > +Got fully_connected event > +Got fully_connected event > Connected to a peer > +Got fully_connected event > +Got fully_connected event > diff --git a/testing/btest/Baseline/scripts.base.frameworks.logging.field-extension-cluster-error/manager-1.reporter.log b/testing/btest/Baseline/scripts.base.frameworks.logging.field-extension-cluster-error/manager-1.reporter.log > index b7d8c11..e7972c2 100644 > --- a/testing/btest/Baseline/scripts.base.frameworks.logging.field-extension-cluster-error/manager-1.reporter.log > +++ b/testing/btest/Baseline/scripts.base.frameworks.logging.field-extension-cluster-error/manager-1.reporter.log > @@ -3,11 +3,10 @@ > #empty_field (empty) > #unset_field - > #path reporter > -#open 2016-09-22-23-31-34 > +#open 2017-10-26-19-18-59 > #fields _write_ts _stream _system_name ts level message location > #types time string string time enum string string > -1474587094.261799 reporter manager-1 0.000000 Reporter::WARNING WriterFrontend communication/Log::WRITER_ASCII expected 11 fields in write, got 8. Skipping line. (empty) > -1474587094.261799 reporter manager-1 0.000000 Reporter::WARNING WriterFrontend communication/Log::WRITER_ASCII expected 11 fields in write, got 8. Skipping line. (empty) > -1474587094.261799 reporter manager-1 0.000000 Reporter::WARNING WriterFrontend communication/Log::WRITER_ASCII expected 11 fields in write, got 8. Skipping line. (empty) > -1474587099.984660 reporter manager-1 0.000000 Reporter::INFO received termination signal (empty) > -#close 2016-09-22-23-31-40 > +1509045539.693078 reporter manager-1 0.000000 Reporter::WARNING Write using filter 'default' on path 'broker' changed to use new path 'broker-2' to avoid conflict with filter '' (empty) > +1509045539.699623 reporter manager-1 0.000000 Reporter::WARNING WriterFrontend cluster/Log::WRITER_ASCII expected 6 fields in write, got 3. Skipping line. 
(empty) > +1509045547.196521 reporter manager-1 0.000000 Reporter::INFO received termination signal (empty) > +#close 2017-10-26-19-19-07 > diff --git a/testing/btest/broker/remote_log_types.bro b/testing/btest/broker/remote_log_types.bro > index aeaf1b9..56d63eb 100644 > --- a/testing/btest/broker/remote_log_types.bro > +++ b/testing/btest/broker/remote_log_types.bro > @@ -1,4 +1,4 @@ > - @TEST-SERIALIZE: brokercomm > +# @TEST-SERIALIZE: brokercomm > > # @TEST-EXEC: btest-bg-run recv "bro -b ../recv.bro >recv.out" > # @TEST-EXEC: btest-bg-run send "bro -b ../send.bro >send.out" > diff --git a/testing/btest/scripts/base/frameworks/cluster/start-it-up-logger.bro b/testing/btest/scripts/base/frameworks/cluster/start-it-up-logger.bro > index 97f3698..72251c0 100644 > --- a/testing/btest/scripts/base/frameworks/cluster/start-it-up-logger.bro > +++ b/testing/btest/scripts/base/frameworks/cluster/start-it-up-logger.bro > @@ -1,4 +1,4 @@ > -# @TEST-SERIALIZE: comm > +# @TEST-SERIALIZE: brokercomm > # > # @TEST-EXEC: btest-bg-run logger-1 CLUSTER_NODE=logger-1 BROPATH=$BROPATH:.. bro %INPUT > # @TEST-EXEC: sleep 1 > @@ -38,10 +38,16 @@ global fully_connected_nodes = 0; > event fully_connected() > { > ++fully_connected_nodes; > + > if ( Cluster::node == "logger-1" ) > { > if ( peer_count == 5 && fully_connected_nodes == 5 ) > - terminate_communication(); > + { > + if ( Cluster::use_broker ) > + terminate(); > + else > + terminate_communication(); > + } > } > } > > @@ -49,6 +55,43 @@ redef Cluster::worker2logger_events += /fully_connected/; > redef Cluster::proxy2logger_events += /fully_connected/; > redef Cluster::manager2logger_events += /fully_connected/; > > +event bro_init() > + { > + Broker::auto_publish(Cluster::logger_topic, fully_connected); > + } > + > +event Broker::peer_added(endpoint: Broker::EndpointInfo, msg: string) > + { > + print "Connected to a peer"; > + ++peer_count; > + > + if ( Cluster::node == "logger-1" ) > + { > + if ( peer_count == 5 && fully_connected_nodes == 5 ) > + { > + if ( Cluster::use_broker ) > + terminate(); > + else > + terminate_communication(); > + } > + } > + else if ( Cluster::node == "manager-1" ) > + { > + if ( peer_count == 5 ) > + event fully_connected(); > + } > + else > + { > + if ( peer_count == 3 ) > + event fully_connected(); > + } > + } > + > +event Broker::peer_lost(endpoint: Broker::EndpointInfo, msg: string) > + { > + terminate(); > + } > + > event remote_connection_handshake_done(p: event_peer) > { > print "Connected to a peer"; > diff --git a/testing/btest/scripts/base/frameworks/cluster/start-it-up.bro b/testing/btest/scripts/base/frameworks/cluster/start-it-up.bro > index acb9c36..b0fcc69 100644 > --- a/testing/btest/scripts/base/frameworks/cluster/start-it-up.bro > +++ b/testing/btest/scripts/base/frameworks/cluster/start-it-up.bro > @@ -1,4 +1,4 @@ > -# @TEST-SERIALIZE: comm > +# @TEST-SERIALIZE: brokercomm > # > # @TEST-EXEC: btest-bg-run manager-1 BROPATH=$BROPATH:.. CLUSTER_NODE=manager-1 bro %INPUT > # @TEST-EXEC: sleep 1 > @@ -8,7 +8,7 @@ > # @TEST-EXEC: btest-bg-run worker-1 BROPATH=$BROPATH:.. CLUSTER_NODE=worker-1 bro %INPUT > # @TEST-EXEC: btest-bg-run worker-2 BROPATH=$BROPATH:.. 
CLUSTER_NODE=worker-2 bro %INPUT > # @TEST-EXEC: btest-bg-wait 30 > -# @TEST-EXEC: btest-diff manager-1/.stdout > +# @TEST-EXEC: TEST_DIFF_CANONIFIER=$SCRIPTS/diff-sort btest-diff manager-1/.stdout > # @TEST-EXEC: btest-diff proxy-1/.stdout > # @TEST-EXEC: btest-diff proxy-2/.stdout > # @TEST-EXEC: btest-diff worker-1/.stdout > @@ -32,17 +32,59 @@ global fully_connected_nodes = 0; > > event fully_connected() > { > + if ( ! is_remote_event() ) > + return; > + > + print "Got fully_connected event"; > fully_connected_nodes = fully_connected_nodes + 1; > + > if ( Cluster::node == "manager-1" ) > { > if ( peer_count == 4 && fully_connected_nodes == 4 ) > - terminate_communication(); > + { > + if ( Cluster::use_broker ) > + terminate(); > + else > + terminate_communication(); > + } > } > } > > redef Cluster::worker2manager_events += /fully_connected/; > redef Cluster::proxy2manager_events += /fully_connected/; > > +event bro_init() > + { > + Broker::auto_publish(Cluster::manager_topic, fully_connected); > + } > + > +event Broker::peer_added(endpoint: Broker::EndpointInfo, msg: string) > + { > + print "Connected to a peer"; > + peer_count = peer_count + 1; > + > + if ( Cluster::node == "manager-1" ) > + { > + if ( peer_count == 4 && fully_connected_nodes == 4 ) > + { > + if ( Cluster::use_broker ) > + terminate(); > + else > + terminate_communication(); > + } > + } > + else > + { > + if ( peer_count == 2 ) > + event fully_connected(); > + } > + } > + > +event Broker::peer_lost(endpoint: Broker::EndpointInfo, msg: string) > + { > + terminate(); > + } > + > event remote_connection_handshake_done(p: event_peer) > { > print "Connected to a peer"; > diff --git a/testing/btest/scripts/base/frameworks/control/configuration_update.bro b/testing/btest/scripts/base/frameworks/control/configuration_update.bro > index 535a357..8d8fb0a 100644 > --- a/testing/btest/scripts/base/frameworks/control/configuration_update.bro > +++ b/testing/btest/scripts/base/frameworks/control/configuration_update.bro > @@ -1,4 +1,4 @@ > -# @TEST-SERIALIZE: comm > +# @TEST-SERIALIZE: brokercomm > # > # @TEST-EXEC: btest-bg-run controllee BROPATH=$BROPATH:.. bro %INPUT frameworks/control/controllee Communication::listen_port=65531/tcp Broker::default_port=65531/tcp > # @TEST-EXEC: sleep 5 > diff --git a/testing/btest/scripts/base/frameworks/control/id_value.bro b/testing/btest/scripts/base/frameworks/control/id_value.bro > index bee37a2..1b5e354 100644 > --- a/testing/btest/scripts/base/frameworks/control/id_value.bro > +++ b/testing/btest/scripts/base/frameworks/control/id_value.bro > @@ -1,4 +1,4 @@ > -# @TEST-SERIALIZE: comm > +# @TEST-SERIALIZE: brokercomm > # > # @TEST-EXEC: btest-bg-run controllee BROPATH=$BROPATH:.. bro %INPUT only-for-controllee frameworks/control/controllee Communication::listen_port=65532/tcp Broker::default_port=65532/tcp > # @TEST-EXEC: btest-bg-run controller BROPATH=$BROPATH:.. bro %INPUT frameworks/control/controller Control::host=127.0.0.1 Control::host_port=65532/tcp Control::cmd=id_value Control::arg=test_var > diff --git a/testing/btest/scripts/base/frameworks/control/shutdown.bro b/testing/btest/scripts/base/frameworks/control/shutdown.bro > index 9c0c104..2869c3f 100644 > --- a/testing/btest/scripts/base/frameworks/control/shutdown.bro > +++ b/testing/btest/scripts/base/frameworks/control/shutdown.bro > @@ -1,4 +1,4 @@ > -# @TEST-SERIALIZE: comm > +# @TEST-SERIALIZE: brokercomm > # > # @TEST-EXEC: btest-bg-run controllee BROPATH=$BROPATH:.. 
bro %INPUT frameworks/control/controllee Communication::listen_port=65530/tcp Broker::default_port=65530/tcp
> # @TEST-EXEC: btest-bg-run controller BROPATH=$BROPATH:.. bro %INPUT frameworks/control/controller Control::host=127.0.0.1 Control::host_port=65530/tcp Control::cmd=shutdown
> diff --git a/testing/btest/scripts/base/frameworks/logging/field-extension-cluster-error.bro b/testing/btest/scripts/base/frameworks/logging/field-extension-cluster-error.bro
> index 6ac7a5e..c0ab10b 100644
> --- a/testing/btest/scripts/base/frameworks/logging/field-extension-cluster-error.bro
> +++ b/testing/btest/scripts/base/frameworks/logging/field-extension-cluster-error.bro
> @@ -1,4 +1,4 @@
> -# @TEST-SERIALIZE: comm
> +# @TEST-SERIALIZE: brokercomm
> #
> # @TEST-EXEC: btest-bg-run manager-1 "cp ../cluster-layout.bro . && CLUSTER_NODE=manager-1 bro %INPUT"
> # @TEST-EXEC: sleep 1
> @@ -43,6 +43,11 @@ event terminate_me() {
> terminate();
> }
> +event Broker::peer_lost(endpoint: Broker::EndpointInfo, msg: string)
> + {
> + schedule 1sec { terminate_me() };
> + }
> +
> event remote_connection_closed(p: event_peer) {
> schedule 1sec { terminate_me() };
> }
> diff --git a/testing/btest/scripts/base/frameworks/logging/field-extension-cluster.bro b/testing/btest/scripts/base/frameworks/logging/field-extension-cluster.bro
> index fb51251..6740743 100644
> --- a/testing/btest/scripts/base/frameworks/logging/field-extension-cluster.bro
> +++ b/testing/btest/scripts/base/frameworks/logging/field-extension-cluster.bro
> @@ -1,4 +1,4 @@
> -# @TEST-SERIALIZE: comm
> +# @TEST-SERIALIZE: brokercomm
> #
> # @TEST-EXEC: btest-bg-run manager-1 "cp ../cluster-layout.bro . && CLUSTER_NODE=manager-1 bro %INPUT"
> # @TEST-EXEC: sleep 1
> @@ -39,6 +39,11 @@ event terminate_me() {
> terminate();
> }
> +event Broker::peer_lost(endpoint: Broker::EndpointInfo, msg: string)
> + {
> + schedule 1sec { terminate_me() };
> + }
> +
> event remote_connection_closed(p: event_peer) {
> schedule 1sec { terminate_me() };
> }
>
>
> _______________________________________________
> bro-commits mailing list
> bro-commits at bro.org
> http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-commits
>

-- 
Robin Sommer * ICSI/LBNL * robin at icir.org * www.icir.org/robin

From jsiwek at illinois.edu Tue Oct 31 15:35:23 2017
From: jsiwek at illinois.edu (Siwek, Jon)
Date: Tue, 31 Oct 2017 22:35:23 +0000
Subject: [Bro-Dev] [Bro-Commits] [git/bro] topic/actor-system: First-pass broker-enabled Cluster scripting API + misc. (07ad06b)
In-Reply-To: <20171031181607.GB26741@icir.org>
References: <201710271803.v9RI3oSQ001411@bro-ids.icir.org> <20171031181607.GB26741@icir.org>
Message-ID:

> On Oct 31, 2017, at 1:16 PM, Robin Sommer wrote:
>
> - One thing I can't quite tell is whether this is still aiming to
> maintain compatibility with the old communication system, like
> by keeping the proxies and also the *_events patterns. Looking
> at setup-connections, it seems so. I'd say just go ahead and
> remove all legacy pieces. Maintaining two schemes in parallel is
> cumbersome, and I think it's fine to just force everything over
> to Broker.

It does keep the old functionality around if one does "redef Cluster::use_broker=F", but the "T" branch of the code doesn't aim to maintain compatibility. For the moment, I like having the old functionality around as a reference: e.g. as I port other scripts/frameworks, I may find it helpful to switch back to the old version to test/compare what it was doing. I've made a note to remove it after I get everything working/stable.
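Concretely, that fallback is just a one-line redef in a node's site policy (a sketch; where exactly it goes, e.g. local.bro, is up to the deployment):

# Fall back to the legacy RemoteSerializer-based communication path;
# the default (T) selects the new broker-based code branch.
redef Cluster::use_broker = F;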
The "use_broker=T" code branch does keep the notion of proxies the same (at least in the way they connect to other nodes in the cluster). My thought was that they can conceptually still be used for the same type of stuff: data sharing and offloading other misc. analysis/calculation. The only change to the setup I think I'd have to make is that each worker would now connect to all proxies instead of just one, and proxies would not connect to each other.

I've also been talking with Justin, and it seems he wants the ability to have multiple logger nodes in a cluster w/ the ability to distribute logs between them. That seems possible, but would need some API changes in Bro to get working (e.g. changing the static log topic to one returned by a user-defined function). I think he had also been expecting "data" nodes to be a thing (not sure how those differ from proxies), so generally I'm worried I missed a previous discussion on what people expect the new cluster layout to look like, or maybe no one has put forth a coherent plan/design for that yet?

> - Is the idea for the "*_topic" constants that one just picks the
> appropriate one when sending events? Like if I want to publish
> something to all workers, I'd publish to Cluster::worker_topic?

Yeah, you have the idea right.

> I think that's fine, though I'm wondering if we could compress
> the API there somehow so that Cluster doesn't need to export all
> those constants individually. One idea would be a function that
> returns a topic based on node type?

Yeah, we could do that, but I also don't really see the problem with exporting things individually. At least that way, the topic strings are guaranteed to be correct in the generated docs. With a function, you'd have to maintain the topic strings in two places: the docs and the internal function implementation. That may seem trivial to get right, but I've seen enough instances of outdated documentation that I have doubts...

> - I like the Pools! How about moving Pool with its functions out
> of main.bro, just for clarity.

Sure.

> - Looks like the hello/bye events are broadcast by all nodes. Is
> that on purpose, or should it be limited to just one node, like
> just the master sending them out? Or does it not matter, and this
> provides for more redundancy?

Mostly on purpose. The point of the "hello" message is to map a broker node ID to a cluster node name. E.g. the node IDs provided by Broker::peer_added are a hash of a MAC address concatenated w/ a process ID (hard to read and associate with a cluster node), while node names are "manager", "worker-1", etc. At the point where two nodes connect, I don't think we have any information other than the node IDs, and we need the node names to be able to send more directed messages, hence the broadcast. At least I don't think there's another way to send directed messages (e.g. based on node ID) in Bro's current API; maybe I missed it?

And the "bye" event is only raised locally, so users can potentially handle it to know when a cluster node goes down (i.e. it gives them the friendlier node name rather than the broker node ID they'd get from handling Broker::peer_lost).

I might generally be missing some context here: I remember broker endpoints originally being able to self-identify with friendly names, so these new hello/bye events wouldn't have been needed, but it didn't seem like that functionality was around anymore.
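To illustrate the directed-message case: once the hello mapping exists, sending to one named node is just a publish to its per-node topic. A rough sketch, where my_event and "worker-1" are made-up placeholders:

global my_event: event(msg: string);

event bro_init()
	{
	# Only the node subscribed to "bro/cluster/node/worker-1" (i.e.
	# Cluster::node_topic_prefix plus the target's name) sees this.
	local e = Broker::make_event(my_event, "hello, worker-1");
	Broker::publish(Cluster::node_topic_prefix + "worker-1", e);
	}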
> - create_store() vs "stores": Is the idea that I'd normally use
> create_store() and that it populates the table, but that I could also
> redef the table myself instead of using create_store() to create
> more custom entries? If so, maybe make it a bit more explicit in
> the comments that there are two ways to configure that table.

That's right, I'll improve the docs. For example, the two paths would look roughly like this (just a sketch; "known_hosts" is a made-up store name):
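# (1) Customize an entry ahead of time by redef'ing the table:
redef Cluster::stores += {
	["known_hosts"] = Cluster::StoreInfo($backend=Broker::SQLITE)
};

# (2) Then call create_store(), which fills in and uses that entry;
# the master node creates the master store, while other nodes get a
# clone once the node holding the master store has connected.
global known_hosts: Cluster::StoreInfo;

event bro_init()
	{
	known_hosts = Cluster::create_store("known_hosts");
	}

- Jon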