From balint.martina at gmail.com  Fri Apr  7 03:11:23 2017
From: balint.martina at gmail.com (Martina Balintova)
Date: Fri, 7 Apr 2017 11:11:23 +0100
Subject: [Bro-Dev] how to compile bro plugin at the same time as bro

Hi,

I would like to ask how to enable compilation and installation of bro's plugin at the same time as the rest of bro is compiling/installing.

I would like to enable the redis plugin (but only this one), so I copied BroPluginStatic.cmake over into its cmake dir, but I am not sure how to call it from the main CMake. For example, when I added "CheckOptionalBuildSources(aux/plugins/redis Redis true)" to the top-level CMakeLists.txt, it could not find e.g. plugin/Plugin.h or logging/WriterBackend.h.

I can hack it in somehow, the way the dynamic cmake suggests, but I would like to know if there is a neater way to enable it.

Thanks for any help on this,
Martina

From robin at icir.org  Mon Apr 10 10:53:50 2017
From: robin at icir.org (Robin Sommer)
Date: Mon, 10 Apr 2017 10:53:50 -0700
Subject: [Bro-Dev] how to compile bro plugin at the same time as bro
Message-ID: <20170410175350.GD68034@icir.org>

On Fri, Apr 07, 2017 at 11:11 +0100, you wrote:

> I would like to ask how to enable compilation and installation of bro's
> plugin at the same time as the rest of bro is compiling/installing.

That's a good question; we don't have a mechanism for that yet. Currently the assumption is that dynamic plugins are compiled separately. It would indeed be nice to have the Bro CMake configuration pick up further dynamic plugins to compile along.
Robin

--
Robin Sommer * ICSI/LBNL * robin at icir.org * www.icir.org/robin

From jsiwek at illinois.edu  Tue Apr 11 17:41:23 2017
From: jsiwek at illinois.edu (Siwek, Jon)
Date: Wed, 12 Apr 2017 00:41:23 +0000
Subject: [Bro-Dev] early performance comparisons of CAF-based run loop

I recently got a minimal CAF-based run loop for Bro working, did crude performance comparisons, and wanted to share.

The approach was to measure the average time between calls of net_packet_dispatch() and also the average time it takes to analyze a packet. The former attempts to measure the overhead imposed by the loop implementation, and the latter just gives an idea of how significant a chunk of time that is in relation to Bro's main workload. I found that the overhead of the loop can be ~5-10% of the packet processing time, so it does seem worthwhile to try to keep the run loop overhead low.

Initial testing of the CAF-based loop showed the overhead increased by ~1.8x, but there was still a major difference in the implementations: the standard Bro loop only invokes its IOSource polling mechanism (select) once every 25 cycles of the loop, while the CAF implementation's polling mechanism (actor/thread scheduling + messaging + epoll) is used for every cycle/packet. As one would expect, by just trivially spinning the main process() function in a loop for 25 iterations, the overhead of the CAF-based loop comes back into line with the standard run loop.

To better measure the actual differences related to the polling mechanism implementation, I quickly hacked Bro's standard run loop to select() on every packet instead of once every 25th, and found that its overhead comes within +/- 10% of the 1.8x increase seen with the initial CAF-based loop.

So is the cost of the extra system call for epoll/select per packet the main thing to avoid? Sort of.
I again hacked Bro's standard loop to be able to use either epoll or poll instead of select and found that those do better, with the overhead increase being about 1.3x (still doing one "poll" per packet) relative to the standard run loop. That means there is a measurable trend in polling mechanism performance (for a sparse number of FDs/sources): poll comes in first, epoll second, with CAF and select about tied for third.

Takeaways:

(1) Regardless of run loop implementation or polling mechanism choice, performing the polling operation once per packet should probably be avoided. In concept, it's an easy way to get a 2-5% speedup relative to total packet processing time.

(2) Related to (1), but not in the sense of performance: even with a CAF-based loop, it still seems somewhat difficult to reason about how IOSources are actually prioritized. In the standard loop, the priority of an IOSource is a combination of its "idle" state, the polling frequency, and a timestamp, which it often chooses arbitrarily as the "time of last packet", just so that it gets processed with higher priority than subsequent packets. Making IOSource prioritization more explicit/well-defined could be another thread of discussion, but my initial thought is that the whole IOSource abstraction may be over-generalized and maybe not even needed.

(3) The performance overhead of a CAF-based loop doesn't seem like a showstopper for proceeding with it as a replacement for the current loop. It's not significantly worse than the current loop (provided we still throttle the polling ratio when packet sources are saturated), and even the most minimal loop implementation of just poll() would only give about a 1% speedup relative to the total packet processing workload.
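A minimal Python sketch of the throttling behind takeaway (1), for illustration only. It is not Bro's run loop: `run_loop`, `POLL_INTERVAL`, the use of a pipe as a stand-in for one non-packet IOSource, and `range()` as a stand-in for a saturated packet source are all assumptions. Every packet is processed, but `select.poll()` (POSIX only) is consulted once every 25 packets with a zero timeout, mirroring the ratio of the standard loop described above.

```python
import os
import select

# Assumed ratio, mirroring the standard loop's "poll once every 25 cycles".
POLL_INTERVAL = 25


def run_loop(packets, fd, interval=POLL_INTERVAL):
    """Process every packet but poll the other source only once per `interval`.

    packets: any iterable standing in for a saturated packet source.
    fd:      a file descriptor standing in for one non-packet IOSource.
    Returns (packets_processed, poll_calls, ready_events).
    """
    poller = select.poll()  # POSIX-only; poll() rather than select()
    poller.register(fd, select.POLLIN)
    processed = polls = ready = 0
    for i, _pkt in enumerate(packets):
        if i % interval == 0:
            # Timeout of 0 ms: check readiness without ever blocking the packet path.
            events = poller.poll(0)
            polls += 1
            ready += len(events)  # a real loop would dispatch those sources here
        processed += 1  # stand-in for analyzing the packet
    return processed, polls, ready


r, w = os.pipe()  # empty pipe: the "other" source has nothing to read yet
print(run_loop(range(1000), r))              # (1000, 40, 0)
print(run_loop(range(1000), r, interval=1))  # (1000, 1000, 0)
```

Setting `interval=1` reproduces the one-poll-per-packet pattern that the measurements above identify as the main source of extra overhead; the throttled run makes 40 system calls instead of 1000 for the same work.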
Just raw data below, for those interested:

I tested against the pcaps from http://tcpreplay.appneta.com/wiki/captures.html (I was initially going to use tcpreplay to test performance against a live interface, but decided reading from a file is easier and just as good for what I wanted to measure).

Numbers are measured in "ticks", which are equivalent to nanoseconds on the test system. Bro and CAF are both compiled with optimizations.

bigFlows.pcap, 1 "poll" per packet
--------------------------
poll    ('avg overhead', 1018.8868239999998) ('avg process', 11664.4968147)
epoll   ('avg overhead', 1114.2168096999999) ('avg process', 11680.6078816)
CAF     ('avg overhead', 1515.9933343999996) ('avg process', 11914.897109200003)
select  ('avg overhead', 1792.8142910999995) ('avg process', 11863.308550400001)

bigFlows.pcap, Polling Throttled to 1 per 25 packets
---------------------------
poll    ('avg overhead', 772.6118347999999) ('avg process', 11504.2397625)
epoll   ('avg overhead', 814.4771509) ('avg process', 11547.058394900001)
CAF     ('avg overhead', 847.6571822) ('avg process', 11681.377972700002)
select  ('avg overhead', 855.2147494000001) ('avg process', 11585.1111236)

smallFlows.pcap, 1 "poll" per packet
----------------------------
poll    ('avg overhead', 1403.8950280800004) ('avg process', 22202.960570839998)
epoll   ('avg overhead', 1470.0554376) ('avg process', 22210.3240474)
select  ('avg overhead', 2305.6278429200006) ('avg process', 22549.29251384)
CAF     ('avg overhead', 2405.1401093399995) ('avg process', 23401.66596454)

smallFlows.pcap, Polling Throttled to 1 per 25 packets
-----------------------------
poll    ('avg overhead', 1156.0900352) ('avg process', 22113.8645395)
epoll   ('avg overhead', 1192.37176) ('avg process', 22000.2246757)
select  ('avg overhead', 1269.0761219) ('avg process', 22017.891367999997)
CAF     ('avg overhead', 1441.6064868) ('avg process', 22658.534969599998)

From slagell at illinois.edu  Wed Apr 12 11:35:38 2017
From: slagell at illinois.edu (Slagell, Adam J)
Date: Wed, 12 Apr 2017 18:35:38 +0000
Subject: [Bro-Dev] early performance comparisons of CAF-based run loop

Justin asked an interesting question today, how does this affect performance on the manager? That is where we are feeling a lot of pain with select().

------
Adam J. Slagell
Director, Cybersecurity & Networking Division
Chief Information Security Officer
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
www.slagell.info

"Under the Illinois Freedom of Information Act (FOIA), any written communication to or from University employees regarding University business is a public record and may be subject to public disclosure."

From jsiwek at illinois.edu  Wed Apr 12 19:05:52 2017
From: jsiwek at illinois.edu (Siwek, Jon)
Date: Thu, 13 Apr 2017 02:05:52 +0000
Subject: [Bro-Dev] early performance comparisons of CAF-based run loop
Message-ID: <89FB2E77-975F-474C-A601-22E38B00395E@illinois.edu>

> On Apr 12, 2017, at 1:35 PM, Slagell, Adam J wrote:
>
> Justin asked an interesting question today, how does this affect performance on the manager? That is where we are feeling a lot of pain with select().

If you mean the select() that's in the process fork'd by the old RemoteSerializer code, you'd still see the same problems with the CAF-based run loop. But that code becomes irrelevant once Broker takes its place. So to answer that question, you'd need to design a communication stress test using Broker-based Bros, as that's more relevant than just changing the main loop.

Eventually, I can also imagine the Broker-based communication being more tightly integrated into the CAF-based run loop, helping improve performance over the current Broker integration method. Either way, what needs to be measured is how CAF's multiplexer performs in relation to Bro's communication patterns, but we may still want to wait for the Broker improvements to wrap up before looking into those tests.
In the near-term, I can make a totally separate code branch that simply replaces select() with epoll. Then, if Justin were to test it and find it alleviates performance pains on the manager, it could potentially get merged into bro/master ahead of any of the pending broker/caf/runloop projects, since it should be a trivial and safe change to do. Let me know.

- Jon

From gc355804 at ohio.edu  Wed Apr 12 20:19:02 2017
From: gc355804 at ohio.edu (Clark, Gilbert)
Date: Thu, 13 Apr 2017 03:19:02 +0000
Subject: [Bro-Dev] early performance comparisons of CAF-based run loop

$0.02 USD:

As I recall, Bro's per-packet processing overhead can vary significantly as a result of timers and triggers that execute on a situational basis. Also, relative overhead of packet ingest is going to vary based on the set of loaded scripts in addition to the specific trace used to run the tests. That's not trying to argue that these results are not useful / interesting, but instead *only* that the specific percentages might not be representative of the general case (just because I'm convinced that there really is not a general case to objectively measure).

Also ... if the overhead of the polling / ingest itself turns out to be a huge problem at high rates, one idea would be to separate that and pass packets (in bulk) through a ring / high-speed IPC to the process that needs to ingest them. That's worked pretty well for me in DPDK, and has the benefit of being able to distribute packets from one ingest to multiple processors (which is something I've had to do for process-heavy workloads ... which I would argue is something that Bro tends to be).

Along those lines, rather than spending much time on packet ingest mechanics in bro (or pieces thereof), one idea might be to instead focus on integrating packet bricks as a standard ingest / distribution mechanic for everything packet-related in the general case.
The idea would be that fetching packets from bro (and its related processes) would become less about calls to epoll and select, and more about high-speed IPC that went out of its way to avoid kernel-space entirely. The nice thing about that is that it'd be a little easier to standardize on the bro side of things, and would take a step toward separating bro as a scripting / event engine from bro as a (relative) monolith. Of course, the down side is that packet bricks could add some serious (mandatory) complexity to bro, so maybe it's not the right answer ... but maybe a more lightweight, specialized distribution channel might be doable, or maybe there would be a way to embed packet bricks inside of an application in the event that folks didn't want to run the two separately, or ... etc.

As always, just for what it's worth :)

-Gilbert

From jazoff at illinois.edu  Wed Apr 12 20:19:47 2017
From: jazoff at illinois.edu (Azoff, Justin S)
Date: Thu, 13 Apr 2017 03:19:47 +0000
Subject: [Bro-Dev] early performance comparisons of CAF-based run loop
In-Reply-To: <89FB2E77-975F-474C-A601-22E38B00395E@illinois.edu>
References: <89FB2E77-975F-474C-A601-22E38B00395E@illinois.edu>
Message-ID: <435E2532-1F61-46BA-BA56-2CC9C91AAAD2@illinois.edu>

> On Apr 12, 2017, at 9:05 PM, Siwek, Jon wrote:
>
>
>> On Apr 12, 2017, at 1:35 PM, Slagell, Adam J wrote:
>>
>> Justin asked an interesting question today, how does this affect performance on the manager? That is where we are feeling a lot of pain with select().
>
> If you mean the select() that's in the process fork'd by the old RemoteSerializer code, you'd still see the same problems with the CAF-based runloop. But that code is irrelevant once Broker takes its place. i.e. to answer that question, you need to design a communication stress test using Broker-based Bros as that's more relevant than just changing the main loop.

Yep, that select stuff.

My question was mostly about the different workloads in a bro cluster. Something that may be optimized for a worker dealing with 1 pktsrc and 2 peers may not be as optimal for a logger/manager that has no pktsrc but 100+ worker connections. I've often wondered if the event loop should have a hint somewhere about which kind of process is running so it can optimize for throughput vs multiplexing many peers.

> Eventually, I can also imagine the Broker-based communication being more tightly integrated into the CAF-based runloop helping improve performance over the current Broker integration method. Either way, what needs to be measured is how CAF's multiplexer performs in relation to Bro's communication patterns, but maybe still want to wait for the Broker improvements to wrap up before looking into doing those tests.
>
> In the near-term, I can make a totally separate code branch that simply replaces select() with epoll. Then, if Justin were to test it and find it alleviates performance pains on the manager, it could potentially get merged into bro/master ahead of the any of the pending broker/caf/runloop projects since it should be a trivial and safe change to do. Let me know.

Ah, I had actually started trying to do that a long time ago, but gave up because broker was going to replace all of that code anyway:

https://github.com/bro/bro/commits/topic/jazoff/select-to-poll

From what I recall, the first commit seemed to work, but the second broke something.

The thing that always stood out to me was that the manager would run select across all the worker sockets, and then loop over each worker and run CanRead, which just ran select again on each individual FD.

One issue a few people have run into on the manager is that select returns EINVAL and deadlocks bro if you give it an FD larger than 1024, which you currently hit at around a 200-node cluster (the socket plus flares use 4 or 5 FDs per worker).

--
- Justin Azoff

From slagell at illinois.edu  Thu Apr 13 09:29:37 2017
From: slagell at illinois.edu (Slagell, Adam J)
Date: Thu, 13 Apr 2017 16:29:37 +0000
Subject: [Bro-Dev] early performance comparisons of CAF-based run loop
In-Reply-To: <89FB2E77-975F-474C-A601-22E38B00395E@illinois.edu>
References: <89FB2E77-975F-474C-A601-22E38B00395E@illinois.edu>
Message-ID: <432585A7-C21C-4A90-B7F1-CF3C886AC093@illinois.edu>

That might be useful. I would like Robin's thoughts, too.

On Apr 12, 2017, at 9:05 PM, Siwek, Jon wrote:

> In the near-term, I can make a totally separate code branch that simply replaces select() with epoll. Then, if Justin were to test it and find it alleviates performance pains on the manager, it could potentially get merged into bro/master ahead of the any of the pending broker/caf/runloop projects since it should be a trivial and safe change to do. Let me know.

------
Adam J. Slagell

From jsiwek at illinois.edu  Thu Apr 13 12:52:38 2017
From: jsiwek at illinois.edu (Siwek, Jon)
Date: Thu, 13 Apr 2017 19:52:38 +0000
Subject: [Bro-Dev] early performance comparisons of CAF-based run loop
Message-ID: <07A083F9-BC3C-47F0-A79D-C60C8E9436C7@illinois.edu>

> On Apr 12, 2017, at 10:19 PM, Clark, Gilbert wrote:
>
> Also, relative overhead of packet ingest is going to vary based on the set of loaded scripts in addition to the specific trace used to run the tests. That's not trying to argue that these results are not useful / interesting, but instead *only* that the specific percentages might not be representative of the general case (just because I'm convinced that there really is not a general case to objectively measure).

I agree, the specific numbers here aren't generalizable, but I think that's ok, and we can still infer that the different run loop implementation doesn't raise any obvious performance concern.
That's because (1) the tests used the default set of Bro scripts, and I'd expect it to be more common for users to have more complicated scripts and highly customized deployments, such that the relative overhead decreases further and matters even less than the tests show; and (2) even if the specific pcaps tested were at either end of the spectrum in terms of how much work is required to process them, the results still show that the relative overhead differences are minimal. I think we'd only be in trouble interpreting the results if the tests had shown a significant relative overhead difference, since then we wouldn't know whether the given pcaps were just "easy" ones for Bro to process.

- Jon

From jsiwek at illinois.edu  Thu Apr 13 13:12:48 2017
From: jsiwek at illinois.edu (Siwek, Jon)
Date: Thu, 13 Apr 2017 20:12:48 +0000
Subject: [Bro-Dev] early performance comparisons of CAF-based run loop
In-Reply-To: <435E2532-1F61-46BA-BA56-2CC9C91AAAD2@illinois.edu>
References: <89FB2E77-975F-474C-A601-22E38B00395E@illinois.edu> <435E2532-1F61-46BA-BA56-2CC9C91AAAD2@illinois.edu>

> On Apr 12, 2017, at 10:19 PM, Azoff, Justin S wrote:
>
> Something that may be optimized for a worker dealing with 1 pktsrc and 2 peers may not be as optimal for a logger/manager that has no pktsrc but 100+ worker connections. I've often wondered if the event loop should have a hint somewhere about which kind of process is running so it can optimize for throughput vs multiplexing many peers.

Yeah, I've thought the same, and it's related to takeaway (2) that I mentioned in the original post. It seems like it would be nice to have a more well-defined system for specifying IOSource prioritization, or at least the prioritization between packet sources and other IO sources.
Then, since it's hard to nail down settings that are going to work for all deployments in general, it would also offer ways to tune it via scripts, so users could tweak settings that may improve performance for their particular manager, logger, worker, or whatever Bro node they have. But at the moment, I don't think there's a whole lot of info on exactly what tweaks can be made to optimize for the "no pktsrc, but lots of remote comm." case, and it may be best to wait for the broker integration to become fully realized before investigating that.

- Jon

From robin at icir.org  Fri Apr 14 06:32:11 2017
From: robin at icir.org (Robin Sommer)
Date: Fri, 14 Apr 2017 15:32:11 +0200
Subject: [Bro-Dev] early performance comparisons of CAF-based run loop
Message-ID: <20170414133211.GA57749@icir.org>

Nice, thanks for doing these measurements! I haven't looked at the code yet, but here are some quick thoughts on your results and on some of the other comments in this thread, and then some suggested next steps at the end.

- Agree that overall your numbers suggest that all these mechanisms are fine performance-wise, assuming we keep the optimization of batching packets between polls/selects to avoid the one-system-call-per-packet overhead.

- I don't think we should spend any more time on improving the old communication code. We're getting close to retiring that now, and a number of its issues (like selects in the child process) will just go away with that. Let's focus on the new setting where Broker/CAF will be doing all communication.

- Regarding optimizing for different use cases: I would prefer to avoid having lots of knobs to configure the specifics of the loop. We have these magic values in the current I/O loop where nobody knows how to pick them because it's hard to understand their impact; and where folks have played with them, it was always hard to conclude much about them beyond any specific setting.
What we could try instead is a loop that adjusts itself based on load patterns: if the load is heavy on packets, build larger batches to process between polls; if input comes from lots of different sources, increase the polling; etc. Any heuristic here would need to stay quite simple (otherwise we'd again end up not being able to predict much), but I think that'd be worth a try.

- Gilbert's point on high-performance IPC is a good one. I don't think we want to switch to direct memory access as our main model, at least for the time being, but it does pose the question if/how we can integrate packet sources that either don't need or don't support select/poll. (Which, in a nod to history, accounts for some of the complexities of the current loop, because many years ago some pcaps didn't support select.)

In terms of next steps, we need to see if these results hold across different OSs, and also with live traffic. The two questions are (1) does the new loop function on all platforms with both low- and high-volume live traffic (presumably it will, but that needs double checking, given the history of weird OS-specific effects); and (2) does performance match the measurements shown so far? If we can confirm that on at least Linux and FreeBSD for, say, the two most recent major releases of each and also consider common alternative capturing solutions (pfring, netmap, afnet?), I'd be pretty comfortable switching.
Robin

--
Robin Sommer * ICSI/LBNL * robin at icir.org * www.icir.org/robin

From vallentin at icir.org Fri Apr 14 07:56:09 2017
From: vallentin at icir.org (Matthias Vallentin)
Date: Fri, 14 Apr 2017 07:56:09 -0700
Subject: [Bro-Dev] early performance comparisons of CAF-based run loop
In-Reply-To: <20170414133211.GA57749@icir.org>
References: <20170414133211.GA57749@icir.org>
Message-ID: <20170414145609.GH4132@shogun.local>

> If we can confirm that on at least Linux and FreeBSD for, say, the two
> most recent major releases of each and also consider common
> alternative capturing solutions (pfring, netmap, afnet?), I'd be
> pretty comfortable switching.

Just a quick comment here regarding FreeBSD: the native polling mechanism is kqueue, and CAF still lacks support for it [1]. Fortunately, this is a rather straight-forward task.

Matthias

[1] https://github.com/actor-framework/actor-framework/issues/533

From jsiwek at illinois.edu Fri Apr 14 10:32:35 2017
From: jsiwek at illinois.edu (Siwek, Jon)
Date: Fri, 14 Apr 2017 17:32:35 +0000
Subject: [Bro-Dev] early performance comparisons of CAF-based run loop
In-Reply-To: <20170414133211.GA57749@icir.org>
References: <20170414133211.GA57749@icir.org>
Message-ID: <5E74D8F0-0E2B-4714-939F-A50CD013625F@illinois.edu>

> On Apr 14, 2017, at 8:32 AM, Robin Sommer wrote:
>
> - I don't think we should spend time anymore on improving the old
> communication code. We're getting close to retiring that now, and a
> number of its issues (like selects in the child process) will just
> go away with that. Let's focus on the new setting where Broker/CAF
> will be doing all communication.

If people are hitting the 1024 FD hard limit in the old comm. code's select(), that would indeed go away with the change to Broker. But I think the way Broker is integrated in the parent's main loop still relies on a select(), with the number of FDs it monitors scaling with the number of peers. I.e.,
there may still be critical errors w/ large Bro clusters even using Broker as the communication system; just this time the problem manifests in the main loop. Just mentioning it in case you didn't account for the real fix also requiring the CAF-based loop being fully realized in addition to Broker. I'm less certain about the timeline of finishing up the CAF-based loop compared to just patching in a temporary stopgap that patches out the select() calls. (Also don't have a sense of the frequency/urgency of the problem).

> - Regarding optimizing for different use cases: I would prefer
> avoiding having lots of knobs to configure the specifics of the
> loop. We have these magic values in the current I/O loop where
> nobody knows how to pick them because it's hard to understand their
> impact; and where folks have played with them, it was always hard to
> conclude much about them beyond any specific setting. What we could
> try instead is a loop that adjusts itself based on load patterns: if
> the load is heavy on packets, build larger batches to process
> between polls; if input comes from lots of different sources, increase
> the polling; etc.

That seems like a Good Idea.

> it does pose the question if/how we can
> integrate packet sources that either don't need or don't support
> select/poll

I think that's just a matter of making sure the main loop "spins" at an appropriate frequency, which might change dynamically depending on loading-pattern optimizations, as per the above idea. Maybe you could even think of reading an offline pcap file as a source that doesn't need select/poll. Pedantically, regular files also don't "support" select(), at least not w/ the same intention (nonblocking IO), but it just happens to work fine in the current runloop implementation.
So since I've been able to get the CAF-based loop working on offline pcap files (it does not rely on polling the FD of the open file since it didn't work anyway w/ CAF's epoll-based multiplexer on Linux), it may be fair to say that other packet sources that don't require/support poll-ability should also be possible to integrate.

- Jon

From seth at icir.org Mon Apr 17 07:34:14 2017
From: seth at icir.org (Seth Hall)
Date: Mon, 17 Apr 2017 10:34:14 -0400
Subject: [Bro-Dev] ConfigurePackaging in plugins?
Message-ID: <5D168659-1F73-44AB-898D-5F99ACFFF84B@icir.org>

I'm casting around for thoughts on adding a mechanism to add the ConfigurePackaging cmake packaging mechanism to plugins without having to replicate the cmake script in the main cmake repository or making near-clones of it. Is there some way we could use that script from the main Bro repository without needing to include it with the plugin?

.Seth

From asharma at lbl.gov Tue Apr 18 10:29:50 2017
From: asharma at lbl.gov (Aashish Sharma)
Date: Tue, 18 Apr 2017 10:29:50 -0700
Subject: [Bro-Dev] CMU/SEI C++ secure coding best practices
Message-ID:

Anyone seen this out of CMU:

SEI CERT C++ Coding Standard: Rules for Developing Safe, Reliable, and Secure Systems in C++
http://cert.org/downloads/secure-coding/assets/sei-cert-cpp-coding-standard-2016-v01.pdf

Not sure how good/bad/awesome/relevant this is.

Aashish

From robin at icir.org Wed Apr 19 00:55:30 2017
From: robin at icir.org (Robin Sommer)
Date: Wed, 19 Apr 2017 09:55:30 +0200
Subject: [Bro-Dev] early performance comparisons of CAF-based run loop
In-Reply-To: <20170414145609.GH4132@shogun.local>
References: <20170414133211.GA57749@icir.org> <20170414145609.GH4132@shogun.local>
Message-ID: <20170419075530.GD93237@icir.org>

On Fri, Apr 14, 2017 at 07:56 -0700, you wrote:

> Just a quick comment here regarding FreeBSD: the native polling
> mechanism is kqueue, and CAF still lacks support for it [1].
> Fortunately, this is a rather straight-forward task.
Oh, sounds like that would be a high-priority task then before we'd consider moving to a CAF-based loop?

Robin

--
Robin Sommer * ICSI/LBNL * robin at icir.org * www.icir.org/robin

From jsiwek at illinois.edu Wed Apr 19 08:24:08 2017
From: jsiwek at illinois.edu (Siwek, Jon)
Date: Wed, 19 Apr 2017 15:24:08 +0000
Subject: [Bro-Dev] early performance comparisons of CAF-based run loop
In-Reply-To: <20170419075530.GD93237@icir.org>
References: <20170414133211.GA57749@icir.org> <20170414145609.GH4132@shogun.local> <20170419075530.GD93237@icir.org>
Message-ID:

> On Apr 19, 2017, at 2:55 AM, Robin Sommer wrote:
>
>> Just a quick comment here regarding FreeBSD: the native polling
>> mechanism is kqueue, and CAF still lacks support for it [1].
>> Fortunately, this is a rather straight-forward task.
>
> Oh, sounds like that would be a high-priority task then before we'd
> consider moving to a CAF-based loop?

It still falls back to "poll" on non-Linux. If more performance tests are to be done in realistic conditions (live traffic + cluster communication) on various platforms, you'd likely find out at that point whether it's a high-priority task. I expect "poll" might still be usable if the common case is only going to max out on the order of 100s of peers.

- Jon

From vallentin at icir.org Wed Apr 19 13:40:53 2017
From: vallentin at icir.org (Matthias Vallentin)
Date: Wed, 19 Apr 2017 13:40:53 -0700
Subject: [Bro-Dev] early performance comparisons of CAF-based run loop
In-Reply-To: <20170419075530.GD93237@icir.org>
References: <20170414133211.GA57749@icir.org> <20170414145609.GH4132@shogun.local> <20170419075530.GD93237@icir.org>
Message-ID:

> Oh, sounds like that would be a high-priority task then before we'd
> consider moving to a CAF-based loop?

I've added kqueue support in topic/kqueue, but it's still missing the final touch. (Unit tests are still failing.) Hopefully it's not a long way from here.
Matthias

From robin at icir.org Thu Apr 20 10:40:36 2017
From: robin at icir.org (Robin Sommer)
Date: Thu, 20 Apr 2017 10:40:36 -0700
Subject: [Bro-Dev] early performance comparisons of CAF-based run loop
In-Reply-To: <5E74D8F0-0E2B-4714-939F-A50CD013625F@illinois.edu>
References: <20170414133211.GA57749@icir.org> <5E74D8F0-0E2B-4714-939F-A50CD013625F@illinois.edu>
Message-ID: <20170420174036.GJ3481@icir.org>

On Fri, Apr 14, 2017 at 17:32 +0000, you wrote:

> Just mentioning it in case you didn't account for the real fix also
> requiring the CAF-based loop being fully realized in addition to
> Broker

Yeah, true, I was thinking that eventually we will have this all solved.

> (Also don't have a sense of the frequency/urgency of the problem).

I think that's the main question. So far I haven't gotten the sense that this really affects a lot of people, so I see the priority as rather low given our limited cycles for development and testing. If it's a more pressing problem, we can reconsider of course.

> So since I've been able to get the CAF-based loop working on offline
> pcap files (it does not rely on polling the FD of the open file since
> it didn't work anyway w/ CAF's epoll-based multiplexer on Linux), it
> may be fair to say that other packet sources that don't
> require/support poll-ability should also be possible to integrate.

I need to think about that argument ... Did you try reading from files while also doing communication (that would be pseudo-realtime mode), or was the pcap the only source of input?
Robin

--
Robin Sommer * ICSI/LBNL * robin at icir.org * www.icir.org/robin

From jsiwek at illinois.edu Thu Apr 20 20:13:37 2017
From: jsiwek at illinois.edu (Siwek, Jon)
Date: Fri, 21 Apr 2017 03:13:37 +0000
Subject: [Bro-Dev] early performance comparisons of CAF-based run loop
In-Reply-To: <20170420174036.GJ3481@icir.org>
References: <20170414133211.GA57749@icir.org> <5E74D8F0-0E2B-4714-939F-A50CD013625F@illinois.edu> <20170420174036.GJ3481@icir.org>
Message-ID:

> On Apr 20, 2017, at 12:40 PM, Robin Sommer wrote:
>
> I need to think about that argument ... Did you try reading from files
> while also doing communication (that would be pseudo-realtime mode),
> or was the pcap the only source of input?

Tested:

* pcap
* pcap + script doing DNS queries
* live interface
* live interface + script doing DNS queries

Untested:

* anything with remote communication
* pseudo-realtime

- Jon

From justin.oursler at gmail.com Wed Apr 26 13:17:23 2017
From: justin.oursler at gmail.com (Justin Oursler)
Date: Wed, 26 Apr 2017 16:17:23 -0400
Subject: [Bro-Dev] TCP Reassembly
Message-ID:

Hello,

I am writing a TCP application analyzer and depend on packet order to build a full PDU over many TCP packets. Occasionally I will receive a packet out of order in my analyzer's DeliverStream function. Is there a way to assure I am getting packets in order? Or, any advice on debugging the reassembly?

Thank you,
Justin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20170426/42c538e7/attachment.html

From asharma at lbl.gov Fri Apr 28 17:55:49 2017
From: asharma at lbl.gov (Aashish Sharma)
Date: Fri, 28 Apr 2017 17:55:49 -0700
Subject: [Bro-Dev] can I send an opaque of bloomfilter over Cluster::manager2worker_event ?
Message-ID: <20170429005548.GT10784@mac-822.local>

I tried doing that and then merging with an existing (initialized) bloomfilter on worker.
I see this error:

1493427133.170419 Reporter::INFO calling inside the m_w_add_bloom worker-1 -
1493427133.170419 Reporter::ERROR incompatible hashers in BasicBloomFilter merge (empty) -
1493427133.170419 Reporter::ERROR failed to merge Bloom filter (empty) -
1493427115.582247 Reporter::INFO calling inside the m_w_add_bloom worker-6 -
1493427115.582247 Reporter::ERROR incompatible hashers in BasicBloomFilter merge (empty) -
1493427115.582247 Reporter::ERROR failed to merge Bloom filter (empty) -
1493427116.358858 Reporter::INFO calling inside the m_w_add_bloom worker-20 -
1493427116.358858 Reporter::ERROR incompatible hashers in BasicBloomFilter merge (empty) -
1493427116.358858 Reporter::ERROR failed to merge Bloom filter (empty) -
1493427115.935649 Reporter::INFO calling inside the m_w_add_bloom worker-7 -
1493427115.935649 Reporter::ERROR incompatible hashers in BasicBloomFilter merge (empty) -
1493427115.935649 Reporter::ERROR failed to merge Bloom filter (empty) -
1493427115.686241 Reporter::INFO calling inside the m_w_add_bloom worker-16 -
1493427115.686241 Reporter::ERROR incompatible hashers in BasicBloomFilter merge (empty) -
1493427115.686241 Reporter::ERROR failed to merge Bloom filter (empty) -
14934271

Not sure if the error is because an opaque of bloomfilter cannot be sent over worker2manager_events and manager2worker_events or if I am doing something not quite right.

Aashish