From mfischer at ICSI.Berkeley.EDU Fri Jul 1 09:02:35 2016 From: mfischer at ICSI.Berkeley.EDU (Mathias Fischer) Date: Fri, 1 Jul 2016 18:02:35 +0200 Subject: [Bro-Dev] Some thoughts on the bro deep cluster, broker, and sumstats Message-ID: <5776941B.8090608@icsi.berkeley.edu>

*tl;dr* I continued my work on the bro deep cluster over the last months and just want to share my results so far and future plans with you:

1. I want to get your opinion on my broker enhancement that allows routing messages between peers that are not directly connected (given that a path between them exists).

2. I want to share some preliminary thoughts on how to extend the sumstats framework to a deep cluster setting, so that it becomes possible to dynamically create (multiple) subgroups of Bros in the deep cluster that can share and aggregate information.

Criticism, opinions, and further suggestions are very welcome!

Best, Mathias

-------------------- Summary deep cluster --------------------

A deep cluster provides a single administrative interface for several conventional clusters and/or standalone Bro nodes at once. A deep cluster eases the monitoring of several links at once and can interconnect different Bros and different Bro clusters in different places. Due to its better scalability, it can bring monitoring from the edge of the monitored network into its depth (hence: deep cluster). Moreover, it enables and facilitates information exchange between different Bros and different Bro clusters. In essence, a deep cluster can be seen as a P2P overlay network of Bro nodes in which all Bros can communicate with each other.

In summary, my results so far towards building such a deep cluster are the following:

* a modified multi-hop broker that allows forwarding content between peers that are only connected indirectly with each other.
* some bro modifications in code and, foremost, in bro script land
* an enhanced broctl that operates as a daemon and can initiate connections to other such daemons (all communication based on broker), including a JSON-based configuration of nodes and connections

A summary of all changes can be found on the bro website (including instructions on how to run the current version of the deep cluster): https://www.bro.org/development/projects/deep-cluster.html

Ongoing work is the adaptation of my multi-hop broker to the currently revised broker and the adaptation of sumstats to a deep cluster setting. Both are described in detail in the long (sorry) remainder of this email.

---------------------------------- Multi-hop broker ----------------------------------

I enhanced broker to support publish-subscribe-based communication between nodes that are not connected directly, but that are connected via a path of other nodes. The working title of this is multi-hop broker. As broker will get a significant revision soon, I want to share my design for multi-hop broker with you, so that I can incorporate your comments when adding my multi-hop functionality to the revised broker.

A specific challenge here is the routing of publications to all interested subscribers. For that, routing tables need to be established among all nodes in a deep cluster. These routing tables are established by flooding subscriptions through the deep cluster. Afterwards, publications can be routed on the shortest paths to all interested subscribers.

In this context, two issues arise, namely loop detection and avoidance as well as limiting the scope of subscriptions for rudimentary access control. Both issues are described in detail in the following.

*** Loop detection and avoidance

There is no unique identifier (like an IP address) anymore on whose basis information can be forwarded. There might be only one recipient for a publish operation, but there can also be many of them.
This can result in routing loops, so that messages are forwarded endlessly in the broker topology that results from the peerings between broker endpoints. Such loops have to be avoided, as they would falsify results, e.g., results stored in datastores. There are basically two options here:

1. Loop avoidance: During the setup phase of the deep cluster it needs to be ensured that the topology does not contain loops.

2. Loop detection: Detect loops and drop duplicate packets. This requires either storing each forwarded message locally to detect duplicates or, more light-weight, attaching a TTL value to every broker message. When the TTL reaches 0, the message gets dropped. However, the TTL does not prevent duplicates completely.

For multi-hop broker we chose a hybrid of the two options. Loops in the broker topology need to be avoided during the initial configuration of the deep cluster. A TTL that is attached to every broker message allows detecting routing loops and results in an error output. The TTL value can be configured; its default value is 32.

However, certain configurations require a denser interconnection of nodes. In conventional bro clusters all workers are connected to the manager and the datanode, while the manager is also connected to the datanode. Obviously this already represents a loop. To avoid such routing loops we introduced an additional endpoint flag ``AUTO_ROUTING``. It indicates whether the respective endpoint is allowed to route message topics on behalf of other nodes. Multi-hop topics are only stored locally and propagated if this flag is set. If an auto-routing endpoint is coupled with an ordinary endpoint, only the auto-routing endpoint will forward messages on behalf of the other endpoint. As a result, not every node will forward subscriptions received from others, so that loops can be prevented even though the interconnection of nodes in the deep cluster results in topological loops.
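As a toy illustration, the TTL plus AUTO_ROUTING flooding rule described above might look roughly like this (a Python sketch of the idea, not broker code; Endpoint, peer_with, and flood_subscription are invented names):

```python
# Hypothetical sketch of TTL-limited subscription flooding with an
# AUTO_ROUTING flag, as described above. Not the broker implementation.

DEFAULT_TTL = 32

class Endpoint:
    def __init__(self, name, auto_routing=False):
        self.name = name
        self.auto_routing = auto_routing
        self.peers = []             # directly connected endpoints
        self.subscriptions = set()  # topics learned via flooding

    def peer_with(self, other):
        self.peers.append(other)
        other.peers.append(self)

    def flood_subscription(self, topic, ttl=DEFAULT_TTL, sender=None):
        if ttl == 0:
            # TTL exhausted: in broker this would produce an error output,
            # signaling a loop that survived the configuration checks.
            return
        self.subscriptions.add(topic)
        # Only AUTO_ROUTING endpoints forward topics on behalf of others;
        # ordinary endpoints store them locally but do not propagate.
        if sender is not None and not self.auto_routing:
            return
        for p in self.peers:
            if p is not sender:
                p.flood_subscription(topic, ttl - 1, sender=self)

# The cluster-style triangle: worker peered with manager and datanode,
# manager peered with datanode -- a topological loop.
manager = Endpoint("manager", auto_routing=True)
datanode = Endpoint("datanode")
w1 = Endpoint("worker-1")
w1.peer_with(manager)
w1.peer_with(datanode)
manager.peer_with(datanode)

w1.flood_subscription("/bro/sumstats")
```

Despite the triangle, the flood terminates: only the manager re-forwards, so the subscription still reaches every node without circulating endlessly.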
*** Rudimentary Access Control

To prevent subscriptions from being disseminated through the whole deep cluster, single-hop (=local) and multi-hop (=global) subscriptions are introduced. Single-hop subscriptions are shared with direct neighbors only and are thus available only within the one-hop neighborhood. In contrast, multi-hop subscriptions get flooded through the whole deep cluster. This differentiation into subscriptions with local (``LOCAL_SCOPE``) and global scope (``GLOBAL_SCOPE``) is intended to provide better efficiency and is configured as an additional parameter when creating a broker ``message_queue``. The default setting is always ``LOCAL_SCOPE``.

--------------------------------- Deep Sumstats ---------------------------------

The intention is to extend sumstats to be used within a deep cluster to aggregate results at large scale, but also to form sumstats groups on the fly, e.g., as a result of detected events. In the original sumstats, only directly connected nodes in a cluster setup exchanged messages. By using multi-hop broker, we can extend this to the complete deep cluster. We can form small groups of nodes that are not directly connected to each other, but that are instead connected indirectly via their subscriptions to a group id (e.g., "/bro/sumstats/port-scan-detected").

To adapt sumstats to the deep cluster, two basic approaches are feasible:

1. Sumstats groups: Instead of a cluster, we apply sumstats to a group of nodes in the deep cluster. This means that we keep the basic structure and functioning of the current sumstats and only replace direct links by multi-hop links via multi-hop broker. However, we need a coordinator per group (in the original sumstats the manager took over this task). This manager will initiate queries and retrieve all results via the routing mechanisms of multi-hop broker. There will be no processing or aggregation of information directly in the deep cluster.
Only nodes in the group, and foremost the manager, will be able to process and aggregate information. The deep cluster will only provide a routing service between all members of the group.

2. Sumstats and deep cluster become one: We integrate the data forwarding and the data storage with each other. The deep cluster is used to aggregate and process results in a completely distributed manner while forwarding data to its destination. This means that all members of a sumstats group get interconnected by the deep cluster (and thus multi-hop broker) as in option 1, but now information is additionally processed and aggregated while it is forwarded towards the manager, by nodes of the deep cluster that are not part of the sumstats group. That is definitely the most challenging option, but in the long term probably the most valuable one.

I am currently working on option 1, as it is the straightforward option and also a necessary intermediate step towards option 2. I would be especially grateful for additional input / alternate views here.

-------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 884 bytes Desc: OpenPGP digital signature Url : http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20160701/62e4d8fa/attachment.bin

From vallentin at icir.org Fri Jul 1 11:24:52 2016 From: vallentin at icir.org (Matthias Vallentin) Date: Fri, 1 Jul 2016 11:24:52 -0700 Subject: [Bro-Dev] Broker API update Message-ID: <20160701182452.GL6009@samurai.ICIR.org>

Here's a brief update about the recent changes to the Broker API. For more examples, you can look at the unit tests [1]. I'm also attaching a small graphic that illustrates the architecture from a high level.

Fundamentally, endpoints represent processing units that exchange messages. An endpoint processes one message at a time, but all endpoints execute in parallel---similar to the actor model.
Each message has an associated topic, and endpoints deliver messages according to their subscriptions and peerings. Unlike in the previous version, there is no more global state: each endpoint lives in a context.

There exist two types of endpoints: blocking and nonblocking. The former features a synchronous interface where users have to manually extract messages from the endpoint. Messages accumulate in the endpoint's "mailbox" until extracted. In contrast, a nonblocking endpoint executes a callback for each message it receives. Here's a code snippet:

  context ctx;
  auto e0 = ctx.spawn();
  auto e1 = ctx.spawn(
    [](const topic&, const message&) {
      // Inspect topic t and/or process message.
    }
  );

  // Establish a peering between two endpoints.
  e0.peer(e1);

  // Subscribe to some topics. The subscribing endpoint will relay its
  // subscriptions to all known peers.
  e0.subscribe("/foo");
  e0.subscribe("/bar");

  // Block and wait until a message arrives according to the endpoint's
  // subscriptions.
  e0.receive(
    [](const topic&, const message&) {
      // Inspect topic t and/or process message.
    },
    [](const status& s) {
      // Process status messages, such as new/lost peers.
    }
  );

Broker will only allow messages that contain instances of data, which is a sum type that can hold any of the types in Bro's data model. In the future, we may loosen this restriction to allow users to send their own custom types over the same Broker communication channel. But for now we enforce this requirement. This is the API to send messages:

  // Send a message under topic /foo.
  e1.publish("/foo", 42, 4.2, 42u);

  // Construct a message and send it away.
  auto msg = make_data_message("foo", vector{42, nil, 44});
  e1.publish("/foo", msg);

You will get a compile error if the arguments to make_data_message or publish are not unambiguously convertible to one of the types in Broker's data model.

To cross process boundaries, endpoints can use TCP communication as follows:

  // In one process.
  context ctx;
  auto e = ctx.spawn<...>();
  e.listen("127.0.0.1", 42000);

  // In another process.
  context ctx;
  auto e = ctx.spawn<...>();
  e.peer("127.0.0.1", 42000);

If one were to "accidentally" establish a TCP connection between endpoints in the same process, the runtime would detect this scenario, avoid costly message serialization, and use "pointer passing" instead.

On my todo list for the next couple of weeks are (in order):

(0) Fix remaining bugs (mostly CAF)
(1) Adapt data stores to the new API
(2) Create Python bindings via pybind11
(3) Perform the "engine swap" in Bro
(4) Extensive unit and performance testing
(5) Update documentation à la CAF [2]

Moreover, I'm already in touch with Mathias Fischer regarding multi-hop subscription management. For now, we have single-hop peerings, i.e., if a peering of the form A <-> B <-> C exists, A relays a subscription change only to B. See Mathias' email for more details on this topic.

Matthias

[1] https://github.com/bro/broker/tree/topic/matthias/actor-system/tests/cpp
[2] http://actor-framework.readthedocs.io/en/latest/

-------------- next part -------------- A non-text attachment was scrubbed... Name: architecture.png Type: image/png Size: 169803 bytes Desc: not available Url : http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20160701/9dfeb550/attachment-0001.bin

From anthony.kasza at gmail.com Sun Jul 3 07:11:50 2016 From: anthony.kasza at gmail.com (anthony kasza) Date: Sun, 3 Jul 2016 07:11:50 -0700 Subject: [Bro-Dev] CBAN naming In-Reply-To: <32D1FCCE-111E-401F-9A03-526317DC102F@icir.org> References: <20160605155547.GE92333@icir.org> <0FC5D616-0715-4015-8BCE-20D74BB10619@illinois.edu> <32D1FCCE-111E-401F-9A03-526317DC102F@icir.org> Message-ID:

Just a heads up. It seems bpkg is no longer a unique name.
http://www.bpkg.io

-AK

On Jun 15, 2016 1:30 AM, "Seth Hall" wrote:
> > On Jun 6, 2016, at 3:09 PM, Siwek, Jon wrote:
> >
> > If we switch the design to instead use the super-container format, then
> > that's not an issue for me anymore because the relationship changes from
> > "is a plugin" to "may have a plugin".
>
> I like the positioning of this because it suddenly feels very
> natural to explain the contents of a package (or whatever it ends up
> getting called).
>
> .Seth
>
> -- > Seth Hall > International Computer Science Institute > (Bro) because everyone has a network > http://www.bro.org/
>
> _______________________________________________ > bro-dev mailing list > bro-dev at bro.org > http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

-------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20160703/103fe81f/attachment.html

From jbarber at computer.org Fri Jul 8 09:28:16 2016 From: jbarber at computer.org (Jeff Barber) Date: Fri, 8 Jul 2016 12:28:16 -0400 Subject: [Bro-Dev] Help with binpac 'cannot handle incremental input' Message-ID:

I'm looking to develop a new TCP-based protocol parser using binpac. I'm getting the 'cannot handle incremental input' error from binpac, but I don't understand exactly why I'm seeing it. It happens whenever I use "flowunit = " instead of "datagram = ". I'm literally changing one line from the skeleton produced by the binpac quickstarter. Here's what I'm doing.
Creating this as a plugin, so I start with init-plugin:

  bro-aux/plugin-support/init-plugin $SRC/analyzer/fob Bro_Fob fob

Next I run the binpac quickstart:

  cd ~/src/binpac_quickstart
  ./start.py fob "FOB Protocol" $SRC/analyzer/fob --tcp --plugin --buffered

If I now configure and make, everything works fine:

  cd $SRC/analyzer/fob
  ./configure --bro-dist=$BRO_SRC
  make

But if I edit src/fob.pac to uncomment the "flowunit =" line (and comment out the datagram line), I get this error from binpac:

  src/fob-protocol.pac:18: error : cannot handle incremental input

Thinking it had something to do with the definition of FOB_PDU there ("data: bytestring &restofdata;"), I removed that line so that my PDU definition is nothing but:

  type FOB_PDU(is_orig: bool) = record {
    foo: uint32;
  } &byteorder=bigendian;

But I still get the error. I've tried the same thing with the current master branch, with v2.4.1, and with an older version I'm using, and I get the same result in each case. If I remove all fields from the PDU, it compiles, but that's not very useful. ;)

I know the flowunit feature works; I see it in other analyzers in the source tree. It seems like I must be missing something simple in the .pac files, but I can't figure it out from inspection. Anybody know what's the trick?

Thanks! Jeff

-------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20160708/dbf4a722/attachment.html

From jbarber at computer.org Fri Jul 8 12:04:40 2016 From: jbarber at computer.org (Jeff Barber) Date: Fri, 8 Jul 2016 15:04:40 -0400 Subject: [Bro-Dev] Help with binpac 'cannot handle incremental input' In-Reply-To: References: Message-ID:

I finally figured out what's happening by looking at the binpac source. Documenting for posterity: apparently, if you use "flowunit", you *must* place the &length= on the record (or somewhere in the derived hierarchy of types under the record).
I was under the impression that binpac could figure that out implicitly by measuring the size of the types in the record, and that you only needed to use &length when the field length was determined by a previous field value in the record. So, anyway, this works:

  type FOB_PDU(is_orig: bool) = record {
    foo: uint32;
  } &byteorder=bigendian *&length=4*;

My real analyzer will of course do this more dynamically. But at least I have a starting point that builds now.

Cheers

On Fri, Jul 8, 2016 at 12:28 PM, Jeff Barber wrote:
> I'm looking to develop a new TCP-based protocol parser using binpac.
> [...]

-------------- next part -------------- An HTML attachment was scrubbed... URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20160708/d59e6f51/attachment.html

From asharma at lbl.gov Fri Jul 8 16:59:46 2016 From: asharma at lbl.gov (Aashish Sharma) Date: Fri, 8 Jul 2016 16:59:46 -0700 Subject: [Bro-Dev] input-framework file locations Message-ID: <20160708235945.GJ36120@mac-4.local>

I have been thinking about and trying different things, but for now it appears that if we are to share policies around, there is no easy way to distribute input files along with policy files.

Basically, right now I use

  redef Scan::whitelist_ip_file = "/usr/local/bro/feeds/ip-whitelist.scan";

and then expect everyone to edit the path as their setup demands and to place an accompanying sample file in the directory or create one for themselves - all of which introduces errors as well as slows down deployment.

Is there a way I can use relative paths instead of absolute paths for input-framework consumption? At present a new-heuristics dir can have __load__.bro with all policies, but the input framework won't read files relative to that directory or to where it is placed:

  redef Scan::whitelist_ip_file = "../feeds/ip-whitelist.scan";

Something similar to the __load__.bro model.

Also, one question I have is whether all input files should go to a 'standard' feeds/input dir in bro or be scattered around along with their accompanying bro policies (i.e., in individual directories).

Something to think about: with more and more reliance on the input framework, I think there is a need for 'standardization' on where to put input files and how to easily find and read them.
Aashish

From robin at icir.org Fri Jul 8 17:41:04 2016 From: robin at icir.org (Robin Sommer) Date: Fri, 8 Jul 2016 17:41:04 -0700 Subject: [Bro-Dev] input-framework file locations In-Reply-To: <20160708235945.GJ36120@mac-4.local> References: <20160708235945.GJ36120@mac-4.local> Message-ID: <20160709004104.GA49737@icir.org>

On Fri, Jul 08, 2016 at 16:59 -0700, you wrote:
> Something similar to __load__.bro model

@DIR gives you the path to the directory the current script is located in. Does that help?

Robin

-- Robin Sommer * ICSI/LBNL * robin at icir.org * www.icir.org/robin

From jazoff at illinois.edu Sun Jul 10 06:28:48 2016 From: jazoff at illinois.edu (Azoff, Justin S) Date: Sun, 10 Jul 2016 13:28:48 +0000 Subject: [Bro-Dev] [Bro-Blue] scan detector related sumstats load In-Reply-To: <20160710050555.GA1504@Beezling.local> References: <271FCDB2-FB75-4C89-80D8-0F23CD7CAE55@illinois.edu> <030D620B-9F7B-4EA8-BFA6-DD47B9870C67@illinois.edu> <20160710050555.GA1504@Beezling.local> Message-ID: <7C00AD1F-15AE-4F2F-A3A0-F9F171794AF7@illinois.edu>

> On Jul 10, 2016, at 1:06 AM, Johanna Amann wrote:
>
> First - wouldn't this be a topic for bro-dev instead of bro-blue? I don't
> really see a reason to keep this private.

Indeed.. Moving this to bro-dev as there aren't any more internal logs or addresses.

To catch everyone up, I noticed some performance issues with how scan.bro and sumstats interact. scan.bro creates a large number of unique sumstats keys, and the mechanism that the manager uses to fetch them from the workers is not very efficient. I implemented some ideas to improve things and then realized that I had basically re-implemented how it used to work in 2013.

> In any case... Seth might remember this better, but as far as I remember,
> we had some huge, quite difficult to debug problems at bigger sites (I
> think especially at Indiana), when running Bro with the old code that used
> batching.
> I think I remember something about this causing _huge_ memory
> spikes in those circumstances and that the best way around that was this
> switch.

I could see a batch size of 50 being a problem, but if batching was the problem, simply setting the batch size to 1 would be better than what we have now.

If worker1 has key1 and worker2 has key2, right now what happens is:

  manager sends out get_a_key to get a key from each worker
  worker1 will send_a_key for key1
  worker2 will send_a_key for key2

At this point stats_keys[uid] contains [key1, key2]. Now, this is where everything goes wrong:

  manager sends out cluster_get_result for key1
  worker1 sends out cluster_send_result reply for key1
  worker2 sends out cluster_send_result empty reply
  manager sends out cluster_get_result for key2
  worker1 sends out cluster_send_result empty reply
  worker2 sends out cluster_send_result reply for key2

With 56 workers, you end up with:

  manager sends out cluster_get_result for key1
  worker1 sends out cluster_send_result reply for key1
  worker2 sends out cluster_send_result empty reply
  worker3 sends out cluster_send_result empty reply
  worker4 sends out cluster_send_result empty reply
  ...
  worker56 sends out cluster_send_result empty reply
  manager sends out cluster_get_result for key2
  worker1 sends out cluster_send_result empty reply
  worker2 sends out cluster_send_result reply for key2
  worker3 sends out cluster_send_result empty reply
  worker4 sends out cluster_send_result empty reply
  ...
  worker56 sends out cluster_send_result empty reply

For 56 workers to send the SAME key up to the manager you get 1 get_a_key, 56 send_a_key, 1 cluster_get_result, and 56 cluster_send_result events. This is the best-case scenario for the current system. It works ok if your keys are things like country codes or mime types that do not grow unbounded. There is a little overhead, but not much.
However, for 56 workers to send 56 different keys up to the manager you get 1 get_a_key, 56 send_a_key, 56 cluster_get_result, and 3136 cluster_send_result events. This is the worst-case scenario and is what scan.bro triggers.

> You also have to be a bit careful when going back to old code (or changing
> the sumstats code) - the code has a bit of a... sad interaction with the
> Bro message cache (or whatever it is called) - if you forget to call
> copy() at all the right places that exchange messages about data in
> tables, you are not actually going to exchange data but just references,
> which can lead to stale data on the manager (and also reduce message load
> as a side effect - while leading to wrong results). I am not sure if that
> is the case here, I am just saying you have to be quite careful changing
> things :)

I saw those copy()'s.. I didn't understand them but I left them alone :-)

> And - one thing in your older email - removing the Cluster::worker_count
> == done_with[uid] is also a bit problematic because it makes it difficult
> to check the correctness of the results. Which can become an issue with
> sumstats - sometimes single nodes reply surprisingly slowly.

Yeah.. I realized that was important. What my code currently does is:

When the manager wants the results, it sends out a single get_some_key_data event. This is similar to the old send_data event. get_some_key_data sends one key to the manager using the existing cluster_send_result and does a

  schedule 0.001 sec { SumStats::get_some_key_data(uid, ss_name, cleanup) };

to re-call itself if there is more data to be sent. When there is no more data for the current ss_name, it sends a send_no_more_data event up to the manager. I moved the Cluster::worker_count check to count the send_no_more_data events, so rather than being done once per key, the unit of work is the entire table. I believe this is almost identical to the 2013 code.
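To make the scaling difference concrete, the event counts under both schemes can be written as simple formulas (a back-of-the-envelope model in plain Python of the walkthrough above, not sumstats code):

```python
# Back-of-the-envelope event counts for collecting `keys` distinct
# sumstats keys from `workers` workers; a model of the analysis above.

def events_current(workers, keys):
    # 1 get_a_key broadcast, one send_a_key per worker, then one
    # cluster_get_result per key, each answered (mostly with empty
    # replies) by every worker.
    return 1 + workers + keys + workers * keys

def events_proposed(workers, keys):
    # 1 get_some_key_data broadcast, one cluster_send_result per key,
    # and one send_no_more_data per worker.
    return 1 + keys + workers

print(events_current(56, 56))   # the worst case described above
print(events_proposed(56, 56))
```

The current scheme is quadratic in the number of workers times keys, while the proposed one is linear in each.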
Compared to the 3000+ events before, for 56 workers to send 56 different keys up to the manager this mechanism uses 1 get_some_key_data, 56 cluster_send_result events, and 56 send_no_more_data events. The send_no_more_data count will always be 56, so the overhead is a small constant.

It is possible that the more efficient method of transferring data up to the manager was what was causing the memory spikes. I think the current code may appear to behave better, but it is also spending 97.5% of its time sending around extra events and never making any progress. It may be that the reason it was changed to transfer one key at a time was so that bro would never have to build a copy of the entire sumstats table in memory on the manager. Fixing that issue and the event amplification at the same time would be a little harder. I know two ways to solve that problem:

* Do an n-way merge of streams of sorted keys from each worker.. implementing that in bro script would not be fun.
* Shard the sumstats table itself. If each sumstats table was first bucketed on each worker by a hash of the key into 37 buckets, you could transfer and process each bucket serially, which would cut the memory usage on the manager to 1/37. This is probably really easy to implement. Not sure how the hash would be computed in bro script, but the rest is trivial.

Oh, and I believe I also found a small inefficiency in how cluster_key_intermediate_response works: recent_global_view_keys lives on the worker, so each worker can independently kick off a cluster_key_intermediate_response for the same key. This small patch keeps track of recent_global_view_keys on the manager too and should cut down on repeated events:

+global recent_global_view_keys: table[string, Key] of count &create_expire=1min &default=0;

 # Managers handle intermediate updates here.
 event SumStats::cluster_key_intermediate_response(ss_name: string, key: Key)
     {
     #print fmt("MANAGER: receiving intermediate key data from %s", get_event_peer()$descr);
     #print fmt("MANAGER: requesting key data for %s", key);
+    if ( [ss_name, key] in recent_global_view_keys )
+        return;

     if ( ss_name in outstanding_global_views &&
          |outstanding_global_views[ss_name]| > max_outstanding_global_views )
@@ -451,6 +458,7 @@
         return;
         }

+    ++recent_global_view_keys[ss_name, key];
     ++outstanding_global_views[ss_name];
     local uid = unique_id("");

-- - Justin Azoff

From jazoff at illinois.edu Mon Jul 11 17:44:43 2016 From: jazoff at illinois.edu (Azoff, Justin S) Date: Tue, 12 Jul 2016 00:44:43 +0000 Subject: [Bro-Dev] Unified scan.bro script Message-ID: <7945BBF2-DCA5-49A1-AE2F-BAF773A22090@illinois.edu>

As part of the sumstats work I've been looking into, I tried refactoring scan.bro to put less load on sumstats. The refactored script is at https://gist.github.com/JustinAzoff/fe68223da6f81319d3389c605b8dfb99

It is.. amazing! The unified code is simpler, uses less memory, puts less load on sumstats, generates nicer notice messages, and detects attackers scanning across multiple victims AND ports.

Details: The current scan.bro maintains two sumstats streams, keyed by attacker/port and attacker/victim. When attacker attempts to connect to victim on port 22, sumstats effectively creates:

  an [attacker 22] key with data containing [victim]
  an [attacker victim] key with data containing [22]

It does this so it can figure out whether an attacker is scanning lots of victims on one port, or lots of ports on one victim. When an attacker does the equivalent of 'nmap -p 22 your/16', sumstats ends up with 65536 extra [attacker victim] keys. This kills the sumstats :-)

My refactored version simply creates:

  an [attacker] key containing [victim/22, othervictim/22, ...]

This means that no matter how many hosts or ports the attacker scans, there will only ever be one key. Additionally, since the reducer is configured as ...
  $apply=set(SumStats::UNIQUE),
  $unique_max=double_to_count(scan_threshold+2)

the data the key references cannot grow unbounded, so a full /16 port scan can only create 1 key and scan_threshold+2 values per worker process. This is a huge reduction in the amount of data stored.

The downside of this was that the notices were effectively "attacker scanned... something!", but I realized I could analyze all the victim/port strings in unique_vals and figure out what was scanned. With that in place, bro now generates notices like this:

  Scan::Scan 198.20.69.98 made 102 failed connections on 102 hosts and 77 ports in 4m59s
  Scan::Scan 198.20.99.130 made 102 failed connections on 102 hosts and 78 ports in 4m59s
  Scan::Scan 36.101.163.186 made 102 failed connections on port 23 in 0m14s
  Scan::Scan 91.212.44.254 made 102 failed connections on ports 135, 445 in 4m59s
  Scan::Scan 207.244.70.169 made 103 failed connections on port 389 in 5m0s
  Scan::Scan 222.124.28.164 made 102 failed connections on port 23 in 0m14s
  Scan::Scan 91.236.75.4 made 102 failed connections on ports 8080, 3128 in 4m58s
  Scan::Scan 177.18.254.165 made 102 failed connections on port 23 in 0m38s
  Scan::Scan 14.169.221.169 made 102 failed connections on port 23 in 0m36s
  Scan::Scan 192.99.58.163 made 100 failed connections on 100 hosts and 100 ports in 4m55s

The only downside is that 192.99.58.163 appears to be backscatter (conn_state and history are OTH and H), but that's an issue somewhere inside is_failed_conn, which is unchanged from scan.bro.

It should be a drop-in replacement for scan.bro, except that any notice policies or scan policy hooks will need to be changed. It could possibly be changed to still raise Address_Scan/Port_Scan notices, at least in some cases. I don't know how people may be using those notices differently - we handle them the same, so the change to a unified notice type is a non-issue for us.
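For readers following along, the aggregation idea can be sketched in a few lines of Python (a toy model with invented names; the real script uses a sumstats stream with a UNIQUE reducer and $unique_max, not a plain dict):

```python
# Toy model of the unified scheme: one key per attacker whose value is
# a capped set of "victim/port" strings. `observe` and `summarize` are
# hypothetical names; thresholds mirror the scan_threshold+2 cap above.

scan_threshold = 100
unique_max = scan_threshold + 2

observed = {}  # attacker -> set of "victim/port" strings

def observe(attacker, victim, port):
    vals = observed.setdefault(attacker, set())
    if len(vals) < unique_max:      # cap the set like $unique_max does
        vals.add("%s/%d" % (victim, port))

def summarize(attacker):
    # Recover what was scanned from the victim/port strings, as the
    # notice generation described above does.
    vals = observed.get(attacker, set())
    hosts = {v.rsplit("/", 1)[0] for v in vals}
    ports = {int(v.rsplit("/", 1)[1]) for v in vals}
    return len(hosts), sorted(ports)

# An 'nmap -p 22 your/16'-style scan: many victims, one port --
# still only one key, and at most unique_max stored values.
for i in range(200):
    observe("198.51.100.7", "10.0.0.%d" % i, 22)

print(summarize("198.51.100.7"))
```

The point is that the per-attacker state stays bounded regardless of how many hosts or ports are scanned, while still retaining enough detail to phrase the notice.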
-- 
- Justin Azoff

From seth at icir.org Tue Jul 12 14:10:59 2016
From: seth at icir.org (Seth Hall)
Date: Tue, 12 Jul 2016 17:10:59 -0400
Subject: [Bro-Dev] Unified scan.bro script
In-Reply-To: <7945BBF2-DCA5-49A1-AE2F-BAF773A22090@illinois.edu>
References: <7945BBF2-DCA5-49A1-AE2F-BAF773A22090@illinois.edu>
Message-ID: <1A895413-39E6-4574-9E5C-D7CC9D72C4DD@icir.org>

> On Jul 11, 2016, at 8:44 PM, Azoff, Justin S wrote:
> 
> It is.. amazing! The unified code is simpler, uses less memory, puts less load on sumstats, generates nicer notice messages, and detects attackers scanning across multiple victims AND ports.

Nice job Justin! Perhaps this raises the question of whether we should use this version in Bro? We do have a tendency to make design decisions so that Bro works the best that it can with minimal configuration for even the largest sites.

I think the notices are very reasonable and have the additional benefit of being a single notice to watch for "scanning". Having to watch for two different notices always felt a bit unnatural. I think that I personally care about scans, not the type of scan being performed (although there may be some nuance to that that someone is taking advantage of?).

.Seth

--
Seth Hall
International Computer Science Institute
(Bro) because everyone has a network
http://www.bro.org/

From jazoff at illinois.edu Tue Jul 12 14:31:09 2016
From: jazoff at illinois.edu (Azoff, Justin S)
Date: Tue, 12 Jul 2016 21:31:09 +0000
Subject: [Bro-Dev] Unified scan.bro script
In-Reply-To: <1A895413-39E6-4574-9E5C-D7CC9D72C4DD@icir.org>
References: <7945BBF2-DCA5-49A1-AE2F-BAF773A22090@illinois.edu> <1A895413-39E6-4574-9E5C-D7CC9D72C4DD@icir.org>
Message-ID: 

> On Jul 12, 2016, at 5:10 PM, Seth Hall wrote:
> 
>> On Jul 11, 2016, at 8:44 PM, Azoff, Justin S wrote:
>> 
>> It is.. amazing! The unified code is simpler, uses less memory, puts less load on sumstats, generates nicer notice messages, and detects attackers scanning across multiple victims AND ports.
> Nice job Justin! Perhaps this raises the question of whether we should use this version in Bro? We do have a tendency to make design decisions so that Bro works the best that it can with minimal configuration for even the largest sites.

I think that is the hard part :-) Minimally, as a first step, we can make it available with 2.5 but disabled by default. If someone isn't relying on the existing behavior, they can take advantage of it immediately. We can move the parts common to scan.bro and scan_unified.bro into a common script so they won't conflict.

We could also make it the default in 2.5, but as long as someone keeps their old local.bro, nothing will change unless they want it to. We just need to fix the backscatter issue first :-)

> I think the notices are very reasonable and have the additional benefit of being a single notice to watch for "scanning". Having to watch for two different notices always felt a bit unnatural. I think that I personally care about scans, not the type of scan being performed (although there may be some nuance to that that someone is taking advantage of?).

That did occur to me.. with this new version it is hard to apply a notice policy to the resulting notice.. i.e. do one thing if they were scanning port 22, do something else if they were scanning port 3389, do something else if they port scanned a single machine.. If only I could put the set of ports and hosts scanned inside the notice somewhere..

The unified scanning detection complicates the notice generation. Before, there was 1 notice for each of 2 different behaviors; my script has 1 notice for 5 behaviors:

* Scanning 1 port on many hosts
* Scanning <= 5 ports on many hosts
* Scanning many ports on 1 host
* Scanning many ports on <= 5 hosts
* Scanning many ports on many hosts

Maybe a solution is to raise different notices? Otherwise someone needs to do nasty regex stuff inside of a notice policy to tell them apart.
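One way to picture the "raise different notices" idea: the five behaviors above can be told apart from just the counts of unique hosts and ports seen for an attacker. An illustrative Python sketch (not taken from the actual script; the notice names and the "few" cutoff of 5 are assumptions based on the list above):

```python
# Illustrative classifier mapping an attacker's host/port spread onto
# distinct notice types, mirroring the five behaviors listed above.

def classify(n_hosts: int, n_ports: int, few: int = 5) -> str:
    """Pick a notice name from the number of unique hosts/ports scanned."""
    if n_ports <= few:
        return "Address_Scan"   # 1 port (or a few ports) on many hosts
    if n_hosts <= few:
        return "Port_Scan"      # many ports on 1 host (or a few hosts)
    return "Random_Scan"        # many ports on many hosts

print(classify(102, 1))    # Address_Scan
print(classify(1, 80))     # Port_Scan
print(classify(102, 82))   # Random_Scan
```

A structured dispatch like this would let a notice policy match on the notice type directly instead of regex-matching the message text.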
It would help if I knew how current bro users were using Scan::Address_Scan and Scan::Port_Scan notices.

-- 
- Justin Azoff

From vallentin at icir.org Thu Jul 14 16:41:15 2016
From: vallentin at icir.org (Matthias Vallentin)
Date: Thu, 14 Jul 2016 16:41:15 -0700
Subject: [Bro-Dev] Some thoughts on the bro deep cluster, broker, and sumstats
In-Reply-To: <5776941B.8090608@icsi.berkeley.edu>
References: <5776941B.8090608@icsi.berkeley.edu>
Message-ID: <20160714234115.GB42650@shogun.local>

> *tl;dr*

Just a quick heads-up: thanks a bunch for summarizing your thoughts! We haven't forgotten your mail and will get back after we're done with our crunch of releasing Bro 2.5.

Stay tuned,
Matthias

From seth at icir.org Fri Jul 15 08:08:29 2016
From: seth at icir.org (Seth Hall)
Date: Fri, 15 Jul 2016 11:08:29 -0400
Subject: [Bro-Dev] Remove application/pkix-cert from files.log?
Message-ID: 

What does everyone think of making some change for 2.5 so that certificates from SSL aren't logged in the files.log by default? I've heard grumblings about the number of certs that show up from quite a few people, and personally noticed that the number of certificates will dwarf all other file types pretty badly, which makes the output look a bit weird since very few people are ever interested in looking at those files in the files.log.

Certificates would still be passed through the files framework, so it's not an architectural change; it would all be related to just not doing the log. There is one minor issue that this brings up, though, in that right now certificate hashes are all given in the files.log. We could move them elsewhere, like x509.log or ssl.log, but I'm curious if anyone had thoughts on what they think would be most useful?
.Seth

--
Seth Hall
International Computer Science Institute
(Bro) because everyone has a network
http://www.bro.org/

From johanna at icir.org Fri Jul 15 08:54:05 2016
From: johanna at icir.org (Johanna Amann)
Date: Fri, 15 Jul 2016 08:54:05 -0700
Subject: [Bro-Dev] Remove application/pkix-cert from files.log?
In-Reply-To: 
References: 
Message-ID: 

I kind of like having the certificates being handled as files by default. However, I see that most people who run clusters in production do not want that information in files.log. So - from my point of view - it might make sense to have a policy script that filters certificates from files.log and adds the hashes to x509.log, and to have that auto-loaded by default in local.bro.

Would that make sense?

Johanna

On 15 Jul 2016, at 8:08, Seth Hall wrote:

> What does everyone think of making some change for 2.5 so that
> certificates from SSL aren't logged in the files.log by default? I've
> heard grumblings about the number of certs that show up from quite a
> few people and personally noticed that the number of certificates will
> dwarf all other file types pretty badly which makes the output look a
> bit weird since very few people are ever interested in looking at
> those files in the files.log.
> 
> Certificates would still be passed through the files framework, so
> it's not an architectural change, it would all be related to just not
> doing the log. There is one minor issue that this brings up though in
> that right now certificate hashes are all given in the files.log. We
> could move them elsewhere like x509.log or ssl.log, but I'm curious if
> anyone had thoughts on what they think would be most useful?
> .Seth
> 
> --
> Seth Hall
> International Computer Science Institute
> (Bro) because everyone has a network
> http://www.bro.org/
> 
> 
> _______________________________________________
> bro-dev mailing list
> bro-dev at bro.org
> http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

From klehigh at iu.edu Fri Jul 15 09:58:13 2016
From: klehigh at iu.edu (Keith Lehigh)
Date: Fri, 15 Jul 2016 12:58:13 -0400
Subject: [Bro-Dev] Remove application/pkix-cert from files.log?
In-Reply-To: 
References: 
Message-ID: <99B2E442-15C6-4321-A769-9B2789C97482@iu.edu>

I agree that having the certs logged to files.log creates a lot of noise that can be painful to wade through. The downside to placing the hashes in x509.log is that it would require a second step of turning the fuids into cuids when searching for activity involving a given cert hash.

What about the idea of having files.x509.log and simply diverting pkix-cert to that log by default? Keeping "files" in the name allows one to search in files*.log and use unix tools to grab the conn details. Quite useful when you have a mixed list of cert hashes and other hashes of interest.

- Keith

> On Jul 15, 2016, at 11:54, Johanna Amann wrote:
> 
> I kind of like having the certificates being handled as files by
> default. However, I see that most people who run clusters in production
> do not want that information in files.log. So - from my point of view,
> it might make sense to have a policy script that filters certificates
> from files.log and adds the hashes to x509.log; and we have that
> auto-loaded by default in local.bro.
> 
> Would that make sense?
> 
> Johanna
> 
> On 15 Jul 2016, at 8:08, Seth Hall wrote:
> 
>> What does everyone think of making some change for 2.5 so that
>> certificates from SSL aren't logged in the files.log by default?
I've >> heard grumblings about the number of certs that show up from quite a >> few people and personally noticed that the number of certificates will >> dwarf all other files types pretty badly which makes the output look a >> bit weird since very few people are ever interested in looking at >> those files in the files.log. >> >> Certificates would still be passed through the files framework, so >> it's not an architectural change, it would all be related to just not >> doing the log. There is one minor issue that this brings up though in >> that right now certificate hashes are all given in the files.log. We >> could move them elsewhere like x509.log or ssl.log, but I'm curious if >> anyone had thoughts on what they think would be most useful? >> >> .Seth >> >> -- >> Seth Hall >> International Computer Science Institute >> (Bro) because everyone has a network >> http://www.bro.org/ >> >> >> _______________________________________________ >> bro-dev mailing list >> bro-dev at bro.org >> http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev > _______________________________________________ > bro-dev mailing list > bro-dev at bro.org > http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s
Type: application/pkcs7-signature
Size: 3569 bytes
Desc: not available
Url : http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20160715/741a3b8e/attachment.bin

From jazoff at illinois.edu Fri Jul 15 15:47:14 2016
From: jazoff at illinois.edu (Azoff, Justin S)
Date: Fri, 15 Jul 2016 22:47:14 +0000
Subject: [Bro-Dev] Unified scan.bro script
In-Reply-To: 
References: <7945BBF2-DCA5-49A1-AE2F-BAF773A22090@illinois.edu> <1A895413-39E6-4574-9E5C-D7CC9D72C4DD@icir.org>
Message-ID: 

A further iteration of the unified scan.bro script is now in the branch topic/jazoff/scan-unified.

Use of the branch isn't required though; as it is a self-contained change, one can just grab https://raw.githubusercontent.com/bro/bro/31b63445ed07e2e76f98c49dd59091b1742523d1/scripts/policy/misc/scan.bro and replace the stock scan.bro with it (or better, move it to site and change the loading from misc/scan to just ./scan.bro).

It is aiming to replace scan.bro, so you cannot run both at the same time. However, if you really wanted to, you could search/replace all the identifiers that conflict with scan.bro and run both.

It should behave visibly similarly to the current scan.bro, except there is a new random scan notice:

Scan::Random_Scan 198.20.69.74 scanned at least 102 hosts on 82 ports in 4m51s

and the existing notices may report more than one port or host (up to 5) - after that it becomes a Random_Scan:

Address_Scan 91.236.75.4 scanned at least 102 unique hosts on ports 3128, 8080 in 4m47s

-- 
- Justin Azoff

From dirk.leinenbach at consistec.de Thu Jul 21 06:41:13 2016
From: dirk.leinenbach at consistec.de (Dirk Leinenbach)
Date: Thu, 21 Jul 2016 15:41:13 +0200
Subject: [Bro-Dev] IP-in-IP tunnel: issue with capture length
Message-ID: <5790D0F9.3030004@consistec.de>

Hi,

I'm having problems with IP-in-IP tunneled traffic which contains an Ethernet frame check sequence (FCS).
1) Bro seems to attribute the FCS to the length of the outer IP packet and then complains that the inner IP packet is too small compared to the capture length (in weird.log: "inner_IP_payload_length_mismatch").

Then I thought it would be ok to simply drop the corresponding check in Sessions.cc: ParseIPPacket(), because too much content shouldn't "hurt":

-	if ( (uint32)caplen != inner->TotalLen() )
-		return (uint32)caplen < inner->TotalLen() ? -1 : 1;
+	if ( (uint32)caplen < inner->TotalLen() )
+		return -1;

Would that be ok in your opinion? If not, what would be a better way to deal with this?

2) With the above patch applied, bro correctly sees the inner traffic, but from time to time it segfaults (every other day, roughly).

Until now I figured out the following information, but cannot really see what's going wrong:

a) bro always crashes at a tunneled TCP packet with an active reset flag
b) I see very few such packets (it might be that the crashing one is the only one within quite some time before the crash: I don't have all traffic available)
c) I cannot reproduce the problem by simply starting bro on a pcap file with the offending packet (and ~100MB of traffic before the crash) (even valgrind doesn't report anything useful)

From the stacktrace of the core file (cf. below) it looks as if PacketWithRST() somehow triggered the destructor of (my own) SIP plugin. However, I have no idea how that could happen.

Could you help me with this problem?
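For reference, the effect of the relaxed length check in the patch above can be sketched numerically (illustrative Python, not the actual Sessions.cc code; the 4-byte FCS length is the standard Ethernet value):

```python
# Sketch of the capture-length comparison: an Ethernet FCS adds 4
# trailing bytes to the capture, so caplen can legitimately exceed the
# inner IP packet's total length. Return values mirror ParseIPPacket():
# -1 = truncated (reject), 1 = trailing extra bytes (tolerated by the
# patch), 0 = exact match.

FCS_LEN = 4  # Ethernet frame check sequence, in bytes

def check_inner_packet(caplen: int, inner_total_len: int) -> int:
    if caplen < inner_total_len:
        return -1   # truly truncated: still an error
    if caplen > inner_total_len:
        return 1    # extra trailing bytes (e.g. the FCS): no longer fatal
    return 0

inner_len = 60
print(check_inner_packet(inner_len + FCS_LEN, inner_len))  # 1
print(check_inner_packet(inner_len - 10, inner_len))       # -1
```

The original check treated any mismatch as an error; the patch keeps rejecting only the truncated case.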
Thanks,
Dirk

#0  std::_List_base >::_M_clear (this=this@entry=0x2f373b0) at /usr/include/c++/4.7/bits/list.tcc:74
#1  0x00000000006a0ade in ~_List_base (this=0x2f373b0, __in_chrg=) at /usr/include/c++/4.7/bits/stl_list.h:379
#2  ~list (this=0x2f373b0, __in_chrg=) at /usr/include/c++/4.7/bits/stl_list.h:436
#3  plugin::Plugin::~Plugin (this=0x2f37360, __in_chrg=) at bro/src/plugin/Plugin.cc:136
#4  0x00007f1fa7d2ef77 in ~Plugin (this=0x2f37360, __in_chrg=) at sip/src/Plugin.cc:8
#5  plugin::Consistec_SIP::Plugin::~Plugin (this=0x2f37360, __in_chrg=) at sip/src/Plugin.cc:8
#6  0x000000000079d4bd in PacketWithRST (this=0x3482680) at bro/src/analyzer/protocol/tcp/TCP.cc:1810
#7  analyzer::tcp::TCP_Analyzer::DeliverPacket (this=0x3482680, len=0, data=0x7f1fa16f9aca, is_orig=false, seq=, ip=0x34e05c0, caplen=0) at bro/src/analyzer/protocol/tcp/TCP.cc:1280
#8  0x0000000000807a6a in analyzer::Analyzer::NextPacket (this=0x3482680, len=20, data=, is_orig=, seq=, ip=, caplen=20) at bro/src/analyzer/Analyzer.cc:222
#9  0x000000000055ecee in Connection::NextPacket (this=0x2f48c00, t=, is_orig=, ip=, len=, caplen=, data=, record_packet=@0x7ffc33d50898: 1, record_content=@0x7ffc33d5089c: 1, hdr=0x7ffc33d50b10, pkt=0x7f1fa16f9aa2, hdr_size=0) at bro/src/Conn.cc:260
#10 0x00000000005f819a in NetSessions::DoNextPacket (this=this@entry=0xf25000, t=1468916092.7505391, t@entry=, hdr=hdr@entry=0x7ffc33d50b10, ip_hdr=ip_hdr@entry=0x34e05c0, pkt=pkt@entry=0x7f1fa16f9aa2, hdr_size=hdr_size@entry=0, encapsulation=0x0, encapsulation@entry=0x34b3138) at bro/src/Sessions.cc:757
#11 0x00000000005f91a4 in NetSessions::DoNextInnerPacket (this=0xf25000, t=1468916092.7505391, hdr=, inner=0x34e05c0, prev=, ec=...) at bro/src/Sessions.cc:805
#12 0x00000000005f88ca in NetSessions::DoNextPacket (this=this@entry=0xf25000, t=1468916092.7505391, t@entry=, hdr=hdr@entry=0xf762a0, ip_hdr=, ip_hdr@entry=0x7ffc33d50e60, pkt=pkt@entry=0x7f1fa16f9a80, hdr_size=hdr_size@entry=14, encapsulation=encapsulation@entry=0x0) at bro/src/Sessions.cc:665
#13 0x00000000005f96d6 in NetSessions::NextPacket (this=0xf25000, t=1468916092.7505391, hdr=0xf762a0, pkt=0x7f1fa16f9a80, hdr_size=14) at bro/src/Sessions.cc:231
#14 0x00000000005c8048 in net_packet_dispatch (t=1468916092.7505391, hdr=0xf762a0, pkt=0x7f1fa16f9a80
, hdr_size=14, src_ps=0xf76160) at bro/src/Net.cc:277

-- 
Dr.-Ing. Dirk Leinenbach - Leitung Softwareentwicklung
consistec Engineering & Consulting GmbH
------------------------------------------------------------------
Europaallee 5            Fon: +49 (0)681 / 959044-0
D-66113 Saarbrücken      Fax: +49 (0)681 / 959044-11
http://www.consistec.de  e-mail: dirk.leinenbach at consistec.de
Registergericht: Amtsgericht Saarbrücken
Registerblatt: HRB12003
Geschäftsführer: Dr. Thomas Sinnwell, Volker Leiendecker, Stefan Sinnwell

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20160721/80c0ee6e/attachment-0001.html

From vallentin at icir.org Fri Jul 22 12:26:10 2016
From: vallentin at icir.org (Matthias Vallentin)
Date: Fri, 22 Jul 2016 12:26:10 -0700
Subject: [Bro-Dev] Broker data store API
Message-ID: <20160722192610.GL91838@samurai.ICIR.org>

TL;DR:
- Does anyone use Broker's RocksDB backend?
- Brief overview of the revamped data store frontend API

I've been working on the Broker data store API a bit, trying to come up with the smallest denominator possible for an initial release. So far I have ported the in-memory SQLite backend over. This made me wonder: did anyone ever use (or want to use) the RocksDB backend in production? I wonder if we can keep it out for Bro 2.5.

Regarding the API, here's a snippet that illustrates the user-facing parts:

// Setup an endpoint.
context ctx;
auto ep = ctx.spawn();

// Attach a master datastore with backend. The semantics of
// "attaching" are open-or-create: if a master exists under the
// given name, use it, otherwise create it.
backend_options opts;
opts["path"] = "/tmp/test.db";
auto ds = ep.attach("foo", std::move(opts));
if (!ds)
  std::terminate();

// Perform some asynchronous operations.
ds->put("foo", 4.2);
ds->put(42, set{"x", "y", "z"});
ds->remove(42, "z");       // data at key 42 is now {"x", "y"}
ds->increment("foo", 1.7); // data at key "foo" is now 5.9

// Add a value that expires after 10 seconds.
ds->put("bar", 4.2, time::now() + std::chrono::seconds(10));

// Get data in a blocking fashion. Equivalent to: get("foo"), the
// blocking API is the default.
auto x = ds->get("foo");

// Get data in a non-blocking fashion. The function then() returns
// immediately and one MUST NOT capture any variables on the stack by
// reference in the callback. The runtime invokes the callback as soon
// as the result has arrived.
ds->get("foo").then(
  [=](const data& d) {
    cout << "data at key 'foo': " << d << endl;
  },
  [=](const error& e) {
    if (e == ec::no_such_key)
      cout << "no such key: foo" << endl;
  });

Here's another setup with two peering endpoints, one having a master and one a clone (directly taken from the unit tests). This illustrates how data stores and peering go hand in hand.

context ctx;
auto ep0 = ctx.spawn();
auto ep1 = ctx.spawn();
ep0.peer(ep1);
auto m = ep0.attach("flaka");
auto c = ep1.attach("flaka");
REQUIRE(m);
REQUIRE(c);
c->put("foo", 4.2);
std::this_thread::sleep_for(propagation_delay); // master -> clone
auto v = c->get("foo");
REQUIRE(v);
CHECK_EQUAL(v, data{4.2});
c->decrement("foo", 0.2);
std::this_thread::sleep_for(propagation_delay); // master -> clone
v = c->get("foo");
REQUIRE(v);
CHECK_EQUAL(v, data{4.0});

I think this API covers the most common use cases. It's always easy to add functionality later, so my goal is to find the smallest common denominator.
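The store semantics shown above (put/remove on container values, increment/decrement, and per-key expiry) can be modeled in a few lines. This is a toy in-memory Python model for illustration only, not the Broker API; in particular, expiry here is checked lazily on access, which is just one possible design:

```python
import time

class ToyStore:
    """Toy model of a key-value store with container values and expiry."""

    def __init__(self):
        self._data = {}  # key -> (value, absolute expiry time or None)

    def put(self, key, value, expiry=None):
        self._data[key] = (value, expiry)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expiry = item
        if expiry is not None and time.time() >= expiry:
            del self._data[key]  # lazily expire on access
            return None
        return value

    def remove(self, key, element):
        # Remove one element from a set-valued entry.
        value, _ = self._data[key]
        value.discard(element)

    def increment(self, key, amount):
        value, expiry = self._data[key]
        self._data[key] = (value + amount, expiry)

ds = ToyStore()
ds.put("foo", 4.2)
ds.put(42, {"x", "y", "z"})
ds.remove(42, "z")
ds.increment("foo", 1.7)
ds.put("bar", 4.2, time.time() - 1)  # expiry already in the past

print(ds.get(42))               # {'x', 'y'} (set order may vary)
print(round(ds.get("foo"), 2))  # 5.9
print(ds.get("bar"))            # None
```

What the model leaves out is exactly what Broker adds: the master/clone replication, where writes go to the master and propagate to clones after a delay, and the blocking vs. callback-based read paths.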
Matthias

From jsiwek at illinois.edu Sat Jul 23 10:54:19 2016
From: jsiwek at illinois.edu (Siwek, Jon)
Date: Sat, 23 Jul 2016 17:54:19 +0000
Subject: [Bro-Dev] Broker data store API
In-Reply-To: <20160722192610.GL91838@samurai.ICIR.org>
References: <20160722192610.GL91838@samurai.ICIR.org>
Message-ID: 

> On Jul 22, 2016, at 2:26 PM, Matthias Vallentin wrote:
> 
> - Does anyone use Broker's RocksDB backend?

My recollection is that it was just nice to have an optional backend that users could choose, perhaps if they need better performance relative to SQLite. But I probably took the time to try and get that working/ready just as reassurance that the datastore API would be able to implement a variety of backends. Not sure about the choice of RocksDB in particular -- could have just been that it happened to pop up on people's radar at the right time.

Given those historical reasons for it existing, it would make sense to me if it were temporarily ignored or removed completely (unless there are people already invested in using it). Hope that helps.

- Jon

From vallentin at icir.org Sun Jul 24 11:43:05 2016
From: vallentin at icir.org (Matthias Vallentin)
Date: Sun, 24 Jul 2016 11:43:05 -0700
Subject: [Bro-Dev] Broker data store API
In-Reply-To: 
References: <20160722192610.GL91838@samurai.ICIR.org>
Message-ID: <20160724184305.GN38625@ninja.local>

> Not sure about the choice of RocksDB in particular -- could have just
> been that it happened to pop up on people's radar at the right time.

It's certainly an industrial-strength key-value store, so I think it's a solid choice for those who need better performance with persistence.

> Given those historical reasons for it existing, it would make sense to me
> if it were temporarily ignored or removed completely (unless there are
> people already invested in using it).

My plan was to put it on hold for now, just to have less moving parts. It's great that you've already invested the time to understand the API and come up with an implementation.
Same for SQLite. It took me only a day to convert your backend code and read up on SQLite here and there. I would imagine it will be the same for RocksDB. That said, adding backends is fortunately a quite mechanical task, and it's easy to ship as an incremental release. I'm curious to find out what types of backends users would like to see once they build broker-enabled applications.

Matthias

From jsiwek at illinois.edu Sun Jul 24 12:45:32 2016
From: jsiwek at illinois.edu (Siwek, Jon)
Date: Sun, 24 Jul 2016 19:45:32 +0000
Subject: [Bro-Dev] package manager progress
Message-ID: 

The package manager client is at a point now where I think it would be usable. Documentation is here: https://bro.github.io/package-manager/

There is a branch in the "bro" repo called "package-manager" that simply changes CMake scripts to install "bro-pkg" along with bro. Here's an example usage/session:

$ git clone --recursive --branch=package-manager git://bro.org/bro
...
$ cd bro && ./configure && make install
...
$ /usr/local/bro/bin/bro-pkg list all
default/jsiwek/bro-test-package
$ /usr/local/bro/bin/bro-pkg install bro-test-package
installed "bro-test-package"
loaded "bro-test-package"
$ /usr/local/bro/bin/bro packages
loaded bro-test-package plugin
loaded bro-test-package scripts
$ /usr/local/bro/bin/broctl
Test package: initialized
...

That test package shows that bro-pkg was able to install a package containing Bro scripts, a Bro plugin, and a BroControl plugin, and everything should "just work" without needing any configuration.

Roadmap/TODO/Questions:

* Add a way for packages to define "discoverability metadata".

E.g. following the original plan for this would involve putting something like a "tags" field in each package's pkg.meta file, but the problem with this is the client would need to either download every package to be able to search this data or have a third party periodically aggregate it.
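To make the trade-off concrete: if tags are aggregated in metadata the client already has (rather than inside each package), a search needs no per-package downloads. A hypothetical Python sketch -- the package names, the "tags" field, and the data layout are all invented for illustration:

```python
# Hypothetical aggregated metadata, as a client might hold it after a
# single refresh of a package source (field names are invented).

source_metadata = {
    "jsiwek/bro-test-package": {"tags": ["example", "testing"]},
    "someone/scan-detector":   {"tags": ["scan", "notice", "detection"]},
}

def search(keyword: str) -> list:
    """Return package names whose aggregated tags match a keyword."""
    return sorted(name for name, meta in source_metadata.items()
                  if keyword in meta["tags"])

print(search("scan"))   # ['someone/scan-detector']
```

With per-package metadata instead, building `source_metadata` would require fetching every package first, which is the aggregation problem described above.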
My current idea is that instead of putting this type of data inside the package's metadata, the user puts it in the package source's metadata. They do this on first registration and may update it whenever. That way, bro-pkg always has access to the latest discoverability metadata, with no need for a separate aggregation process. It's also something that will rarely change, so it's not a problem for that data to live in a repo not owned by the package author, and not much increased burden for the Bro Team to accept pull requests to update this data. Thoughts?

* Automatic inter-package dependency analysis

Simply a TODO. I put it at lower priority since I don't think it will be common right off the bat to have complex package dependencies, and users can always manually resolve dependencies at the moment.

* Is it acceptable to depend on the GitPython and semantic_version python packages?

Both are replaceable implementation details; I just didn't want to write something myself if not necessary and in the interest of time.

* Documentation is hosted on GitHub at the moment; move to bro.org?

Mostly just on GitHub now to be able to show something without having to touch any of the master bro/www doc generation processes, but maybe it's a nice thing to start keeping docs more compartmentalized? The current doc/www setup feels like it's getting rather large/monolithic, and maybe that contributes to the difficulty of approaching/understanding it when there are breakages. Just an idea.

* Thoughts on when to merge the "package-manager" branch in "bro"?

IMO, it can be done now or soon after I address responses/feedback to this email.
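On the semantic_version dependency mentioned above: the kind of check it would be used for (does the installed Bro meet a package's minimum version?) can be sketched with plain tuple comparison. Illustrative only -- the real client would use the semantic_version library, and this sketch ignores pre-release tags and operators:

```python
# Minimal version-requirement check via tuple comparison (illustrative;
# not how bro-pkg actually resolves dependencies).

def parse(version: str) -> tuple:
    """Turn a dotted numeric version string into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))

def satisfies(installed: str, minimum: str) -> bool:
    """True if the installed version meets a package's minimum version."""
    return parse(installed) >= parse(minimum)

print(satisfies("2.5.0", "2.4"))   # True
print(satisfies("2.3.1", "2.4"))   # False
```

A dedicated library earns its keep once requirements include ranges, pre-releases, or caret/tilde operators, which is presumably why semantic_version was chosen.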
- Jon From hosom at battelle.org Mon Jul 25 04:46:20 2016 From: hosom at battelle.org (Hosom, Stephen M) Date: Mon, 25 Jul 2016 11:46:20 +0000 Subject: [Bro-Dev] Broker data store API In-Reply-To: <20160724184305.GN38625@ninja.local> References: <20160722192610.GL91838@samurai.ICIR.org> , <20160724184305.GN38625@ninja.local> Message-ID: I can't speak to whether or not it is 'needed', but I have had desire to use it in the past... The only thing preventing me from doing it was the fact that Broker is currently a fast moving target. Generally speaking, I was wanting to do it so that I could save state between cluster restarts (specifically for authentication data). ________________________________________ From: bro-dev-bounces at bro.org [bro-dev-bounces at bro.org] on behalf of Matthias Vallentin [vallentin at icir.org] Sent: Sunday, July 24, 2016 2:43 PM To: Siwek, Jon Cc: bro-dev at bro.org Subject: Re: [Bro-Dev] Broker data store API > Not sure about the choice of RocksDB in particular ? could have just > been that it happened to pop up on people?s radar at the right time. It's certainly an industrial-strength key-value, so I think it's solid choice for those with better performance when needing persistence. > Given those historical reasons for it existing, would make sense to me > if it were temporarily ignored or removed completely (unless there?s > people already invested in using it). My plan was to put on hold for now, just to have less moving parts. It's great that you've already invested the time to understand the API and come up with an implementation. Same for SQLite. It took me only a day to convert your backend code and read up on SQLite here and there. I would imagine it will be the same for RocksDB. That said, adding backends is fortunately a quite mechanical task. It's easy to ship as an incremental release. I'm curious to find out what types of backends they would like to see and use once they build broker-enabled applications. 
Matthias

_______________________________________________
bro-dev mailing list
bro-dev at bro.org
http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

From jan.grashoefer at gmail.com Mon Jul 25 04:53:16 2016
From: jan.grashoefer at gmail.com (=?UTF-8?Q?Jan_Grash=c3=b6fer?=)
Date: Mon, 25 Jul 2016 13:53:16 +0200
Subject: [Bro-Dev] package manager progress
In-Reply-To: 
References: 
Message-ID: <082f6614-1463-562d-dc29-13c0e675d397@gmail.com>

Amazing work! I really like the package manager and I am looking forward to contributing a script.

> * Add a way for packages to define "discoverability metadata".
> 
> E.g. following the original plan for this would involve putting something like a "tags" field in each package's pkg.meta file, but the problem with this is the client would need to either download every package to be able to search this data or have a third party periodically aggregate it.

I think this is a question of who should deal with the extra effort: on the one hand, requiring information to be spread and synced between two places introduces a burden for contributors; on the other hand, (automatic) aggregation of information makes it harder to maintain a source including metadata. I am in favor of putting that information into pkg.meta to make contributing as easy as possible.

One note: I think the documentation should contain a tremendous warning pointing out that users are responsible for what they are installing. One scenario that came instantly to my mind: someone contributes a small and useful script, waits for its distribution, and then updates his repository, adding e.g. a malicious build command. In that context it would be nice if the package manager would ask the user before executing the build command. For the official repository, some automatic checks would also be nice (e.g. flagging scripts that execute shell commands). I think that was discussed before.
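Jan's ask-before-building suggestion can be sketched in a few lines. The function name and prompt wording below are invented for illustration; this is not bro-pkg code:

```python
# Minimal sketch of prompting for consent before running a package's
# build command (names and prompt text are hypothetical).

def confirm_build(package: str, build_command: str, ask=input) -> bool:
    """Show the build command and require explicit consent to run it."""
    answer = ask(f"Package '{package}' wants to run: {build_command!r}. "
                 "Proceed? [y/N] ")
    return answer.strip().lower() == "y"

# Examples with canned answers instead of interactive input:
print(confirm_build("bro-test-package", "make", ask=lambda _: "y"))  # True
print(confirm_build("some-package", "make", ask=lambda _: ""))       # False
```

Defaulting to "no" on anything but an explicit "y" matches the usual safety convention for destructive or untrusted actions.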
All in all I think the package manager design is intuitive and really easy to use. Having central repositories will be great!

Thanks,
Jan

From vallentin at icir.org Mon Jul 25 08:18:01 2016
From: vallentin at icir.org (Matthias Vallentin)
Date: Mon, 25 Jul 2016 08:18:01 -0700
Subject: [Bro-Dev] package manager progress
In-Reply-To: 
References: 
Message-ID: <20160725151801.GB89503@shogun.local>

> The package manager client is at a point now where I think it would be usable.

Cool!

> * Add a way for packages to define "discoverability metadata".
> 
> E.g. following the original plan for this would involve putting
> something like a "tags" field in each package's pkg.meta file, but the
> problem with this is the client would need to either download every
> package to be able to search this data or have a third party
> periodically aggregate it.

What does "downloading" a package mean? If the package is in the .gitmodules of the repo bro/packages, won't it be automatically downloaded once the user updates their submodules?

> I put it at lower priority since I don't think it will be common right
> off the bat to have complex package dependencies and users can always
> manually resolve dependencies at the moment.

Agreed on inter-package dependencies. How about specifying a specific Bro version as a "dependency"?

> * Documentation is hosted on GitHub at the moment, move to bro.org?

A key benefit of hosting it at GitHub is reliability, and that clients get good viewing performance thanks to their CDN.

> The current doc/www setup feels like it's getting rather
> large/monolithic and maybe that contributes to the difficulty of
> approaching/understanding it when there's breakages. Just an idea.

Keeping it separate could be an advantage for users, because the current documentation is a bit unwieldy and confusing. Since you've written it in RST, have you thought about publishing it via Read the Docs? Their documentation really reads very smoothly out of the box.
CAF, for example, recently switched to it [1]. Some minor feedback:

- Is the "refresh" command essentially what "update" is to Homebrew? The documentation says:

  Update local package source clones to retrieve information about new packages that are available. Also fetches updated package information about any installed packages to determine if new versions are available.

  It sounds like this means it's doing a submodule update.

- The documentation of the "list" command says:

  Filters available/installed packages by a chosen category and then outputs that filtered package list.

  I don't understand what "available" means here. It could also mean "packages that exist remotely but are not installed locally" as opposed to "available for use right now." To avoid ambiguity and clearly distinguish it from "search", I would make that clear in the documentation.

- Regarding pkg.meta: this is more of a nit/style thing, but I like minimalistic naming of configuration options, e.g.:

  [package]
  version = 1.0.0
  scripts = /path/to/scripts
  plugins = /path/to/plugins

  I find them easier to remember. But Robin would probably disagree with me here :-).

Looking forward to seeing it shape up!

Matthias

[1] http://actor-framework.readthedocs.io/en/latest/

From vallentin at icir.org Mon Jul 25 08:25:07 2016
From: vallentin at icir.org (Matthias Vallentin)
Date: Mon, 25 Jul 2016 08:25:07 -0700
Subject: [Bro-Dev] Broker data store API
In-Reply-To: 
References: <20160722192610.GL91838@samurai.ICIR.org> <20160724184305.GN38625@ninja.local>
Message-ID: <20160725152507.GC89503@shogun.local>

> I can't speak to whether or not it is 'needed', but I have had desire
> to use it in the past... The only thing preventing me from doing it
> was the fact that Broker is currently a fast moving target.

Good to know. Scott Campbell also uses the current version of Broker in his projects and mentioned the need for a scalable, well-performing storage backend.
> Generally speaking, I was wanting to do it so that I could save state
> between cluster restarts (specifically for authentication data).

How many keys do you anticipate in your data store? And what's the rate of updates? Any ballpark estimate would be useful here. Given the interest in a scalable backend, I will bring back support for a RocksDB backend.

   Matthias

From jsiwek at illinois.edu  Mon Jul 25 10:13:58 2016
From: jsiwek at illinois.edu (Siwek, Jon)
Date: Mon, 25 Jul 2016 17:13:58 +0000
Subject: [Bro-Dev] package manager progress
In-Reply-To: <082f6614-1463-562d-dc29-13c0e675d397@gmail.com>
References: <082f6614-1463-562d-dc29-13c0e675d397@gmail.com>
Message-ID: <5C7DBAE3-FF86-4757-AACB-B88AD209C333@illinois.edu>

> On Jul 25, 2016, at 6:53 AM, Jan Grashöfer wrote:
>
>> * Add a way for packages to define "discoverability metadata".
>>
>> E.g. following the original plan for this would involve putting something like a "tags" field in each package's pkg.meta file, but the problem with this is the client would need to either download every package to be able to search this data or have a third-party periodically aggregate it.
>
> I think this is a question about who should deal with the extra effort:
> On the one hand requiring to spread and sync information between two
> places introduces a burden for the contributors

The idea was not for contributors to have to keep syncing the information between two places; the "discoverability" metadata would just be located within the "package source" instead of the package itself. My thinking is that discoverability metadata should be more of a property of a package source than of the package itself; e.g. if a user is looking at discoverability data in a package's pkg.meta file, it's not that helpful because they've already found the package. Also, some people may initially have no intention of sharing their package, so there's no reason to put discoverability metadata in its pkg.meta.
If they later change their mind, and care enough to take the time to register it to a package source, then they likely don't mind adding a few keywords to a new meta file as an optional part of the one-time registration process.

> One note: I think the documentation should contain a tremendous warning
> pointing out that the users are responsible for what they are
> installing

Thanks for the suggestion, I'll do that.

- Jon

From hosom at battelle.org  Mon Jul 25 10:41:05 2016
From: hosom at battelle.org (Hosom, Stephen M)
Date: Mon, 25 Jul 2016 17:41:05 +0000
Subject: [Bro-Dev] Broker data store API
In-Reply-To: <20160725152507.GC89503@shogun.local>
References: <20160722192610.GL91838@samurai.ICIR.org> <20160724184305.GN38625@ninja.local>, <20160725152507.GC89503@shogun.local>
Message-ID: 

The number of key/values would depend on the scale of the environment in the case of the authentication framework. In my last implementation... it was one record per user/host pair... which could scale into the tens of thousands of key/value pairs pretty quickly. I haven't looked at that stuff in a while since I'm eagerly awaiting your rewrite of the Broker APIs :)

________________________________________
From: Matthias Vallentin [matthias at vallentin.net] on behalf of Matthias Vallentin [vallentin at icir.org]
Sent: Monday, July 25, 2016 11:25 AM
To: Hosom, Stephen M
Cc: Siwek, Jon; bro-dev at bro.org
Subject: Re: [Bro-Dev] Broker data store API

> I can't speak to whether or not it is 'needed', but I have had desire
> to use it in the past... The only thing preventing me from doing it
> was the fact that Broker is currently a fast moving target.

Good to know. Scott Campbell also uses the current version of Broker in his projects and mentioned the need for a scalable and performant storage backend.

> Generally speaking, I was wanting to do it so that I could save state
> between cluster restarts (specifically for authentication data).
How many keys do you anticipate in your data store? And what's the rate of updates? Any ballpark estimate would be useful here. Given the interest in a scalable backend, I will bring back support for a RocksDB backend.

Matthias

From jsiwek at illinois.edu  Mon Jul 25 11:07:30 2016
From: jsiwek at illinois.edu (Siwek, Jon)
Date: Mon, 25 Jul 2016 18:07:30 +0000
Subject: [Bro-Dev] package manager progress
In-Reply-To: <20160725151801.GB89503@shogun.local>
References: <20160725151801.GB89503@shogun.local>
Message-ID: 

> On Jul 25, 2016, at 10:18 AM, Matthias Vallentin wrote:
>
>> * Add a way for packages to define "discoverability metadata".
>>
>> E.g. following the original plan for this would involve putting
>> something like a "tags" field in each package's pkg.meta file, but the
>> problem with this is the client would need to either download every
>> package to be able to search this data or have a third-party
>> periodically aggregate it.
>
> What does "downloading" a package mean? If the package is in the
> .gitmodules of the repo bro/packages, won't it be automatically
> downloaded once the user updates their submodules?

Right now, packages don't get downloaded via the submodule; they are cloned directly from the package's full git URL (which git just happens to encode within the submodule). So this means only packages a user is interested in end up getting downloaded.

I think it also helps in cases where a user installs a package and later it gets removed from the package source: the submodule is gone, but the user's installed version is not affected because it's cloned directly from the package's git URL. I.e. the package manager doesn't distinguish between packages installed from a package source and packages installed directly from a git URL. If we wanted, we could actually use something completely different from git submodules to register a package to a package source.
The package source just has to have some sort of database that links nodes in a package hierarchy (e.g. alice/foo, bob/bar, eve/baz) to their actual URLs. Git submodules just happen to perform this role. Maybe another added benefit of submodules is that if someone (e.g. a web frontend) does want to download the "universe of packages" (maybe to do some global stats/analysis on it), it's easy to do that with a single builtin git command.

> Agreed on inter-package dependencies. How about specifying a specific
> Bro version as "dependency"?

Yep, that's also on the TODO list.

> have you thought about publishing it via read-the-docs?

Yeah, just haven't looked into it. I'll do that unless the consensus is to host the docs on bro.org.

> Some minor feedback:
>
> - Is the "refresh" command essentially what "update" is to Homebrew? The
>   documentation says:
>
>       Update local package source clones to retrieve information about new
>       packages that are available. Also fetches updated package
>       information about any installed packages to determine if new
>       versions are available.
>
>   It sounds like this means it's doing a submodule update.

I'll try to clarify it in the docs. It doesn't do a recursive submodule update; it just updates the source repo itself (so submodule additions/removals are visible, but nothing further is actually downloaded).

> - The documentation of the "list" command says:
>
>       Filters available/installed packages by a chosen category and then
>       outputs that filtered package list.
>
>   I don't understand what "available" means here. It could also mean
>   "packages that exist remotely but not installed locally" as opposed to
>   "available for use right now."

It means the former: "list" operates on the combined set of installed and not-yet-installed packages. Does wording it like "Filters known packages..." make it clearer for you?
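The Bro-version dependency mentioned earlier in this message could start out very simple. The following is an illustrative sketch only (function names are made up, and bro-pkg would more likely build on the semantic_version package); it also accepts git-master style "x.y-z" versions by ignoring the commit-count suffix:

```python
# Illustrative sketch -- not bro-pkg's actual implementation.

def parse_version(v):
    """Turn 'x.y', 'x.y.z', or git-master style 'x.y-z' (e.g. '2.4-271')
    into a tuple of ints that compares naturally."""
    core = v.split("-")[0]  # drop a '-<commit count>' suffix, if any
    return tuple(int(p) for p in core.split("."))

def satisfies(bro_version, minimum_required):
    """True if the running Bro version meets a package's minimum requirement."""
    return parse_version(bro_version) >= parse_version(minimum_required)
```

With this, a git-master build like "2.4-271" would satisfy a package requiring "2.4", while a "2.3.1" release would not.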
> - Regarding pkg.meta: this is more of a nit/style thing, but I like
>   minimalistic naming of configuration options, e.g.:
>
>       [package]
>       version = 1.0.0
>       scripts = /path/to/scripts
>       plugins = /path/to/plugins

Open to changing it, but seeing "scripts" as an option, without reading any further documentation, implies to me that you might be able to specify a list of paths/files there, which you can't.

- Jon

From jazoff at illinois.edu  Mon Jul 25 13:49:27 2016
From: jazoff at illinois.edu (Azoff, Justin S)
Date: Mon, 25 Jul 2016 20:49:27 +0000
Subject: [Bro-Dev] package manager progress
In-Reply-To: 
References: 
Message-ID: 

> On Jul 24, 2016, at 3:45 PM, Siwek, Jon wrote:
>
> * Add a way for packages to define "discoverability metadata".

Kind of related to this, I think we need to define some basic rules for package naming. This can help discoverability and also with namespacing issues. Right now we have plugins named:

    af_packet
    elasticsearch
    kafka
    myricom
    netmap
    pf_ring
    redis
    tcprs

But I think they need to be renamed using prefixes like:

    af_packet     - pktsrc-af_packet
    elasticsearch - log-writer-elasticsearch
    kafka         - log-writer-kafka
    myricom       - pktsrc-myricom
    netmap        - pktsrc-netmap
    pf_ring       - pktsrc-pf_ring
    redis         - log-writer-redis
    tcprs         - analyzer-tcprs

In one aspect the pktsrc- prefix acts like a tag, but it can also help disambiguate plugins, i.e., a redis log writer plugin vs. a redis data store plugin vs. a redis protocol analyzer.

--
- Justin Azoff

From vallentin at icir.org  Mon Jul 25 20:31:17 2016
From: vallentin at icir.org (Matthias Vallentin)
Date: Mon, 25 Jul 2016 20:31:17 -0700
Subject: [Bro-Dev] package manager progress
In-Reply-To: 
References: <20160725151801.GB89503@shogun.local>
Message-ID: <20160726033117.GP38625@ninja.local>

> Right now, packages don't get downloaded via the submodule, they are
> cloned directly from the package's full git URL (which git just
> happens to encode within the submodule).
>
> So this means only packages a user is interested in end up getting
> downloaded.

I'm not 100% following. Isn't every package recorded as a submodule? Is there any use case where you would do a submodule update? Or are the packages just recorded there instead of being recorded in a separate file?

> The package source just has to have some sort of database that links
> nodes in a package hierarchy (e.g. alice/foo, bob/bar, eve/baz) to
> their actual URLs. Git submodules just happen to perform this role.

(Yeah, reusing this makes sense.)

> > Filters available/installed packages by a chosen category and then
> > outputs that filtered package list.
> >
> > I don't understand what "available" means here. It could also mean
> > "packages that exist remotely but not installed locally" as opposed to
> > "available for use right now."
>
> It means the former: "list" operates on the combined set of installed and not-yet-installed packages.
>
> Does wording it like "Filters known packages..." make it clearer for you?

I think "known" is also ambiguous, because it doesn't clearly convey the local aspect. How about just saying "filters installed packages"?

> [..] but seeing "scripts" as an option, without reading any further
> documentation, implies to me that you might be able to specify a list
> of paths/files there, which you can't.

Fair point. The reduction certainly omits some semantics. To simplify reading the options, maybe add an underscore, e.g., script_path and plugin_path?
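For concreteness, a hypothetical pkg.meta using the underscore naming floated here might read as follows (all values are invented for illustration; this is not the actual bro-pkg format):

```ini
[package]
version = 1.0.0
script_path = scripts/
plugin_path = build/
```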
Matthias

From jsiwek at illinois.edu  Tue Jul 26 09:13:23 2016
From: jsiwek at illinois.edu (Siwek, Jon)
Date: Tue, 26 Jul 2016 16:13:23 +0000
Subject: [Bro-Dev] package manager progress
In-Reply-To: <20160726033117.GP38625@ninja.local>
References: <20160725151801.GB89503@shogun.local> <20160726033117.GP38625@ninja.local>
Message-ID: <609AE252-253A-4014-80F2-A5D2351DE344@illinois.edu>

> On Jul 25, 2016, at 10:31 PM, Matthias Vallentin wrote:
>
>> Right now, packages don't get downloaded via the submodule, they are
>> cloned directly from the package's full git URL (which git just
>> happens to encode within the submodule).
>>
>> So this means only packages a user is interested in end up getting
>> downloaded.
>
> I'm not 100% following. Isn't every package recorded as a submodule?

Every package within a package source is recorded as a git submodule, and that recording happens at the time the package author registers their package with a source. The bro-pkg client itself makes no changes to submodules.

> Is there any use case where you would do a submodule update?

Depends on who "you" refers to:

- A regular bro-pkg user: no, they don't need to be aware that submodules are used.

- A package author: no, they only care that submodules are used when they do the one-time registration process to add their package to a source.

- The bro-pkg developer/maintainer: not currently, but that's maybe an implementation detail. I don't currently ever update submodules and instead clone packages directly via their full git URL to a separate location, because I think that's the more robust implementation.

- Some other entity that does periodic analysis on all packages (e.g. a web frontend): I'd probably expect them to not be using bro-pkg at all, but to clone a package source and do recursive submodule updates on it as the easiest way of downloading the latest versions of everything.

> Or are the
> packages just recorded there instead of recording them in a separate file?
Right, using git submodules isn't a requirement for the bro-pkg client to work; we could make up a different file/format for registering packages. But maybe submodules do provide some convenience for the last use case mentioned above.

> I think "known" is also ambiguous, because it doesn't clearly convey
> the local aspect. How about just saying "filters installed packages"?

Not all subcategories of "list" are working with just the locally "installed" packages. E.g. "list all" looks at both installed packages (local git repos) and not-installed packages (remote git repos, but we know about them because they are registered with a source). How about this description:

    "The 'list' command outputs a list of packages that match a given category"

> maybe add an underscore, e.g., script_path and plugin_path?

Yeah, can do that. And maybe "dir" is more meaningful than "path", since the latter may mean a file or a directory?

- Jon

From vallentin at icir.org  Tue Jul 26 21:18:44 2016
From: vallentin at icir.org (Matthias Vallentin)
Date: Tue, 26 Jul 2016 21:18:44 -0700
Subject: [Bro-Dev] package manager progress
In-Reply-To: <609AE252-253A-4014-80F2-A5D2351DE344@illinois.edu>
References: <20160725151801.GB89503@shogun.local> <20160726033117.GP38625@ninja.local> <609AE252-253A-4014-80F2-A5D2351DE344@illinois.edu>
Message-ID: <20160727041844.GB21315@shogun.local>

> > I'm not 100% following. Isn't every package recorded as a submodule?
>
> Every package within a package source is recorded as a git submodule
> and that recording happens at the time the package author registers
> their package with a source. The bro-pkg client itself makes no
> changes to submodules.

Got it, thanks! Also, this page of the manual really helped me fill in the missing pieces: https://bro.github.io/package-manager/source.html

> [..] How about this description:
>
> "The 'list' command outputs a list of packages that match a given
> category"

Yep, my favorite so far!
> > maybe add an underscore, e.g., script_path and plugin_path?
>
> Yeah, can do that. And maybe "dir" is more meaningful than "path"
> since the latter may mean file or directory?

Also agreeing here.

   Matthias

From seth at icir.org  Wed Jul 27 08:44:02 2016
From: seth at icir.org (Seth Hall)
Date: Wed, 27 Jul 2016 11:44:02 -0400
Subject: [Bro-Dev] package manager progress
In-Reply-To: 
References: 
Message-ID: 

> On Jul 25, 2016, at 4:49 PM, Azoff, Justin S wrote:
>
> In one aspect the pktsrc- prefix acts like a tag, but can also help disambiguate plugins... i.e., a redis log writer plugin vs. a redis data store plugin vs. a redis protocol analyzer.

I actually don't like this that much, because some of these can cross boundaries and do all sorts of different things in a single plugin. It makes more sense to me to leave the naming open. If people want to name a plugin with a prefix, they're free to, but I wouldn't want to discourage people from maintaining individual plugins that provide a variety of features.

  .Seth

--
Seth Hall
International Computer Science Institute
(Bro) because everyone has a network

http://www.bro.org/

From jazoff at illinois.edu  Wed Jul 27 08:57:21 2016
From: jazoff at illinois.edu (Azoff, Justin S)
Date: Wed, 27 Jul 2016 15:57:21 +0000
Subject: [Bro-Dev] package manager progress
In-Reply-To: 
References: 
Message-ID: 

> On Jul 27, 2016, at 11:44 AM, Seth Hall wrote:
>
>> On Jul 25, 2016, at 4:49 PM, Azoff, Justin S wrote:
>>
>> In one aspect the pktsrc- prefix acts like a tag, but can also help disambiguate plugins... i.e., a redis log writer plugin vs. a redis data store plugin vs. a redis protocol analyzer.
>
> I actually don't like this that much because some of these can cross boundaries and do all sorts of different things in a single plugin. It makes more sense to me to leave the naming open.
> If people want to name a plugin with a prefix, they're free to, but I wouldn't want to discourage people from maintaining individual plugins that provide a variety of features.
>
> .Seth

We really need to do this, though; otherwise the end result will be chaos. A package shouldn't get a generic name just because it was the first one in the repository. Leaving it open will lead to this: the first person that writes a redis plugin for log writing calls it 'redis'. Then a redis analyzer is called 'redis-analyzer'. Then someone writes a redis input source and that gets called 'input-source-redis'. Then a postgres analyzer is written and named 'postgresql'. Then a postgres log writer plugin is named 'postgresql-log-writer'. Then an input source is written and named 'postgresql-input-source'.

So a year later we end up with packages named:

    redis
    redis-analyzer
    input-source-redis
    postgresql
    postgresql-log-writer
    postgresql-input-source

where 'redis' is a log writer plugin and 'postgresql' is an analyzer, and where the input source plugins are inconsistently named input-source-redis and postgresql-input-source.

If someone wanted to write a redis plugin that was an input source, an analyzer, and a log writer all at once, that could be called 'redis'... letting anything else be called 'redis' is confusing and misleading.

--
- Justin Azoff

From vallentin at icir.org  Wed Jul 27 09:05:19 2016
From: vallentin at icir.org (Matthias Vallentin)
Date: Wed, 27 Jul 2016 09:05:19 -0700
Subject: [Bro-Dev] package manager progress
In-Reply-To: 
References: 
Message-ID: <20160727160519.GB49032@ninja.local>

> I actually don't like this that much because some of these can cross
> boundaries and do all sorts of different things in a single plugin.
> It makes more sense to me to leave the naming open.

I'm with Seth on this one. The reason why I think we should keep the naming open is that it's the job of the metadata tags to take care of the grouping.
If someone writes a redis package, then they should apply a redis tag. Encoding this metadata into the package name is quite limited, however.

   Matthias

From johanna at icir.org  Wed Jul 27 09:15:27 2016
From: johanna at icir.org (Johanna Amann)
Date: Wed, 27 Jul 2016 09:15:27 -0700
Subject: [Bro-Dev] package manager progress
In-Reply-To: <20160727160519.GB49032@ninja.local>
References: <20160727160519.GB49032@ninja.local>
Message-ID: <039A75A9-720C-42FC-BC2C-93B603A1D9DE@icir.org>

And to add a "me three" to this: I am also with him on this one. On top of that (I might misremember this), didn't we plan for package names to include the GitHub user name at one point in time? So a package name would be user/redis, for example, and there could also be user2/redis?

Johanna

On 27 Jul 2016, at 9:05, Matthias Vallentin wrote:

>> I actually don't like this that much because some of these can cross
>> boundaries and do all sorts of different things in a single plugin.
>> It makes more sense to me to leave the naming open.
>
> I'm with Seth on this one. The reason why I think we should keep the
> naming open is that it's the job of the metadata tags to take care of
> the grouping. If someone writes a redis package, then they should
> apply a redis tag. Encoding this metadata into the package name is
> quite limited, however.
>
> Matthias
> _______________________________________________
> bro-dev mailing list
> bro-dev at bro.org
> http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

From jsiwek at illinois.edu  Wed Jul 27 10:50:18 2016
From: jsiwek at illinois.edu (Siwek, Jon)
Date: Wed, 27 Jul 2016 17:50:18 +0000
Subject: [Bro-Dev] package manager progress
In-Reply-To: <039A75A9-720C-42FC-BC2C-93B603A1D9DE@icir.org>
References: <20160727160519.GB49032@ninja.local> <039A75A9-720C-42FC-BC2C-93B603A1D9DE@icir.org>
Message-ID: <66610289-C633-4C65-A97B-336BA4F61290@illinois.edu>

> On Jul 27, 2016, at 11:15 AM, Johanna Amann wrote:
>
> And to add a me three to this - I am also with him on this one. On top
> of things - I might misremember this, but didn't we plan package names
> to include the github user name at one point of time? So a package name
> would be user/redis, for example, and there also could be user2/redis?

Yes, package sources support hierarchical package names, but don't require them. The hierarchy for the default package source is currently "github_user_name/package_name". I'm the only one with a package at the moment, but you can see the structure here: https://github.com/bro/packages

Right now, a user of bro-pkg can refer to my package as simply "bro-test-package". If another user, say "bob", creates "bob/bro-test-package", then the client will no longer accept "bro-test-package" for commands where it is ambiguous and will tell the user to clarify between either "bob/bro-test-package" or "jsiwek/bro-test-package".

It's also not allowed to have two packages with the same shortened name (e.g. "bro-test-package") installed simultaneously. I'm interested to hear if people have use cases for that, but I expect the common case for same-name packages to be forks (either hard forks or just forking to contribute bugfixes), and allowing multiple packages of the same name to be installed may make that case more confusing/complicated for users and developers.
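The disambiguation behavior described here can be sketched in a few lines; this is illustrative Python only, not the actual bro-pkg code (the function name is made up):

```python
# Illustrative sketch of short-name resolution against a hierarchically
# named package source (e.g. "jsiwek/bro-test-package").

def resolve(name, known_packages):
    """Map a (possibly shortened) package name to its full path.

    Full paths are always accepted.  A short name works only while it is
    unambiguous; otherwise the caller is asked to clarify.
    """
    if name in known_packages:  # full paths always work
        return name
    matches = [p for p in known_packages if p.split("/")[-1] == name]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise LookupError("no such package: " + name)
    raise LookupError("ambiguous name, please clarify: " + ", ".join(sorted(matches)))
```

With only one registered package, `resolve("bro-test-package", ...)` succeeds; once a second source user registers a same-named package, the short form raises and the full path must be used.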
- Jon

From robin at icir.org  Wed Jul 27 10:57:10 2016
From: robin at icir.org (Robin Sommer)
Date: Wed, 27 Jul 2016 10:57:10 -0700
Subject: [Bro-Dev] package manager progress
In-Reply-To: <039A75A9-720C-42FC-BC2C-93B603A1D9DE@icir.org>
References: <20160727160519.GB49032@ninja.local> <039A75A9-720C-42FC-BC2C-93B603A1D9DE@icir.org>
Message-ID: <20160727175710.GD4480@icir.org>

Make it four. :) I'm with Seth, too: better not to enforce any naming scheme, because the boundaries are unclear. Also, note that a single binary Bro plugin can provide multiple quite different things (say, a reader and an analyzer and a packet source all at the same time, if one so desires :).

Also agree with Johanna: the username is part of the package name if I follow correctly, so there's disambiguation there.

I have some more feedback on the package manager and on Jon's questions starting this thread; will send soon.

Robin

On Wed, Jul 27, 2016 at 09:15 -0700, you wrote:

> And to add a me three to this - I am also with him on this one. On top
> of things - I might misremember this, but didn't we plan package names
> to include the github user name at one point of time? So a package name
> would be user/redis, for example, and there also could be user2/redis?
>
> Johanna
>
> On 27 Jul 2016, at 9:05, Matthias Vallentin wrote:
>
> >> I actually don't like this that much because some of these can cross
> >> boundaries and do all sorts of different things in a single plugin.
> >> It makes more sense to me to leave the naming open.
> >
> > I'm with Seth on this one. The reason why I think we should keep the
> > naming open is that it's the job of the metadata tags to take care of
> > the grouping. If someone writes a redis package, then they should
> > apply a redis tag. Encoding this metadata into the package name is
> > quite limited, however.
> >
> > Matthias

--
Robin Sommer * ICSI/LBNL * robin at icir.org * www.icir.org/robin

From robin at icir.org  Wed Jul 27 10:59:42 2016
From: robin at icir.org (Robin Sommer)
Date: Wed, 27 Jul 2016 10:59:42 -0700
Subject: [Bro-Dev] package manager progress
In-Reply-To: <66610289-C633-4C65-A97B-336BA4F61290@illinois.edu>
References: <20160727160519.GB49032@ninja.local> <039A75A9-720C-42FC-BC2C-93B603A1D9DE@icir.org> <66610289-C633-4C65-A97B-336BA4F61290@illinois.edu>
Message-ID: <20160727175942.GE4480@icir.org>

On Wed, Jul 27, 2016 at 17:50 +0000, you wrote:

> Right now, a user of bro-pkg can refer to my package as simply
> "bro-test-package". If another user, say "bob", creates
> "bob/bro-test-package", then the client will no longer accept
> "bro-test-package" for commands where it is ambiguous and tell the
> user to clarify between either "bob/bro-test-package" or
> "jsiwek/bro-test-package".

Ah, I see. Would it be better to generally use the full path as the name, and not search for submatches, to make it consistent/unambiguous what a name refers to?

Robin

--
Robin Sommer * ICSI/LBNL * robin at icir.org * www.icir.org/robin

From jsiwek at illinois.edu  Wed Jul 27 11:34:32 2016
From: jsiwek at illinois.edu (Siwek, Jon)
Date: Wed, 27 Jul 2016 18:34:32 +0000
Subject: [Bro-Dev] package manager progress
In-Reply-To: <20160727175942.GE4480@icir.org>
References: <20160727160519.GB49032@ninja.local> <039A75A9-720C-42FC-BC2C-93B603A1D9DE@icir.org> <66610289-C633-4C65-A97B-336BA4F61290@illinois.edu> <20160727175942.GE4480@icir.org>
Message-ID: <69085201-ECCE-4DB3-8DD8-A7FAAC43AB6B@illinois.edu>

> On Jul 27, 2016, at 12:59 PM, Robin Sommer wrote:
>
> Ah, I see.
> Would it be better to generally use the full path as the
> name, and not search for submatches, to make it consistent/unambiguous
> what a name refers to?

At least from my usage, it's been convenient to be able to use a short name. The client still always accepts full path names for packages, even if they're unambiguous when shortened, and the full path is what gets displayed in package listings, so it's never inconsistent in that regard. A user is free to always type full paths, and for those that like to use short names, an occasional "please clarify" may be more helpful than annoying: e.g. "oh, I didn't realize there were two; I should look into which one is more appropriate for me to use".

- Jon

From robin at icir.org  Wed Jul 27 16:37:58 2016
From: robin at icir.org (Robin Sommer)
Date: Wed, 27 Jul 2016 16:37:58 -0700
Subject: [Bro-Dev] package manager progress
In-Reply-To: 
References: 
Message-ID: <20160727233758.GM4480@icir.org>

On Sun, Jul 24, 2016 at 19:45 +0000, you wrote:

> The package manager client is at a point now where I think it would be
> usable.

Finally got a chance to play with it a bit. Excellent work, I really like it! Below is a list of just the smaller things I noticed. The only larger question I have is regarding the use of submodules, also following up on other parts of this thread.

In principle, I actually quite like the idea of using submodules; git already offers the mechanism, so why not build on that. That said, seeing how the package manager ends up using submodules, it's not quite the pure git model actually. If I understood it right, it's using them really only to *find* the external repos, but not to pinpoint a particular commit in there; the package source never even updates the submodules. Given that approach, I'm now wondering if a custom scheme wouldn't be the more intuitive solution. My concern is that whoever looks at this submodule usage will take a while to understand what's actually happening.
One could argue that's only an implementation detail and shouldn't really matter to anybody. On the other hand, if, for example, somebody ends up browsing the package source repository on GitHub, I'm sure they'd be confused by all the packages pointing to very old versions. So I'm wondering if it would be worth switching to a custom index instead of submodules; seems that wouldn't be difficult either, if indeed all we need to do is track the external URLs somehow. Also, if you want to track discoverability metadata there already as well, seems that the URL could just become part of that, no?

Here's my list of other random things I noticed:

- Would suggest renaming "pkg.meta" to, say, "bro-pkg.meta", just to make it more explicit that it's a Bro package. That's something one can also then search for on GitHub.

- Does "upgrade" show the packages affected and ask for confirmation? I would suggest either doing that or requiring an --all option for upgrading everything, as that's a potentially dangerous operation.

- I suppose upgrading does (better: will do) dependency checking again, including making sure the Bro version matches the one that the update now requires?

- When installing the package manager as part of Bro, could we pull in the Python dependencies automatically, for example by installing them into the same prefix? Both GitPython and semantic_version are pretty non-standard. Using them is OK I think, but it would be nice if "bro-pkg" wouldn't abort first thing because they aren't installed yet.

- How about adding a note to either packages.bro or the whole packages/ directory saying that it's automatically maintained and not supposed to be manually messed with?

- In bro-pkg.conf, does "default" in "[sources]" have a special meaning, or could it be any tag? Assuming the latter, I would just call it "bro": "bro/jsiwek/bro-test-package" is more intuitive than "default/jsiwek/bro-test-package".

- For our default package source, do we want to support non-GitHub repositories?
  If so, a naming scheme by GitHub user won't work.

- Suggest renaming "/opt/bro/var/lib/package-manager" to "../bro-package-manager" or "../bro-pkg".

- Once we support dependencies on Bro versions, it would be nice if that also worked with the "x.y-z" scheme that git master uses (and maybe it just does anyway).

- I like the Python API!

- Documentation (nice!):

  - Python 3.x works, right? Then I'd list that explicitly.

  - A quick-start guide would be helpful that just mentions the most important steps, including basic installation along with Bro itself (once that's merged).

  - The "Installation" section becomes a bit confusing towards the end, with all those paths. Maybe split some parts out into an advanced section or so?

  - How-tos would be helpful that show by example how to create (1) a pure script package, (2) a binary Bro plugin, and (3) a BroControl plugin.

Finally, my take on your questions:

> My current idea is that instead of putting this type of data inside
> the package's metadata, the user puts it in the package source's
> metadata.

Yeah, I like that.

> Automatic inter-package dependency analysis

Agree on lower priority.

> * Is it acceptable to depend on the GitPython and semantic_version Python packages?

It is; it just would be nice if we could help people a bit with getting these installed, see above.

> * Documentation is hosted on GitHub at the moment, move to bro.org?

Keeping these docs separate makes sense, although it would also be nice to have the option to integrate them into bro.org at least. For now, I think it's fine to just do your own Sphinx and publish it wherever (GitHub, RTM). We can later see what to do about bro.org. Generally, our bro.org setup certainly needs work; it's become hard to maintain and extend. We have been talking about some options here recently but haven't settled on anything in particular yet.

> * Thoughts on when to merge the "package-manager" branch into "bro"?

The main question is whether we see this as a 2.5 feature.
If so, we should merge soon; if not, postpone until that's out the door. Given how far along you are already, I vote for making it part of 2.5. We could still declare it experimental for 2.5, to get some more time to iron out the workflow before we tell people it's OK to start relying on it.

Again, all very nice work!

Robin

--
Robin Sommer * ICSI/LBNL * robin at icir.org * www.icir.org/robin

From seth at icir.org  Wed Jul 27 20:25:49 2016
From: seth at icir.org (Seth Hall)
Date: Wed, 27 Jul 2016 23:25:49 -0400
Subject: [Bro-Dev] package manager progress
In-Reply-To: <039A75A9-720C-42FC-BC2C-93B603A1D9DE@icir.org>
References: <20160727160519.GB49032@ninja.local> <039A75A9-720C-42FC-BC2C-93B603A1D9DE@icir.org>
Message-ID: <4907B353-16BC-4D79-851A-28E5CE749D50@icir.org>

> On Jul 27, 2016, at 12:15 PM, Johanna Amann wrote:
>
> And to add a me three to this - I am also with him on this one. On top of things - I might misremember this, but didn't we plan package names to include the github user name at one point of time? So a package name would be user/redis, for example, and there also could be user2/redis?

I may have lost track of the design, so I don't know where things stand now, but I think this would make sense too.

  .Seth

--
Seth Hall
International Computer Science Institute
(Bro) because everyone has a network

http://www.bro.org/

From vallentin at icir.org  Wed Jul 27 21:22:23 2016
From: vallentin at icir.org (Matthias Vallentin)
Date: Wed, 27 Jul 2016 21:22:23 -0700
Subject: [Bro-Dev] New documentation via Sphinx
Message-ID: <20160728042223.GC30817@shogun.local>

I'm in the process of documenting Broker with Sphinx. With minimal effort, I put up a scaffold that looks like this: http://bro.github.io/broker/

It's the Bootstrap theme for Sphinx, as an alternative to the classic Read the Docs theme. I've hacked the sidebar so that it shows the table of contents. The nice thing about this setup is that it doesn't require any server-side support.
I just type `make doc` locally and can open the HTML pages. I pushed the
above to Broker's gh-pages branch so that you can view it under the
above github.io URL. Search is also implemented via JS and works great.
Sphinx also has a plugin to generate a PDF version of the manual, which
I've put here: http://docdro.id/rHNvn1X.

Don't look too much at the content, I'm just getting started. But the
whole setup looks really simple and could be a good starting point for
the next Bro documentation overhaul.

    Matthias

From vallentin at icir.org  Thu Jul 28 07:40:13 2016
From: vallentin at icir.org (Matthias Vallentin)
Date: Thu, 28 Jul 2016 07:40:13 -0700
Subject: [Bro-Dev] package manager progress
In-Reply-To: <20160727233758.GM4480@icir.org>
References: <20160727233758.GM4480@icir.org>
Message-ID: <20160728144013.GF49032@ninja.local>

> - Would suggest to rename "pkg.meta" to, say, "bro-pkg.meta", just to
>   make it more explicit that it's a Bro package. That's something one
>   can also then search for on GitHub.

Just throwing in two more permutations: bro.meta or bro.pkg.

> - For our default package source, do we want to support non-GitHub
>   repositories? If so, a naming scheme by GitHub user won't work.
>
> - Suggest to rename "/opt/bro/var/lib/package-manager" to
>   "../bro-package-manager" or "../bro-pkg".

Yeah, especially if users don't install into the /opt/bro prefix, not
having "bro" as part of the filename might be confusing.

> For now, I think it's fine to just do your own Sphinx and publish it
> wherever (GitHub, RTM).

In case you're going with GitHub, here's something non-intuitive that
took me a while to figure out: you need to put an (empty) file .nojekyll
in the document root; otherwise GitHub runs the pages through Jekyll,
which ignores the underscore-prefixed directories that Sphinx uses
(e.g., _static and _images).

> Given how far you are already, I vote for making it part of 2.5.
+1

    Matthias

From jsiwek at illinois.edu  Thu Jul 28 10:52:50 2016
From: jsiwek at illinois.edu (Siwek, Jon)
Date: Thu, 28 Jul 2016 17:52:50 +0000
Subject: [Bro-Dev] package manager progress
In-Reply-To: <20160727233758.GM4480@icir.org>
References: <20160727233758.GM4480@icir.org>
Message-ID: <7770E12D-ECD4-4F10-8020-69EDCD15C68A@illinois.edu>

> On Jul 27, 2016, at 6:37 PM, Robin Sommer wrote:
>
> On the other hand, if, for example, somebody ends up
> browsing the package source repository on GitHub, I'm sure they'd be
> confused by all the packages pointing to very old versions.

Yeah, agree that is confusing and a problem of using submodules without
ever updating them in the source repo.

> So I'm
> wondering if it would be worth switching to a custom index instead of
> submodules; seems that wouldn't be difficult either if indeed all we
> need to do is track the external URLs somehow.

I'm on board with switching to a custom index format.

> Also, if you want to
> track discoverability metadata there already as well, seems that the
> URL could just become part of that, no?

Yes.

Any preference on the new index format? Single index file? Multiple
files? INI, JSON, something else?

I think still allowing package sources to be structured in a directory
hierarchy is more intuitive to navigate and maybe less intimidating to
modify than a single file as the source grows over time. And I'm
already using INI format in 2 other places, so it seems fine to apply
here, too.

So a proposed structure of a package source
(https://github.com/bro/packages):

    alice/
        bro-pkg.index
    bob/
        bro-pkg.index
    ...

alice's bro-pkg.index:

    [foo]
    url = https://github.com/alice/foo
    keywords = Mr.T pity

    [bar]
    url = https://github.com/alice/bar
    keywords = club pub drinks

bob's bro-pkg.index:

    [baz]
    url = https://github.com/bob/baz
    keywords = lightning storm

Output of `bro-pkg list all`:

    bro/alice/foo
    bro/alice/bar
    bro/bob/baz

> - Would suggest to rename "pkg.meta" to, say, "bro-pkg.meta"

Sure.
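As a sanity check on the proposed INI layout, a bro-pkg.index file like the ones above can be read directly with Python's standard-library configparser; this is only an illustration of the format (the file contents are inlined here), not code from bro-pkg itself:

```python
import configparser

# One author's bro-pkg.index in the proposed layout (inlined for the example).
index_text = """
[foo]
url = https://github.com/alice/foo
keywords = Mr.T pity

[bar]
url = https://github.com/alice/bar
keywords = club pub drinks
"""

# Each INI section is a package name; its options hold the metadata.
index = configparser.ConfigParser()
index.read_string(index_text)

packages = {name: index[name]["url"] for name in index.sections()}
print(packages["foo"])  # https://github.com/alice/foo
```

Aggregating an entire source would then just be a matter of walking the `$author/bro-pkg.index` files and prefixing each section name with the author directory.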
> - Does "upgrade" show the packages affected and ask for confirmation?
>   I would suggest either doing that or require an --all option for
>   upgrading everything, as that's a potentially dangerous operation.

It doesn't ask for confirmation, but I'm in favor of requiring the
explicit --all.

> - I suppose upgrading does (better: will do) dependency checking
>   again, including making sure the Bro version matches the one that
>   update now requires?

Yes, I imagine the dependency analysis for upgrading and installing
being the same or a similar process.

> - When installing the package manager as part of Bro, could we pull in
>   the Python dependencies automatically, for example by installing
>   them into the same prefix?

Yes, I can likely get that to work.

> Both GitPython and semantic_version are
> pretty non-standard. Using them is ok I think but it would be nice
> if "bro-pkg" wouldn't abort first thing because they aren't
> installed yet.

Alternatively, I can have CMake detect whether they are installed and,
if not, skip installing bro-pkg and put a warning/explanation in the
CMake summary output. Let me know which is preferred. I'm a bit in favor
of auto-installing the Python dependencies into Bro's install prefix.

> - How about adding a note to either packages.bro or the whole
>   packages/ directory that it's automatically maintained and not
>   supposed to be manually messed with?

Ok.

> - In bro-pkg.conf, does "default" in "[sources]" have a special
>   meaning, or could it be any tag? Assuming the latter, I would just
>   call it "bro".

It's arbitrary; I will change "default" to "bro".

> - For our default package source, do we want to support non-GitHub
>   repositories? If so, a naming scheme by GitHub user won't work.

The hierarchy isn't strictly required to use GitHub usernames.
Generally it could be "$author_name/$package_name", where the most
common case is for $author to be a GitHub user name. A domain name,
company/organization name, or any string to help identify the author
would work.
> - Suggest to rename "/opt/bro/var/lib/package-manager" to
>   "../bro-package-manager" or "../bro-pkg".

Agree about changing "package-manager" to "bro-package-manager", but do
you also mean to get rid of the "lib" subdir? I think that fits within
the Filesystem Hierarchy Standard [1]. For /var/lib it says:

    "State information. Persistent data modified by programs as they
    run, e.g., databases, packaging system metadata, etc."

There are probably nuances that let you get away with other locations
when installing to prefixes other than "/", but I find it generally
works well to just replace "/" with the user's preferred install
prefix. Let me know what you think.

> - Once we support dependencies on Bro versions, would be nice if that
>   worked also with the "x.y-z" scheme that git master uses (and maybe
>   it just does anyway).

Should already work via semantic_version.

> - Python 3.x works, right? Then I'd list that explicitly.

Worked for me. Will do.

> - A quick-start guide would be helpful that just mentions the most
>   important steps, including basic installation along with Bro
>   itself (once that's merged).

Tried to do this in the Overview/README's "Installation" section. I
think reorganizing that into smaller sections with bullet points to
follow and re-labeling it as a "quick-start guide" may help.

> - The "Installation" section becomes a bit confusing toward the end
>   with all those paths. Maybe split some parts out into an advanced
>   section or so?

Yeah, will try to re-organize.

> - How-tos would be helpful that show by example how to create (1) a
>   pure script package, (2) a binary Bro plugin, and (3) a BroControl
>   plugin.

Sure, I'll add explicit step-by-step guides for each.
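As an aside on the "x.y-z" scheme discussed above: even without semantic_version, the comparison it needs is simple. The following is a hypothetical sketch (not bro-pkg code) that treats z as a commit count on top of release x.y:

```python
# Hypothetical sketch: compare git-master-style "x.y-z" version strings
# (release x.y plus z commits) as tuples. Not actual bro-pkg code.
def parse_version(v):
    release, _, commits = v.partition("-")
    parts = [int(p) for p in release.split(".")]
    major = parts[0]
    minor = parts[1] if len(parts) > 1 else 0
    return (major, minor, int(commits or 0))

assert parse_version("2.4-123") > parse_version("2.4")    # master is past 2.4
assert parse_version("2.5") > parse_version("2.4-500")    # a release beats any 2.4-z
print(parse_version("2.4-123"))  # (2, 4, 123)
```

Tuple comparison gives the right ordering for free, which is presumably also why the semver machinery handles the scheme without special-casing.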
    - Jon

[1] https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard

From jsiwek at illinois.edu  Thu Jul 28 13:11:40 2016
From: jsiwek at illinois.edu (Siwek, Jon)
Date: Thu, 28 Jul 2016 20:11:40 +0000
Subject: [Bro-Dev] New documentation via Sphinx
In-Reply-To: <20160728042223.GC30817@shogun.local>
References: <20160728042223.GC30817@shogun.local>
Message-ID:

> On Jul 27, 2016, at 11:22 PM, Matthias Vallentin wrote:
>
> http://bro.github.io/broker/
>
> It's the bootstrap theme for sphinx, as an alternative to the classic
> read-the-docs theme. I've hacked the sidebar so that it shows the table
> of contents.

I like that theme/layout.

- The prominent and fully expanded sidebar works well and is more
  helpful than having to expand sections one by one when first digging
  around.

- Having Sphinx auto-add the section numbering makes it easier to
  understand the doc structure and the relationship between sections.

I'll probably at least use those two design elements in the package
manager docs, but it's maybe worth experimenting with other Bro docs to
see if it helps there too.

> whole setup looks really simple and could be a good starting point for
> the next Bro documentation overhaul.

I'd also say it could pay off to explore whether it's better to have
each Bro component/submodule be capable of building its own
self-contained docs via similar/common Sphinx configurations. But I'm
maybe forgetting a bunch of stuff about how the docs all get glued
together when built on bro.org that would make it tricky.
    - Jon

From robin at icir.org  Thu Jul 28 13:37:50 2016
From: robin at icir.org (Robin Sommer)
Date: Thu, 28 Jul 2016 13:37:50 -0700
Subject: [Bro-Dev] package manager progress
In-Reply-To: <7770E12D-ECD4-4F10-8020-69EDCD15C68A@illinois.edu>
References: <20160727233758.GM4480@icir.org>
	<7770E12D-ECD4-4F10-8020-69EDCD15C68A@illinois.edu>
Message-ID: <20160728203750.GJ70102@icir.org>

On Thu, Jul 28, 2016 at 17:52 +0000, you wrote:

> I think still allowing package sources to be structured in a directory
> hierarchy is more intuitive to navigate and maybe less intimidating to
> modify than a single file as the source grows over time. And I'm
> already using INI format in 2 other places, so seems fine to apply
> here, too.

Yep, agree with both. That also makes merging pull requests easy /
contained.

> So a proposed structure of a package source:

Looks good to me.

> the CMake summary output. Let me know which is preferred. I'm a bit
> in favor of auto-installing the python dependencies into Bro's install
> prefix.

I also prefer auto-installation, unless there's a reasonable risk that
it could interfere with already installed versions of those packages,
not sure?

> The hierarchy isn't strictly required to use GitHub usernames.
> Generally it could be "$author_name/$package_name", where the most
> common case is for $author to be a GitHub user name. A domain name,
> company/organization name, or any string to help identify the author
> would work.

Ok, we probably need to write down somewhere our policy on what we
do/expect for the default source.

> Agree about changing "package-manager" to "bro-package-manager", but
> do you also mean to get rid of the "lib" subdir?

No, I didn't, sorry for the confusion. I was just too lazy to type the
full path again; I should have inserted 3 dots to make that clear.

> Tried to do this in the Overview/README's "Installation" section.
> I think reorganizing that into smaller sections with bullet points to
> follow and re-labeling it as a "quick-start guide" may help.

Ack.

Robin

-- 
Robin Sommer * ICSI/LBNL * robin at icir.org * www.icir.org/robin

From jsiwek at illinois.edu  Thu Jul 28 14:23:56 2016
From: jsiwek at illinois.edu (Siwek, Jon)
Date: Thu, 28 Jul 2016 21:23:56 +0000
Subject: [Bro-Dev] package manager progress
In-Reply-To: <20160728203750.GJ70102@icir.org>
References: <20160727233758.GM4480@icir.org>
	<7770E12D-ECD4-4F10-8020-69EDCD15C68A@illinois.edu>
	<20160728203750.GJ70102@icir.org>
Message-ID:

> On Jul 28, 2016, at 3:37 PM, Robin Sommer wrote:
>
>> in favor of auto-installing the python dependencies into Bro's install
>> prefix.
>
> I also prefer auto-installation, unless there's a reasonable risk that
> it could interfere with already installed versions of those packages,
> not sure?

Don't think so.

> Ok, we probably need to write down somewhere our policy on what we
> do/expect for the default source.

Expanding the README of https://github.com/bro/packages to include notes
on the submission process and naming convention/policy seems like the
place to me.

    - Jon

From asharma at lbl.gov  Fri Jul 29 14:48:59 2016
From: asharma at lbl.gov (Aashish Sharma)
Date: Fri, 29 Jul 2016 14:48:59 -0700
Subject: [Bro-Dev] testing topic/dnthayer/ticket1627
Message-ID: <20160729214857.GI83398@mac-4.local>

Hi Daniel,

Are there any specific node.cfg settings or broctl.cfg settings to run
the logging node? Could you please point me to the right locations.
Thanks,
Aashish

From jazoff at illinois.edu  Fri Jul 29 14:51:04 2016
From: jazoff at illinois.edu (Azoff, Justin S)
Date: Fri, 29 Jul 2016 21:51:04 +0000
Subject: [Bro-Dev] testing topic/dnthayer/ticket1627
In-Reply-To: <20160729214857.GI83398@mac-4.local>
References: <20160729214857.GI83398@mac-4.local>
Message-ID: <76E6EAA0-1FB8-49CC-AA33-FE545B27B4B6@illinois.edu>

You simply need to copy/paste your manager section in node.cfg and
change manager to logger, so you should end up with something like this:

    [manager]
    type=manager
    host=1.2.3.4

    [logger]
    type=logger
    host=1.2.3.4

-- 
- Justin Azoff

> On Jul 29, 2016, at 5:48 PM, Aashish Sharma wrote:
>
> Hi Daniel,
>
> Are there any specific node.cfg settings or broctl.cfg settings to run
> the logging node? Could you please point me to the right locations.
>
> Thanks,
> Aashish
>
> _______________________________________________
> bro-dev mailing list
> bro-dev at bro.org
> http://mailman.icsi.berkeley.edu/mailman/listinfo/bro-dev

From jazoff at illinois.edu  Fri Jul 29 16:40:42 2016
From: jazoff at illinois.edu (Azoff, Justin S)
Date: Fri, 29 Jul 2016 23:40:42 +0000
Subject: [Bro-Dev] Making scan.bro great again.
Message-ID:

I took a closer look at scan-NG and at the scan.bro that shipped with
1.5 to understand how the detection could be better than what we have
now. 1.5 wasn't fundamentally better, but compared to what we are doing
now it has an unfair advantage :-)

I found that it used tables like this:

    global distinct_ports: table[addr] of set[port]
        &read_expire = 15 mins &expire_func=port_summary &redef;

Not only is it using a default timeout of 15 minutes vs. 5 minutes, it
is using read_expire. This means that an attacker can send one packet
every 14 minutes, 25 times, and still be tracked, so scan.bro as shipped
with 1.5 can pick up slow scans over as much as a six-hour period.
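The arithmetic behind that claim can be checked with a toy model (plain Python, not Bro code) of the two expiration policies: read_expire refreshes an entry's timer on every access, while create_expire fixes the deadline when the entry is first inserted.

```python
# Toy model (not Bro code) of Bro's &read_expire vs. &create_expire
# table attributes, applied to one attacker's table entry.
def tracked_span(event_times, timeout, mode):
    """Seconds from first sighting until the entry expires.
    mode='read'   refreshes the timer on every access (read_expire);
    mode='create' fixes expiry at first-seen + timeout (create_expire)."""
    first = event_times[0]
    deadline = first + timeout
    for t in event_times[1:]:
        if t > deadline:
            break  # entry already expired before this probe arrived
        if mode == "read":
            deadline = t + timeout  # access refreshes the timer
    return deadline - first

# One probe every 14 minutes, 25 probes total, 15-minute timeout.
probes = [i * 14 * 60 for i in range(25)]
print(tracked_span(probes, 15 * 60, "read") / 3600)    # 5.85 (hours)
print(tracked_span(probes, 15 * 60, "create") / 3600)  # 0.25 (hours)
```

With read_expire the attacker stays tracked for nearly six hours, matching the claim above; with create_expire the same slow scan drops out of the table after the first 15 minutes.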
The sumstats-based scan.bro can only detect scans that fit in the fixed
time window (it is effectively using create_expire but, as Aashish
points out, limited even further, since the 'creation time' is a fixed
interval regardless of when the attacker is first seen).

The tracking that 1.5's scan.bro has isn't doing anything inherently
better than what we have now; it's just doing it over a much longer
period of time. The actual detection it uses has the same limitations
the current sumstats-based scan.bro has: it does not detect fully
randomized port scans. It would benefit from the same "unification"
changes.

Since fixing sumstats and adding new functionality to solve this problem
in a generic way is a huge undertaking, I tried instead to just have
scan.bro do everything itself. We may not be able to easily fix
sumstats, but I think we can easily fix scan.bro by making it not use
sumstats.

To see if this was even viable or a waste of time, I wrote the script:
it works. It sends new scan attempts to the manager and stores them in a
similar '&read_expire = 15 mins' table. This should detect everything
that the 1.5-based version did, plus all the fully random scans that
were previously missed. And with the simpler unified data structure and
capped set sizes, it will use almost zero resources.

Attached is the code I just threw on our dev cluster. It's the
implementation of "What is the absolute simplest thing that could
possibly work". It uses 1 event and 2 tables, one for the workers and
one for the manager.

What does this look like from a CPU standpoint? The attached graph
(bro-scan-cpu.png) shows a number of experiments:
* The first block around 70% is the unified sumstats-based scan.bro plus
  a hacked-up sumstats/cluster.bro to do data transfer more efficiently.

* The next block at 40% was the unified scan.bro hacked up to make the
  manager do all the sumstats (worked, but had issues).

* The small spike upwards back to 70% was a return to the unified
  scan.bro that is in git, with the threshold changed back to 25.

* The spike up to 170-200% was a return to stock sumstats/cluster.bro.
  This is what 2.5 would be with the sumstats-based scan.bro.

* The drop back down to 40% is the switch to the attached scan.bro that
  does not use sumstats at all.

The 'duration' is TODO in the notices, but otherwise everything works. I
want to just get the start time directly from the time information in
the table. I'm not sure if Bro exposes it or even stores it in a usable
way. If there's no way to get it out of the table, I just need to track
when an attacker is first seen separately, but that is easy enough to
do.

-- 
- Justin Azoff

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20160729/3d1af289/attachment-0001.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bro-scan-cpu.png
Type: image/png
Size: 55891 bytes
Desc: bro-scan-cpu.png
Url : http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20160729/3d1af289/attachment-0001.bin
-------------- next part --------------
A non-text attachment was scrubbed...
Name: scan.bro
Type: application/octet-stream
Size: 6149 bytes
Desc: scan.bro
Url : http://mailman.icsi.berkeley.edu/pipermail/bro-dev/attachments/20160729/3d1af289/attachment-0001.obj
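[Editor's note] Since the attached scan.bro is not inlined in the archive, the shape of the design Justin describes — one event, two tables (worker and manager), capped set sizes — can be modeled roughly in Python. All names, the threshold, and the cap below are illustrative, and table expiration (&read_expire) is omitted for brevity:

```python
# Rough Python model of the described design (not the attached scan.bro):
# workers suppress duplicates locally and forward only new scan attempts
# (the single event) to the manager, which keeps a capped set of distinct
# targets per attacker. Threshold and cap values are illustrative.
SCAN_THRESHOLD = 25
CAP = 100  # capped set sizes bound per-attacker memory

manager_table = {}  # manager: attacker -> set of (victim, port)
notices = []

def manager_scan_attempt(attacker, victim, port):
    targets = manager_table.setdefault(attacker, set())
    if len(targets) >= CAP:
        return  # stop growing the set once the cap is reached
    targets.add((victim, port))
    if len(targets) == SCAN_THRESHOLD:
        notices.append(f"{attacker} scanned {SCAN_THRESHOLD} distinct targets")

def worker_connection(worker_table, attacker, victim, port):
    seen = worker_table.setdefault(attacker, set())
    if (victim, port) in seen:
        return  # duplicate: no event sent to the manager
    seen.add((victim, port))
    manager_scan_attempt(attacker, victim, port)  # forward the new attempt

worker_table = {}
for i in range(30):  # one attacker probing 30 hosts on port 22
    worker_connection(worker_table, "10.0.0.1", f"192.168.1.{i}", 22)
print(notices)
```

The worker-side dedup is what keeps manager traffic low: only genuinely new (attacker, victim, port) combinations cross the wire, and the cap bounds memory even against a very wide scan.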