From seth at corelight.com Fri Aug 16 06:13:15 2019
From: seth at corelight.com (Seth Hall)
Date: Fri, 16 Aug 2019 09:13:15 -0400
Subject: [Zeek-Dev] Multiline strings
Message-ID: <7B1BDCAF-E494-463A-AF8A-56296BB98BF7@corelight.com>

Just wanted to point out that I was surprised this morning when I
recalled for the first time in about 10 years that the Zeek parser can't
handle multiline strings...

event zeek_init()
	{
	print "Hello,
	World!";
	}

That code doesn't work. :)

  .Seth

--
Seth Hall * Corelight, Inc * www.corelight.com

From vlad at es.net Mon Aug 26 06:22:46 2019
From: vlad at es.net (Vlad Grigorescu)
Date: Mon, 26 Aug 2019 08:22:46 -0500
Subject: [Zeek-Dev] Sending Packets via Broker
Message-ID:

Master has code for setting up the cluster framework with time machine
nodes, and is_external_connection is a BIF that determines if a
connection has been received from an external source, but in Broker, I
don't see how I would send a packet into the Zeek packet processing
system.

Does such functionality exist? Or was it planned to be added later but
still needs to be implemented?

Thanks,

  --Vlad

From jsiwek at corelight.com Mon Aug 26 09:46:50 2019
From: jsiwek at corelight.com (Jon Siwek)
Date: Mon, 26 Aug 2019 09:46:50 -0700
Subject: [Zeek-Dev] Sending Packets via Broker
In-Reply-To:
References:
Message-ID:

On Mon, Aug 26, 2019 at 6:29 AM Vlad Grigorescu wrote:
>
> Master has code for setting up the cluster framework with time machine
> nodes, and is_external_connection is a BIF that determines if a
> connection has been received from an external source, but in Broker, I
> don't see how I would send a packet into the Zeek packet processing
> system.
>
> Does such functionality exist? Or was it planned to be added later but
> still needs to be implemented?
Sending packets doesn't exist anymore with Broker. Shouldn't be any
trickier to re-implement than before, but just don't think it's on any
near-term roadmap.

- Jon

From kap4020 at gmail.com Fri Aug 30 11:10:10 2019
From: kap4020 at gmail.com (Karl Pietrzak)
Date: Fri, 30 Aug 2019 14:10:10 -0400
Subject: [Zeek-Dev] changing format of uid to ULID?
Message-ID:

Good morning everyone.

I'm researching compression of Zeek data. I'm currently dumping Zeek
data into Parquet files, and one of the most challenging fields to
compress is uid because of its high entropy.

I'm wondering if there's any interest in changing the format of the uid
to something like ULID, of which there is a C++ implementation already.

A ULID-based uid implementation would:

   - allow uids to be sorted, which isn't helpful in-and-of-itself, but
     is very helpful for compression
   - still be URL-safe
   - always be 26 characters, for simpler storage
   - be case-insensitive

Looking through the code (UID.h and UID.cc) and its usages, it doesn't
look technically difficult, but I'm sure I'm missing some reasons. For
example, I noticed that prefixes such as the letter 'C' are used to
denote kinds of connections. Perhaps that data can be extracted to
another field instead?

Anyways, looking for thoughts, comments, suggestions, and anything else.

Thank you!

-- 
Karl

From justin at corelight.com Fri Aug 30 11:21:09 2019
From: justin at corelight.com (Justin Azoff)
Date: Fri, 30 Aug 2019 14:21:09 -0400
Subject: [Zeek-Dev] changing format of uid to ULID?
In-Reply-To:
References:
Message-ID:

On Fri, Aug 30, 2019 at 2:17 PM Karl Pietrzak wrote:

> Good morning everyone.
>
> I'm researching compression of Zeek data. I'm currently dumping Zeek data
> into Parquet files
>

I don't have much feedback on the uid bits, but I'm very interested in
Parquet!
I had looked into doing this a while back but the tooling around Parquet
was very Java/big data focussed and not very CLI friendly. Are you using
the new C++ implementation in a log writer or are you converting JSON to
Parquet?

-- 
Justin

From kap4020 at gmail.com Fri Aug 30 11:47:54 2019
From: kap4020 at gmail.com (Karl Pietrzak)
Date: Fri, 30 Aug 2019 14:47:54 -0400
Subject: [Zeek-Dev] changing format of uid to ULID?
In-Reply-To:
References:
Message-ID:

I'd say the tooling is still Java-focused, but I found some decent CLI
tooling at https://github.com/apache/parquet-mr/tree/master/parquet-tools

Specifically, I used the convert command to go from JSON -> Parquet.
Going from JSON.gz to Parquet (gzip compression codec) saved us about 35%.

When you say "log writer", do you mean a custom Zeek writer that writes
to Parquet directly?

The major issue we're facing is that the schema for Zeek output can
change over time (more columns can be added). That's an issue for
Parquet.

On Fri, Aug 30, 2019 at 2:21 PM Justin Azoff wrote:

> On Fri, Aug 30, 2019 at 2:17 PM Karl Pietrzak wrote:
>
>> Good morning everyone.
>>
>> I'm researching compression of Zeek data. I'm currently dumping Zeek
>> data into Parquet files
>>
>
> I don't have much feedback on the uid bits, but I'm very interested in
> Parquet! I had looked into doing this a while back but the tooling around
> Parquet was very Java/big data focussed and not very CLI friendly. Are you
> using the new C++ implementation in a log writer or are you converting
> JSON to Parquet?
>
> --
> Justin
>

-- 
Karl
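
For reference, the properties listed for a ULID-based uid earlier in this
thread (sortable, URL-safe, always 26 characters) can be illustrated with
a minimal Python sketch. It assumes the layout from the public ULID spec:
a 48-bit millisecond timestamp followed by 80 random bits, encoded in
Crockford base32. The names encode_ulid and new_ulid are hypothetical,
and this is not Zeek's UID code nor the C++ implementation mentioned
above.

```python
import os
import time

# Crockford base32 alphabet (no I, L, O, U). The characters are in ASCII
# order, so string comparison matches numeric comparison.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def encode_ulid(timestamp_ms: int, randomness: bytes) -> str:
    """Encode a 48-bit millisecond timestamp and 80 random bits as 26 chars."""
    if not 0 <= timestamp_ms < 2**48:
        raise ValueError("timestamp must fit in 48 bits")
    if len(randomness) != 10:
        raise ValueError("randomness must be exactly 10 bytes (80 bits)")
    # Timestamp occupies the high bits so that later ids sort after earlier ones.
    value = (timestamp_ms << 80) | int.from_bytes(randomness, "big")
    chars = []
    # 26 characters * 5 bits = 130 bits; the top 2 bits are always zero.
    for _ in range(26):
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def new_ulid() -> str:
    """Hypothetical helper: a ULID for the current wall-clock time."""
    return encode_ulid(time.time_ns() // 1_000_000, os.urandom(10))
```

Two ids generated within the same millisecond are ordered only by their
random bits in this sketch, and the 'C'-style kind prefix discussed above
is not represented; both would need a decision in a real implementation.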