[Zeek-Dev] changing format of uid to ULID?

Karl Pietrzak kap4020 at gmail.com
Fri Aug 30 11:47:54 PDT 2019


I'd say the tooling is still Java-focused, but I found some decent CLI
tooling at https://github.com/apache/parquet-mr/tree/master/parquet-tools

Specifically, I used the convert command
<https://github.com/apache/parquet-mr/blob/master/parquet-cli/src/main/java/org/apache/parquet/cli/commands/ConvertCommand.java>
to go from JSON to Parquet.  Going from JSON.gz to Parquet (with the gzip
compression codec) saved us about 35%.

When you say "log writer", do you mean a custom Zeek writer
<https://docs.zeek.org/en/stable/frameworks/logging.html> that writes to
Parquet directly?

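The major issue we're facing is that the schema for Zeek output can change
over time (more columns can be added).  That's an issue for Parquet, because
each Parquet file is written with a fixed schema, so files produced at
different times may not agree on their columns.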
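
A minimal Python sketch of one possible workaround, assuming the logs are
JSON lines: take the union of all columns seen in a batch and fill the
missing ones with nulls, so every row has the same shape before it is
written to Parquet.  The sample records and the helper names
(union_columns, normalize) are hypothetical, and the sketch stops short of
the Parquet write itself.

```python
import json


def union_columns(records):
    """Return every field name seen across the records, in first-seen order."""
    columns = []
    seen = set()
    for rec in records:
        for name in rec:
            if name not in seen:
                seen.add(name)
                columns.append(name)
    return columns


def normalize(records, columns):
    """Give every record the same columns, filling missing ones with None."""
    return [{name: rec.get(name) for name in columns} for rec in records]


# Hypothetical Zeek-style JSON lines; the second record has an extra column.
lines = [
    '{"ts": 1567190874.0, "uid": "CHhAvVGS1DHFjwGM9", "id.orig_h": "10.0.0.1"}',
    '{"ts": 1567190875.0, "uid": "C4J4Th3PJpwUYZZ6gc", "id.orig_h": "10.0.0.2",'
    ' "community_id": "1:abc"}',
]

records = [json.loads(line) for line in lines]
columns = union_columns(records)
rows = normalize(records, columns)
```

This only handles added columns within one batch.  A column whose type
changes, or files already written with the older schema, would still need
to be reconciled at read time.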

On Fri, Aug 30, 2019 at 2:21 PM Justin Azoff <justin at corelight.com> wrote:

> On Fri, Aug 30, 2019 at 2:17 PM Karl Pietrzak <kap4020 at gmail.com> wrote:
>
>> Good morning everyone.
>>
>> I'm researching compression of Zeek data.  I'm currently dumping Zeek
>> data into Parquet files
>>
>
> I don't have much feedback on the uid bits, but I'm very interested in
> Parquet!  I had looked into doing this a while back but the tooling around
> Parquet was very Java/big data focussed and not very CLI friendly.  Are you
> using the new C++ implementation in a log writer or are you converting
> JSON to Parquet?
>
> --
> Justin
>


-- 
Karl