[Zeek-Dev] changing format of uid to ULID?

Karl Pietrzak kap4020 at gmail.com
Fri Aug 30 11:10:10 PDT 2019


Good morning everyone.

I'm researching compression of Zeek data.  I'm currently dumping Zeek data
into Parquet files, and one of the most challenging fields to compress is
uid because of its high entropy.

I'm wondering if there's any interest in changing the format of the uid to
something like ULID <https://github.com/ulid/spec>, for which a C++
implementation <https://github.com/suyash/ulid> already exists.

A ULID-based uid implementation would:

   - allow uids to be sorted lexicographically by creation time, which
   isn't helpful in and of itself, but is very helpful for compression
   - still be URL-safe
   - always be 26 characters long, for simpler storage
   - be case-insensitive
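
To make the properties above concrete, here is a minimal sketch of the ULID
layout as described in the spec: a 48-bit millisecond timestamp followed by 80
random bits, encoded as 26 Crockford Base32 characters. This is an
illustration only, not Zeek code and not the linked C++ library; the function
name make_ulid and its parameters are hypothetical.

```python
import os
import time

# Crockford Base32 alphabet used by the ULID spec (omits I, L, O, U).
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def make_ulid(timestamp_ms=None, randomness=None):
    """Hypothetical helper: build a 26-character ULID string.

    timestamp_ms: milliseconds since the Unix epoch (must fit in 48 bits).
    randomness:   exactly 10 bytes (80 bits) of random data.
    """
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    if randomness is None:
        randomness = os.urandom(10)
    if not 0 <= timestamp_ms < (1 << 48):
        raise ValueError("timestamp must fit in 48 bits")
    if len(randomness) != 10:
        raise ValueError("randomness must be exactly 10 bytes")

    # 128-bit value: timestamp in the high 48 bits, randomness in the low 80.
    value = (timestamp_ms << 80) | int.from_bytes(randomness, "big")

    # Emit 26 five-bit groups, least significant first, then reverse.
    chars = []
    for _ in range(26):
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


if __name__ == "__main__":
    print(make_ulid())
```

Because the timestamp fills the first 10 characters, uids created in the same
millisecond share a 10-character prefix, and uids written in time order are
already in sorted order. That shared prefix is what a columnar format could
exploit; how much it helps on real Zeek data would need to be measured.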


Looking through the code (UID.h
<https://github.com/bro/bro/blob/master/src/UID.h> and UID.cc
<https://github.com/bro/bro/blob/master/src/UID.cc>) and its usages, the
change doesn't look technically difficult, but I'm sure I'm missing some
reasons not to make it.
For example, I noticed that prefixes such as the letter 'C' are used to
denote kinds of connections.  Perhaps that data can be extracted to another
field instead?

Anyway, I'm looking for thoughts, comments, suggestions, and anything else.
Thank you!

-- 
Karl