[Bro-Dev] [JIRA] (BIT-1161) topic/jsiwek/faster-val-clone

Jon Siwek (JIRA) jira at bro-tracker.atlassian.net
Tue Mar 18 14:53:44 PDT 2014

Jon Siwek created BIT-1161:

             Summary: topic/jsiwek/faster-val-clone
                 Key: BIT-1161
                 URL: https://bro-tracker.atlassian.net/browse/BIT-1161
             Project: Bro Issue Tracker
          Issue Type: Improvement
          Components: Bro
    Affects Versions: git/master
            Reporter: Jon Siwek
             Fix For: 2.3

This branch makes it less expensive to serialize large/complex values (e.g. connection and/or fa_file records).

The obvious overhead that could be reduced came from the fixed growth increment of the buffer that holds the serialized data.  With records that expand to ~1.6M (master) or ~3M (topic/bernhard/file-analysis-x509) in serialized form, reaching that size in growth increments of 64K takes too many allocations.  It may also help somewhat to use realloc instead of new/memcpy/delete each time the buffer needs to grow.
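
To make the allocation count concrete, here is a minimal sketch of a buffer that doubles its capacity via realloc, next to a helper that counts how many growths a fixed-step scheme would need.  GrowBuffer and FixedStepGrowths are hypothetical names for illustration, not the classes used in Bro or in this branch, and doubling is only one possible growth policy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Hypothetical stand-in for a serialization buffer (not Bro's actual class).
class GrowBuffer {
public:
    explicit GrowBuffer(std::size_t initial = 64 * 1024)
        : data_(nullptr), size_(0), cap_(initial), grow_count_(0) {
        assert(initial > 0);  // doubling from zero would never make progress
        data_ = static_cast<char*>(std::malloc(initial));
        if (!data_) throw std::bad_alloc();
    }
    ~GrowBuffer() { std::free(data_); }
    GrowBuffer(const GrowBuffer&) = delete;
    GrowBuffer& operator=(const GrowBuffer&) = delete;

    void Append(const void* src, std::size_t len) {
        if (len > cap_ - size_) Grow(size_ + len);
        std::memcpy(data_ + size_, src, len);
        size_ += len;
    }

    std::size_t Size() const { return size_; }
    std::size_t Capacity() const { return cap_; }
    std::size_t GrowCount() const { return grow_count_; }

private:
    // Double the capacity until it fits; realloc may extend the block in
    // place instead of always allocating, copying, and freeing.
    void Grow(std::size_t needed) {
        std::size_t new_cap = cap_;
        while (new_cap < needed) new_cap *= 2;
        char* p = static_cast<char*>(std::realloc(data_, new_cap));
        if (!p) throw std::bad_alloc();
        data_ = p;
        cap_ = new_cap;
        ++grow_count_;
    }

    char* data_;
    std::size_t size_;
    std::size_t cap_;
    std::size_t grow_count_;
};

// Number of growths needed to reach `target` bytes when the buffer starts
// at `step` bytes and grows by `step` bytes each time.
std::size_t FixedStepGrowths(std::size_t target, std::size_t step) {
    assert(step > 0);
    std::size_t cap = step;
    std::size_t n = 0;
    while (cap < target) {
        cap += step;
        ++n;
    }
    return n;
}
```

Taking ~1.6M as 1600 KiB purely for illustration, FixedStepGrowths(1600 * 1024, 64 * 1024) gives 24 growths after the initial allocation, while a GrowBuffer starting at 64K needs 5 to hold the same amount.  With new/memcpy/delete, every one of the fixed-step growths also copies the entire buffer accumulated so far.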

I didn't find that increasing the initial buffer size from 64K helped much (and 90% of the things needing serialization fit in a buffer of that size anyway).

It could possibly help to preallocate a buffer that gets re-used across serializations instead of repeatedly allocating small buffers that will need to be resized.
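
A rough sketch of that reuse idea, assuming a single long-lived scratch buffer that is reset rather than freed between serializations.  ScratchBuffer and SerializeInto are hypothetical names, and the "serialization" here is just a concatenation of strings standing in for the real thing.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical scratch buffer reused across serializations (illustration only).
class ScratchBuffer {
public:
    explicit ScratchBuffer(std::size_t reserve_bytes) { bytes_.reserve(reserve_bytes); }

    // Start a new serialization: drop the old contents, keep the allocation.
    void Reset() { bytes_.clear(); }

    void Append(const char* src, std::size_t len) {
        bytes_.insert(bytes_.end(), src, src + len);
    }

    std::size_t Size() const { return bytes_.size(); }
    std::size_t Capacity() const { return bytes_.capacity(); }
    const char* Data() const { return bytes_.data(); }

private:
    std::vector<char> bytes_;
};

// Stand-in for serializing one value: concatenates the fields into `buf`
// and returns the number of bytes written.
std::size_t SerializeInto(ScratchBuffer& buf, const std::vector<std::string>& fields) {
    buf.Reset();
    for (const auto& f : fields) buf.Append(f.data(), f.size());
    return buf.Size();
}
```

Once one large value has forced the buffer to grow, later serializations of similar size run without any further allocation.  The trade-off is that the buffer stays at its high-water mark for as long as it lives.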

I don't have a complete breakdown of the bytes that make up the serialized form of the large/complex records, but from a quick look, the filenames in the Location information of each BroObj/Val make up a third of the ~1.6M (master).  Since that's the full path of each file, the size depends on where the Bro scripts reside on the file system (i.e. put them as close to the root dir as possible and you might increase performance!).

Any other quick ideas of what can be done here?  If not, improving the serialization seems to deserve its own project (which also might be part of the new comm. library project) for later.

In the meantime, it has at least been shown that avoiding situations where large/complex records are serialized can help (BIT-1139).  That might always be a useful optimization strategy if the serialized representation of Vals is going to scale not just as a function of their value, but also with their type/attribute/location information.
