Need help understanding serializer

Robin Sommer sommer at in.tum.de
Mon Apr 26 06:46:38 PDT 2004


On Sun, Apr 25, 2004 at 19:22 -0700, Christian Kreibich wrote:

> I think I've basically understood the concepts of SerialObj, Serializer
> and SerializationFormat -- I think I would have structured things the
> same way. I'm getting lost in the details though:

First of all, there are still a few loose ends which need some
clean-up. I just haven't found time to do that yet. :-(

Most importantly, some classes are serialized slightly differently
than others, because there are two types of interfaces, an old one
and a new one (the new one correctly handles shallow-copied objects,
which the old one doesn't). For example, Val uses the old
interface while Conn uses the new one. You can differentiate the two
by looking for a DECLARE_SERIAL in the class definition; if it's
there, it's the new interface. Where the semantics differ, the
explanations below refer to the new one. I am going to adapt the
other classes asap (and I intend to write a short doc about this
stuff).

> - When exactly and why does a class have to implement Unserialize() and
> Serialize()? What's their relationship to DoSerialize() and
> DoUnserialize()? The comments in SerialObj.h are a bit vague in that
> regard.

Serialize()/Unserialize() are (non-virtual) methods which are to be
called when you want to actually serialize/unserialize an object,
i.e. that's the "user interface". They are only defined inside the
base class of a hierarchy[1] (e.g. Conn has a Serialize(), but
TCP_Connection doesn't).

Serialize()/Unserialize(), in turn, call the (virtual)
DoSerialize()/DoUnserialize(), which need to be implemented in every
class derived from such a base class. DoSerialize()/DoUnserialize()
are supposed to call their parent's implementation first, and then
write/read their own attributes.

[1] I did not use BroObj as the base but the classes on the next
layer. So, for this discussion Val, Stmt, Expr, Conn, etc. all start
their own hierarchy.
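
To illustrate the pattern, here is a minimal sketch. It is not the
actual Bro code: ToyFormat, ToyConn and ToyTCPConn are hypothetical
names, and the real signatures in SerialObj.h differ. It only shows
a non-virtual Serialize() in the base class calling a virtual
DoSerialize() that each derived class extends, parent first.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a serialization format: it just records fields.
struct ToyFormat {
    std::vector<std::string> fields;
    void Write(const std::string& name, int value) {
        fields.push_back(name + "=" + std::to_string(value));
    }
};

// Base class of a hierarchy: the only place where Serialize() is defined.
class ToyConn {
public:
    explicit ToyConn(int id) : id(id) {}
    virtual ~ToyConn() = default;

    // Non-virtual entry point: this is what callers use.
    bool Serialize(ToyFormat* fmt) const {
        assert(fmt != nullptr);
        return DoSerialize(fmt);
    }

protected:
    // Virtual hook, overridden by every derived class.
    virtual bool DoSerialize(ToyFormat* fmt) const {
        fmt->Write("id", id);
        return true;
    }

private:
    int id;
};

class ToyTCPConn : public ToyConn {
public:
    ToyTCPConn(int id, int state) : ToyConn(id), state(state) {}

protected:
    bool DoSerialize(ToyFormat* fmt) const override {
        // Parent first, then our own attributes.
        if (!ToyConn::DoSerialize(fmt))
            return false;
        fmt->Write("state", state);
        return true;
    }

private:
    int state;
};

// Serializes a derived object through a base-class pointer.
std::vector<std::string> SerializeExample() {
    ToyTCPConn tcp(7, 3);
    const ToyConn* base = &tcp;
    ToyFormat fmt;
    base->Serialize(&fmt);
    return fmt.fields;
}
```

Calling SerializeExample() writes the base-class field before the
derived-class field, even though the call goes through a ToyConn
pointer.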
 
>       *  in the SER_xxx constants and the factory approach in
>         IMPLEMENT_SERIAL
>       *  in the character constants 'i', 'e', 's' etc in Serializer.cc
>       *  in the MSG_xxx constants in RemoteSerializer.cc.
> 
> I think the latter are partially internal to the remote<->local
> communication and can hence mostly be ignored for understanding the
> serialization code, right?

Right.

> If you could quickly explain the difference
> between the first two that'd be great.

The types of objects that a Serializer handles are different from
the classes containing Serialize/Unserialize methods; e.g.
Serializer serializes function calls for which there is no
corresponding class. The characters indicate which kind of
"top-level" serialization follows, while the SER_* constants specify
a concrete class.
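
A rough sketch of this two-level tagging, under loud assumptions:
the tag characters 'o' and 'f', the SER_TOY_* values and the
ToyChunk layout are all made up for illustration and do not match
what Serializer.cc actually writes.

```cpp
#include <cstdint>
#include <string>

// Hypothetical class identifiers, standing in for the SER_* constants.
enum ToyClassId : std::uint8_t { SER_TOY_VAL = 1, SER_TOY_CONN = 2 };

// One top-level item: a kind tag, then kind-specific content.
struct ToyChunk {
    char kind;              // which kind of top-level serialization follows
    std::uint8_t class_id;  // only meaningful when kind == 'o'
    std::string payload;
};

// Describes how a receiver could dispatch a chunk in two steps.
std::string Describe(const ToyChunk& c) {
    switch (c.kind) {
    case 'o':  // assumed tag: a serialized object, so a class id follows
        switch (c.class_id) {
        case SER_TOY_VAL:  return "object:Val(" + c.payload + ")";
        case SER_TOY_CONN: return "object:Conn(" + c.payload + ")";
        default:           return "object:unknown";
        }
    case 'f':  // assumed tag: a function call, which has no class of its own
        return "call:" + c.payload;
    default:
        return "unknown";
    }
}
```

The outer switch belongs to the serializer, the inner one to the
class factory; a function call never reaches the inner level.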

Perhaps this could be avoided somehow, but it would introduce more
dependencies between the Serializer and Serialize/Unserialize
methods. And the additional overhead is only small.

> Are these special in a way to have them implemented this way? Couldn't
> there be a "received" callback per SER_xxx constant that resides as
> a static method in the serializable classes themselves? So we can avoid
> hardcoding anything?

Putting them into the serializable classes themselves doesn't work
as it depends on the serializer what we need to do (e.g. the
RemoteSerializer treats a received ID differently than the
PersistenceSerializer).

An alternative would be to have just one Serialize(), one
Unserialize() and one Got() method, each of which would handle all
cases. But I don't think that would be much nicer: first, each
serializer would use some switch-construct anyway, and second, we
would lose the static type checking.
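
As a sketch of why the callbacks live in the serializer: the classes
below are hypothetical, and what each one does with an ID is
invented purely to show that the same received object can be handled
differently, with the callback argument still statically typed.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an unserialized identifier.
struct ToyID { std::string name; };

// Base class: declares one typed callback per kind of received item.
class ToySerializer {
public:
    virtual ~ToySerializer() = default;

    // Called by the (omitted) unserialization code once an ID is complete.
    void Deliver(const ToyID& id) { GotID(id); }

    std::vector<std::string> actions;  // records what each serializer did

protected:
    virtual void GotID(const ToyID& id) = 0;
};

// Invented behavior: an ID from a remote peer is only announced.
class ToyRemoteSerializer : public ToySerializer {
protected:
    void GotID(const ToyID& id) override {
        actions.push_back("remote: announce " + id.name);
    }
};

// Invented behavior: a persistent ID is restored into local state.
class ToyPersistenceSerializer : public ToySerializer {
protected:
    void GotID(const ToyID& id) override {
        actions.push_back("persistence: restore " + id.name);
    }
};

// Feeds the same ID to both serializers and returns what each one did.
std::pair<std::string, std::string> DeliverToBoth() {
    const ToyID id{"my_var"};
    ToyRemoteSerializer remote;
    ToyPersistenceSerializer persistence;
    remote.Deliver(id);
    persistence.Deliver(id);
    return {remote.actions.at(0), persistence.actions.at(0)};
}
```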

> - Following the comments in SerialObj.h, I see what I need to do to make
> a class's objects serializable. I presume that the correct way to ship
> an object to a serializer is by calling SerialObj::Serialize() with the
> appropriate serializer.

Correct.

> What are my options for picking them up at the
> receiving end?

You need a Serializer-derived class at the receiving end that
implements the Serializer::Got*() methods, and calls
Serializer::Unserialize() to actually do the work.
Serializer::Unserialize() gets its data from its member "io", which
is an instance of ChunkedIO and has to be initialized beforehand
(e.g. from a fd by using a ChunkedIOFd).

Try taking a look at the implementation of FileSerializer to see an
example of how this works when reading data back from a file.
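
The overall shape of such a receiver might look like the sketch
below. Everything in it is a simplified assumption: ToyChunkedIO is
an in-memory stand-in for ChunkedIO, the tag characters and their
meaning are guessed, and the real Got*() methods take different
arguments.

```cpp
#include <cassert>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for ChunkedIO: hands out one chunk per Read().
class ToyChunkedIO {
public:
    explicit ToyChunkedIO(std::deque<std::string> chunks)
        : chunks(std::move(chunks)) {}

    bool Read(std::string* out) {
        if (chunks.empty())
            return false;
        *out = std::move(chunks.front());
        chunks.pop_front();
        return true;
    }

private:
    std::deque<std::string> chunks;
};

class ToySerializer {
public:
    virtual ~ToySerializer() = default;

    // Reads one chunk from io and hands it to the matching Got*() callback.
    // Returns false at end of input or on an unknown tag.
    bool Unserialize() {
        assert(io != nullptr);  // io has to be set up before the first call
        std::string chunk;
        if (!io->Read(&chunk) || chunk.empty())
            return false;
        const std::string body = chunk.substr(1);
        switch (chunk[0]) {  // tag characters are assumed
        case 'i': GotID(body); return true;
        case 'e': GotEvent(body); return true;
        default:  return false;
        }
    }

protected:
    virtual void GotID(const std::string& name) = 0;
    virtual void GotEvent(const std::string& name) = 0;
    ToyChunkedIO* io = nullptr;
};

// Receiving end: initializes io and implements the callbacks.
class ToyReader : public ToySerializer {
public:
    explicit ToyReader(ToyChunkedIO* source) { io = source; }
    std::vector<std::string> seen;

protected:
    void GotID(const std::string& name) override { seen.push_back("id " + name); }
    void GotEvent(const std::string& name) override { seen.push_back("event " + name); }
};

// Drains a small in-memory source and returns what the callbacks saw.
std::vector<std::string> ReadAll() {
    ToyChunkedIO io({"iglobal_var", "enew_connection"});
    ToyReader reader(&io);
    while (reader.Unserialize()) {
    }
    return reader.seen;
}
```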

> Oh and RemoteSerializer::ProcessSerialization() calls Unserialize()
> passing a SerialInfo, but Serializer::Unserialize() expects a bool -- is
> that intended?

Your eyes are quite good. :-)

No, that's indeed wrong (and actually I'm quite surprised that the
compiler implicitly converts the pointer to a bool here; but I guess
it does this everywhere and not just inside an "if (...)" :-).

Unserialize(false) should be better.
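
The conversion itself is standard C++ and easy to reproduce in
isolation. In this small sketch, ToySerialInfo and TakesBool() are
hypothetical names, not the Bro declarations:

```cpp
// Hypothetical stand-in for the struct that was passed by mistake.
struct ToySerialInfo {
    int dummy = 0;
};

// Hypothetical function with a bool parameter.
bool TakesBool(bool flag) { return flag; }

// Passes a pointer where a bool is expected; this compiles without error.
bool PassPointer() {
    ToySerialInfo info;
    ToySerialInfo* p = &info;
    // Any non-null pointer converts implicitly to true.
    return TakesBool(p);
}
```

So the callee silently sees "true" for any valid pointer, whatever
the caller meant to pass.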

> The reason why I'm looking at this is that I'm trying to find the right
> knobs to tweak to allow arbitrary *local* client applications to feed
> information into Bro (like a tuned sshd that can feed its events and
> traffic to a local Bro) without reinventing the wheel ...

If you'd like to pass in data from applications other than Bro, it
could make sense to think about a more well-defined data format. The
serializations are a representation of internal Bro structures,
which could be quite hard to generate externally.

Robin

-- 
Robin Sommer * Room        01.08.055 * www.net.in.tum.de
TU Munich    * Phone (089) 289-18006 *  sommer at in.tum.de 