Need help understanding serializer

Christian Kreibich christian at whoop.org
Mon Apr 26 12:27:13 PDT 2004


On Mon, 2004-04-26 at 06:46, Robin Sommer wrote:
>
> First of all, there are still a few loose ends which need some
> clean-up. I just haven't found time to do that yet. :-(

Okay sure -- this is neat stuff, and if I'm a pita then just tell me to
bugger off and I'll come back once you're happy with it :)

> Most importantly, some classes are serialized slightly differently
> than others, because there are two types of interfaces, an old one
> and a new one (the new one correctly handles shallow-copied objects,
> which the old doesn't). For example, Val uses the old
> interface while Conn uses the new one. You can differentiate the two
> by looking for a DECLARE_SERIAL in the class definition; if it's
> there it's the new interface. If the semantics are different, the
> explanations below refer to the new one. I am going to adapt the
> other classes asap

I see, thanks!

> (and I intend to write a short doc about this
> stuff).

That'd be cool. Actually I think a Hacker's Guide to Bro would be really
useful. The thing is really quite big now ...

> Unserialize()/Serialize() are (non-virtual) methods which are to be
> called when you want to actually serialize/unserialize an object,
> i.e. that's the "user-interface". They are only defined inside the
> base class of a hierarchy[1] (e.g. Conn has a Serialize(), but
> TCP_Connection doesn't).

Okay. I'm still a bit confused because Connections declare

  bool Serialize(Serializer* s) const;
  static Connection* Unserialize(Serializer* ser);

but SerialObjs also have those, with different signatures:

  bool Serialize(Serializer* s, SerialInfo* i, bool cache = true) const;
  static SerialObj* Unserialize(Serializer* s, SerialType type,
                                bool cache = true);

Is that related to the different APIs?

> Unserialize()/Serialize(), in turn, call the (virtual)
> DoSerialize()/DoUnserialize() which need to be implemented in every
> class derived from such a base class. DoSerialize()/DoUnserialize()
> are supposed to call their parent's implementation first, and then
> read/write their own attributes.

Yeah I saw that -- nice!
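
Just to check my understanding, here is how I picture the pattern. This is
only a minimal sketch with made-up names (ToySerializer, Base, Derived and
the string "attributes" are all hypothetical), not the actual Bro classes:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a serializer: it only records what is written.
struct ToySerializer {
    std::vector<std::string> out;
    bool Write(const std::string& field) { out.push_back(field); return true; }
};

class Base {
public:
    virtual ~Base() = default;
    // Non-virtual entry point, defined only at the root of the hierarchy.
    bool Serialize(ToySerializer* s) const { return DoSerialize(s); }

protected:
    // Virtual hook that every class in the hierarchy implements.
    virtual bool DoSerialize(ToySerializer* s) const {
        return s->Write("base_attr");
    }
};

class Derived : public Base {
protected:
    bool DoSerialize(ToySerializer* s) const override {
        // Parent's implementation first, then our own attributes.
        return Base::DoSerialize(s) && s->Write("derived_attr");
    }
};

// Serializes a Derived through a Base reference and returns what was written.
std::vector<std::string> SerializeDerivedDemo() {
    ToySerializer s;
    Derived d;
    const Base& b = d;
    b.Serialize(&s);
    return s.out;
}
```

The caller only ever touches Base::Serialize(); the virtual call makes sure
the derived attributes are written after the base ones.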

> The types of objects that a Serializer handles are different from
> the classes containing Serialize/Unserialize methods; e.g.
> Serializer serializes function calls for which there is no
> corresponding class.

<enlightenment> Aaaaaaah! </enlightenment>

> The characters indicate which kind of
> "top-level" serialization follows, while the SER_* constants specify
> a concrete class.
> 
> Perhaps this could be avoided somehow, but it would introduce more
> dependencies between the Serializer and Serialize/Unserialize
> methods. And the additional overhead is only small.

Okay sure. I just want to understand it! :)

> > Are these special in a way to have them implemented this way? Couldn't
> > there be a "received" callback per SER_xxx constant that resides as
> > a static method in the serializable classes themselves? So we can avoid
> > hardcoding anything?
> 
> Putting them into the serializable classes themselves doesn't work
> as it depends on the serializer what we need to do (e.g. the
> RemoteSerializer treats a received ID differently than the
> PersistenceSerializer). 

Oh, I see. Thanks.

> It could be an alternative to use only one of the
> Serialize()/Unserialize()/Got() methods which would handle all
> cases. But I don't think that would be much nicer: first, each
> serializer would use some switch-construct anyway, and second, we
> would lose the static type checking.

True. And if I understand you correctly, I normally won't have to deal
with a serializer's internals anyway because I only need the
Serialize()/Unserialize() stuff at some point in the hierarchies you're
mentioning above. I think I'm starting to get it.
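
To make sure I have the picture right, here is a rough sketch of how I
imagine the dispatch. Everything in it is an assumption for illustration:
the tag characters 'i' and 'f', the callback names GotID() and
GotFunctionCall(), and the Toy* classes are made up and not taken from the
Bro sources.

```cpp
#include <cassert>
#include <string>

// Hypothetical base: reads a top-level tag and routes to a typed callback.
class ToySerializer {
public:
    virtual ~ToySerializer() = default;

    // The switch lives in the serializer; the callbacks stay statically typed.
    bool Unserialize(char tag, const std::string& payload) {
        switch (tag) {
        case 'i': GotID(payload); return true;
        case 'f': GotFunctionCall(payload); return true;
        default:  return false;  // unknown top-level tag
        }
    }

protected:
    virtual void GotID(const std::string& name) = 0;
    virtual void GotFunctionCall(const std::string& name) = 0;
};

// Two serializers that react differently to the same received item.
class ToyRemoteSerializer : public ToySerializer {
public:
    std::string log;
protected:
    void GotID(const std::string& name) override { log += "remote:id:" + name + ";"; }
    void GotFunctionCall(const std::string& name) override { log += "remote:call:" + name + ";"; }
};

class ToyPersistenceSerializer : public ToySerializer {
public:
    std::string log;
protected:
    void GotID(const std::string& name) override { log += "persist:id:" + name + ";"; }
    void GotFunctionCall(const std::string& name) override { log += "persist:call:" + name + ";"; }
};

// Feeds the same kind of item to both serializers and collects their logs.
std::string DispatchDemo() {
    ToyRemoteSerializer remote;
    ToyPersistenceSerializer persist;
    remote.Unserialize('i', "x");
    remote.Unserialize('f', "foo");
    persist.Unserialize('i', "x");
    return remote.log + persist.log;
}
```

This is why the handling cannot sit in the serialized class itself: the
same received ID ends up in a different callback body depending on which
serializer picked it up.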

> > What are my options for picking them up at the
> > receiving end?
> 
> You need a Serializer-derived class at the receiving end that
> implements the Serializer::Got*() methods, and calls
> Serializer::Unserialize() to actually do the work.

Mhmm that still confuses me. You're saying above that the types of
objects that Serializers handle are different from the
Serialize()/Unserialize() hierarchies. And the Got* methods aren't for
arbitrary serializable objects but just for the specific types that
their names indicate, right?

So say I have a class Foo that implements DoSerialize() and
DoUnserialize() following the comments in SerialObj.h, and higher up
Foo's hierarchy is Bar that implements Unserialize() as you're
describing above. Now I ship a Foo instance using Bar::Serialize(s,
...). How do I get from the Serializer at the receiving end noticing
that something arrives, to the Bar::Unserialize() call at the far end?

> Serializer::Unserialize() gets its data from its member "io" which is
> an instance of ChunkedIO and has to be initialized before (e.g. from
> a fd by using a ChunkedIOFd).

Yep thanks for writing the chunk stuff, that's really useful.

> No, that's indeed wrong (and actually I'm quite surprised that the
> compiler implicitly converts the pointer to a bool here; but I guess

So was I!

> If you'd like to pass in data from other applications than Bro, it
> could perhaps make sense to think about a more well-defined data
> format. The serializations are a representation of internal Bro
> structures which could be quite hard to generate externally.

Well for now I was really just looking for a way to pump back and forth
simple structs and maybe an occasional dynamic length byte string. What
I had in mind is roughly this:

- Bro subsystem X can handle input from client application X.
- Some LocalSerializer handles local comms through domain sockets
- Client app X registers with Bro at startup
- Client app X sends some data 
- Bro side recognizes that data for subsystem X are coming up and
notifies subsystem X
- Subsystem X extracts next data item from the link and processes it

And vice versa. I'd leave it entirely up to the subsystems how to define
the data layout. As long as I could use the various Read/Write methods
of the Serializer I'd be happy .. The main hurdle I didn't manage in the
current code was how to get from the Serializer noticing that data
arrives to an arbitrary subsystem.
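
The missing piece I have in mind is roughly a registry like the one below.
It is only a sketch under my own assumptions: SubsystemRegistry, the numeric
subsystem ids and the string payload are hypothetical, and nothing like this
exists in the current code as far as I can tell.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry mapping a subsystem id to the handler for its data.
class SubsystemRegistry {
public:
    using Handler = std::function<void(const std::string& payload)>;

    // A subsystem registers itself once, e.g. at startup.
    void Register(std::uint32_t id, Handler h) { handlers_[id] = std::move(h); }

    // Called by the receiving side once it knows which subsystem the next
    // data item is for. Returns false if nobody registered for that id.
    bool Dispatch(std::uint32_t id, const std::string& payload) const {
        auto it = handlers_.find(id);
        if (it == handlers_.end())
            return false;
        it->second(payload);
        return true;
    }

private:
    std::map<std::uint32_t, Handler> handlers_;
};

// Registers one subsystem, then dispatches to a known and an unknown id.
std::string RegistryDemo() {
    SubsystemRegistry reg;
    std::string seen;
    reg.Register(7, [&seen](const std::string& p) { seen += p; });
    bool known = reg.Dispatch(7, "hello");
    bool unknown = reg.Dispatch(8, "dropped");
    return seen + (known ? ":ok" : ":fail") + (unknown ? ":ok" : ":miss");
}
```

The data layout stays entirely with the handler; the registry only decides
who gets the next item.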

I even have a name for the client library already: libbroccoli. BRO
Client COmmunications LIbrary ;)

Thanks Robin!

Best,
Christian.
-- 
________________________________________________________________________
                                          http://www.cl.cam.ac.uk/~cpk25
                                                    http://www.whoop.org




