[Xorp-hackers] XRL call serialization

Thu Oct 29 11:02:07 PDT 2009

Hi Ben,

Just saw this... sorry for the delay.

Ben Greear wrote:
> With regard to XRL, I've a question:
>
> If an application makes 3 XRL calls:
>
> do_a()
> do_b()
> commit_all()
>
> Is there any guarantee that these are strictly delivered to
> the peer process in the order called?  Code appears to expect
> this to be true, but I'm suspicious that perhaps it does not.

XORP processes are intended to be asynchronous; this is realized using 
explicit coroutines, with no additional C++ runtime support other than 
UNIX system calls.

XRL is intended to be an asynchronous IPC layer. The calls in your 
example won't be guaranteed to be serialized, unless you explicitly 
serialize them in your process.

You can see this in XORP processes and tools in the form of a ping-pong 
between callback routines.
    An example of forcing serialization can be found in
contrib/olsr/tools/print_databases.cc, where you can see the EventLoop
being run whilst the Getter does its thing. Calling get() fires off an
XRL, and list_cb() is then called for each successive fetch, until the
final fetch sets Getter::_done to true.
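
To make the ping-pong idea concrete, here is a rough, self-contained
sketch of the pattern applied to the three calls in your example. It
is NOT XORP code: ToyLoop, send_call() and run_serialized() are names
I've made up for illustration, and the "transport" is just a vector
recording what was sent. The point is only that each call is issued
from the reply callback of the previous one, while the loop keeps
running until a done flag is set.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for an event loop (hypothetical, not XORP's EventLoop):
// each iteration runs exactly one queued reply callback.
struct ToyLoop {
    std::deque<std::function<void()>> pending;

    void run_one() {
        assert(!pending.empty());
        // Move the task out first, so it may safely queue further work.
        auto task = std::move(pending.front());
        pending.pop_front();
        task();
    }
};

// Hypothetical async call: record the send, then queue the reply callback.
void send_call(ToyLoop& loop, std::vector<std::string>& wire,
               const std::string& method, std::function<void()> on_reply) {
    wire.push_back(method);
    loop.pending.push_back(std::move(on_reply));
}

// Issue do_a, do_b and commit_all strictly one after another: each call
// is only sent once the reply to the previous call has been seen.
std::vector<std::string> run_serialized() {
    ToyLoop loop;
    std::vector<std::string> wire;
    bool done = false;

    send_call(loop, wire, "do_a", [&] {
        send_call(loop, wire, "do_b", [&] {
            send_call(loop, wire, "commit_all", [&] { done = true; });
        });
    });

    // Run the loop until the final reply sets the flag.
    while (!done)
        loop.run_one();
    return wire;
}
```

With this shape, ordering is enforced by the caller rather than
assumed of the transport, which is the property the application code
should be relying on.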

    BTW: One example of what NOT to do would be the 
XrlIO::register_rib() function in contrib/olsr/xrl_io.cc. The semantics 
behind those two XRL calls are co-dependent, and will be different 
depending on whether or not an OLSR origin table is already registered 
with the RIB. But you can see that two different XRLs can be fired off 
'in parallel'.

Class Xrl has no notion of call/reply sequence numbers, which are 
necessary in order to deal with out-of-order delivery, as well as 
identifying individual method calls on-the-wire.

    However, the XORP application code is written with the expectation
that XRL is async. The fact that a few things 'under the hood' in XRL
prevent it from being fully async is largely academic -- the tutorial
materials are pretty clear that you shouldn't assume serial method call
returns, etc.

    In practice, what happens is that the XRL transport(s) themselves
will stamp each call with a sequence ID. You can see this happening in
XrlPFSTCPSender::send(). Although it *does* expect delivery in sequence
(you can see this in XrlPFSTCPSender::read_event()), that is purely a
detail of how this transport has been implemented.

    In this respect, XRL is totally tied to TCP semantics in its
implementation, and RPCs should not be reordered, given that their
dispatch in the XRL target is synchronous with their delivery -- there 
is no intermediate queueing, apart from the kernel's socket buffers.
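
As a rough illustration of what "stamp each call and expect replies
in sequence" amounts to, here is a small sketch. It is not the
XrlPFSTCPSender code and doesn't mirror its internals; ToySender and
its members are hypothetical names, and the only assumption is a
single ordered byte stream (as with TCP) underneath.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>

// Illustrative sender (hypothetical): stamps every request with a
// sequence number and expects replies in exactly the order sent.
class ToySender {
public:
    // Stamp the request with the next sequence number and remember it.
    uint32_t send(const std::string& method) {
        uint32_t seq = _next_seq++;
        _outstanding.push_back(Request{seq, method});
        return seq;
    }

    // Accept a reply only if it matches the oldest outstanding request;
    // anything else counts as out-of-order and is rejected.
    bool on_reply(uint32_t seq) {
        if (_outstanding.empty() || _outstanding.front().seq != seq)
            return false;
        _outstanding.pop_front();
        return true;
    }

    std::size_t outstanding() const { return _outstanding.size(); }

private:
    struct Request {
        uint32_t seq;
        std::string method;
    };

    uint32_t _next_seq = 0;
    std::deque<Request> _outstanding;
};
```

A sender like this works only because the stream underneath never
reorders; swap in a transport that can reorder and every early reply
is simply rejected.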

    But there should be no expectation of this by application 
developers. Indeed, if you look at the stubs which Thrift generates, the 
client code only allows 1 request in-flight; it always sets the sequence 
number to 0. In practice, this isn't a problem, because in Thrift, the 
servers tell clients apart per session.

    In XRL, we tell calls apart by method name. Something tells me this 
gets really interesting if we try to thread the RIB or otherwise move it 
into another process. I should point out that XRL targets never actually 
get to see the Xrl itself -- they just get passed a bunch of arguments 
by the XrlRouter, and their handler function invoked.
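
For what it's worth, dispatch by method name boils down to something
like the sketch below. This is a simplified illustration, not the
XrlRouter implementation: ToyRouter, Args and Handler are made-up
names, and arguments are plain strings purely to keep it short. Note
that the handler only ever receives the argument list, never the call
object, so it has nothing to tell two calls to the same method apart.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types: arguments as plain strings, one handler per method.
using Args = std::vector<std::string>;
using Handler = std::function<std::string(const Args&)>;

class ToyRouter {
public:
    void add_handler(const std::string& method, Handler h) {
        _handlers[method] = std::move(h);
    }

    // Look the handler up by method name alone and invoke it at once,
    // i.e. dispatch is synchronous with delivery. Returns false if no
    // handler is registered under that name.
    bool dispatch(const std::string& method, const Args& args,
                  std::string& result) const {
        auto it = _handlers.find(method);
        if (it == _handlers.end())
            return false;
        result = it->second(args);
        return true;
    }

private:
    std::map<std::string, Handler> _handlers;
};
```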

On a Grim Code Reaper's note:
    This makes it pretty much impossible, using the existing code, to 
implement any serialization or parallelism policy within each XORP 
process, as well as making it impossible to decentralize the method call 
disposition, because it's tied to TCP streams.

Therefore:
    Synchronous dispatch of method calls doesn't change in a Thrifted 
XORP to begin with -- too much of the existing router code is written 
around this expectation.

cheers,
BMS