[Xorp-hackers] XRL call serialization
Bruce Simpson
bms at incunabulum.net
Thu Oct 29 12:09:16 PDT 2009
Hi Ben,
Time for more devil's advocate action.
Ben Greear wrote:
>
> It seems that the router-mgr *might* could read and queue several xrl
> requests, and then possibly answer them out of order.
Based on my recent footprints in XrlAction, that is very likely. But not
because XRL is doing anything wrong.
One of the things I wanted to mention in my previous reply on this
thread: if you keep calling different XRL methods, reentrancy in the
client isn't a problem -- you can tell your own requests apart just
fine, they're for different methods.
But but but: if we have multiple XRL calls in flight for the same
method, this breaks down. The dispatch of the callback ('here's the
answer to my question') happens on a per-call basis, and so the only
guarantee of in-order dispatch you get is the fact that the XRL
transport uses a stream (TCP out of the box).
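To make that concrete, here's a purely illustrative C++ sketch (none of
these names come from libxipc) of the client-side bookkeeping this
implies: tag each outgoing call with a sequence id and match replies by
id, because the method name alone cannot tell two in-flight calls to the
same method apart.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>

// Hypothetical sketch only: correlate replies with in-flight calls by a
// client-assigned id, not by method name or arrival order.
class CallTracker {
public:
    using Callback = std::function<void(const std::string& answer)>;

    // Record an outgoing call; returns the id to send with the request.
    uint64_t send(const std::string& method, Callback cb) {
        uint64_t id = _next_id++;
        _pending[id] = std::move(cb);
        (void)method; // the method name alone is NOT enough to match replies
        return id;
    }

    // Match a reply by id; returns false for unknown/already-answered calls.
    bool deliver(uint64_t id, const std::string& answer) {
        auto it = _pending.find(id);
        if (it == _pending.end())
            return false;
        it->second(answer);
        _pending.erase(it);
        return true;
    }

    size_t in_flight() const { return _pending.size(); }

private:
    uint64_t _next_id = 1;
    std::map<uint64_t, Callback> _pending;
};
```

With this, two outstanding calls to the same method can have their
replies delivered in either order and still reach the right caller.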
If you mix potentially co-dependent operations and fire them all off at
once, problems can follow. [Although the XRLs in these scenarios aren't
being batched.] This is why the Router Manager is quite strict about its
timings; keeping the XRL actions tied down to particular commit steps is
critical to making sure things don't go out of control.
Again, it might be worth revisiting Pavlin's original idea, that we
teach the routing processes to keep their own snapshots of state and
implement commit/rollback there. The more I stare at Thrift and XRL, the
more I believe that's a good idea. It simplifies the Router Manager
interface with the other processes.
Although as you point out, we still need to keep those snapshots
around in the Router Manager so that the process can restart OK --
either that or we give processes some abstract form of non-volatile
storage we can easily propagate back to the management point at the
point of commit.
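As a rough illustration of the per-process snapshot idea -- names
entirely hypothetical, not XORP API -- a routing process could keep a
committed snapshot plus a staged working copy, so uncommitted changes
simply vanish on rollback or restart:

```cpp
#include <cassert>
#include <map>
#include <string>

// Minimal sketch of per-process commit/rollback. The committed map is
// the durable snapshot; the staged map holds uncommitted changes.
class ProcessConfig {
public:
    void stage(const std::string& key, const std::string& value) {
        _staged[key] = value;                   // uncommitted change
    }
    void commit()   { _committed = _staged; }   // make the snapshot durable
    void rollback() { _staged = _committed; }   // discard uncommitted changes

    // The committed view, i.e. what would survive a process restart.
    std::string committed(const std::string& key) const {
        auto it = _committed.find(key);
        return it == _committed.end() ? std::string() : it->second;
    }

private:
    std::map<std::string, std::string> _committed, _staged;
};
```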
However, you're quite right -- I see no reason why you can't introduce
funk into the system from the Router Manager, the same way that olsr's
register_rib() method might.
Consider this scenario -- let's imagine that xorp_olsr has crashed. It
left a whole bunch of OLSR routes in the RIB. It is using a non-default
admin distance. For whatever reason, this was configured on-the-fly, and
was an uncommitted change. That process is restarted.
Along comes the existing register_rib() function. Let's assume the
set-admin-distance step modifies the old origin table from the previous
incarnation of xorp_olsr. Let's also assume that there is a
redistribution policy in effect for OSPF, which is redistributing routes
above a given admin distance on another interface to an OSPF backbone area.
You can see how that gets really interesting. As soon as the call to
change the admin distance has fired, the routes will be rewritten to
contain the new admin distance, the RIB will redistribute the routes
(via policy) to xorp_ospf, and we've got a fair amount of system
activity going on, just due to a process restart.
Fortunately, the RIB method to set the admin distance does not rewrite
existing routes at the moment, and that was deliberately left unfinished
(although not for this reason). So this scenario, whilst it's been
elaborated on somewhat, isn't possible just now with the mechanisms I've
described.
But it does point towards the need either to have a configurable policy
for method disposition, or strong guarantees that the RPC layer behaves
in-order.
You end up having to rely on a reliable network transport. You can
assume that the XRL request you just received is to be executed right
away, but only insofar as the transport you read it from has not
re-ordered anything in transit.
Reliability doesn't imply in-order delivery to the user process. If you
receive XRL requests out of order, you'll need to buffer them. If your
transport isn't reliable, you have no way of knowing that you won't get
an earlier message -- without implementing the concept of a time-out;
i.e. if the server doesn't see an out-of-order message within N time
units, it times that message out and sends a NACK, to stop blocking all
other access to the resource. [Sounds like kernel driver locking to me...]
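A minimal sketch of that buffering, assuming messages carry sequence
numbers (the time-out/NACK half of the story is elided here, and all
names are illustrative):

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Sketch: deliver messages in sequence-number order over a transport
// that may reorder them, holding out-of-order arrivals until the gap
// before them is filled.
class ReorderBuffer {
public:
    explicit ReorderBuffer(uint64_t first_seq) : _next(first_seq) {}

    // Accept a message; return every message now deliverable in order.
    std::vector<std::string> receive(uint64_t seq, const std::string& msg) {
        std::vector<std::string> ready;
        if (seq < _next)
            return ready;                    // duplicate of an old message
        _held[seq] = msg;
        // Drain the head of the buffer while it is contiguous.
        while (!_held.empty() && _held.begin()->first == _next) {
            ready.push_back(_held.begin()->second);
            _held.erase(_held.begin());
            ++_next;
        }
        return ready;
    }

    size_t buffered() const { return _held.size(); }

private:
    uint64_t _next;                          // next sequence number expected
    std::map<uint64_t, std::string> _held;   // out-of-order arrivals
};
```

A real implementation would also start a timer whenever `buffered()` is
non-zero, and NACK the missing sequence number when it fires.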
Up until now, we have relied on TCP to do all of this for us behind the
scenes. The price we pay for that is some inefficiency in the
implementation: head-of-line blocking, and being unable to preserve RPC
method boundaries.
(This is why the AMQP guys have the hots for SCTP, but the SCTP guys
can't do much about pushing the model forward until Microsoft sits up
and takes notice -- no-one's shipping SCTP as a Windows 7 NDIS/TDI
driver, as far as I know.)
You can see why stuff like TIPC happens. But I seriously disagree with
their approach. Pushing all asynchrony into the kernel isn't the answer,
and it limits your client uptake -- Linux is not the only game in town,
and there are very good reasons for that which I won't go into here.
Just using the existing Berkeley Sockets API is cute, but far from
perfect -- it has holes of its own.
Also, they never really tackled the cross-language interop issue the way
Thrift has.
...
So I guess it boils down to: caveat implementor. If you use XRL, don't
rely on call serialization from the API. If you need to cross the road
after pushing the button, do so. Otherwise, you might end up in a
traffic accident. :-)
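One way to 'push the button' is to serialize co-dependent calls yourself,
firing each one only from the previous call's completion callback. A toy
sketch of the pattern (the async machinery is simulated and the names
are made up; XRL's real callbacks are typed, but the shape is the same):

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <vector>

// Client-side serialization: each operation receives a 'done' callback
// and the next operation is fired only when 'done' is invoked. This
// imposes ordering regardless of what the RPC layer guarantees.
class SerialQueue {
public:
    using Op = std::function<void(std::function<void()> done)>;

    void enqueue(Op op) {
        _ops.push(std::move(op));
        if (!_busy)
            fire_next();
    }

private:
    void fire_next() {
        if (_ops.empty()) {
            _busy = false;
            return;
        }
        _busy = true;
        Op op = std::move(_ops.front());
        _ops.pop();
        op([this]() { fire_next(); });   // next op starts only on completion
    }

    bool _busy = false;
    std::queue<Op> _ops;
};
```

In XRL terms: issue the set-admin-distance call, and only send the calls
that depend on it from inside its reply callback.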
> (Been a few
> days since I poked at the router mgr code, not sure I fully understood
> it when I did).
There is a lot going on in there.
XRLs should be dispatched in the order in which they are received.
However, there are actually no guarantees for this behaviour -- it is
'best effort'.
When an XRL call is received, for example, STCPRequestHandler will
attempt to dispatch it immediately, in line with further reception.
XRL targets are internally synchronous. The method call dispatch
happens in the context of XrlRouter's event I/O callbacks, which are
registered with the outer EventLoop.
So from the server's point of view, XRL is pretty much synchronous. But,
even on the same host, that dispatch could happen on another CPU. [As
I've probably mentioned elsewhere, most of XORP's inter-process
synchronization in the time domain is actually pinned on the host's
socket buffer locks.]
The uncertainty in the whole system to do with time and call dispatch is
however localized:
* When/how did that XRL get fired off?
* How are my socket buffers?
* How many cores do I have?
* How's my scheduling?
Just out of interest, I will reveal that, as of this week, I have
written most of the code generator needed to shim XRL calls directly
into Thrift ones.
This is so that adopting Thrift does not mean a dragnet across all
400+ KLOCs of XORP, but should make it a mostly drop-in replacement for
XRL in the existing code.
I have yet to write most of the new libxipc, though, which is why
I'm feeling that space out just now, and being pretty conservative in
what I'm disclosing (people have got a whiff of what I'm doing;
knowledge breeds expectations; expectations pump up the volume).
Thrift's C++ RPC libraries are actually pretty well written. They make
it possible to pull off a few tricks for making the method calls a bit
more scalable, and for providing guarantees about call serialization in
a scalable system.
However, making that work requires some additional movement. As you
can see, there are a few assumptions about how the whole system actually
behaves, which are incorrect in some places.
cheers,
BMS