[Xorp-hackers] rtrmgr and TaskManager

Bruce Simpson bms at incunabulum.net
Tue Oct 6 09:54:12 PDT 2009


Ben Greear wrote:
>
> Anything that depends on waiting for other tasks to run by just sleeping
> for a while is a broken algorithm, so I'd prefer to see the problems
> sooner than later.

That's not entirely true. Let me clarify.

In a threaded environment, this comment is valid: threads can starve 
each other of resources, deadlock, livelock, or race against each other 
if synchronization is incorrect. So yes, your point that sleeping is a 
flawed way for tasks to achieve synchronization holds in a threaded 
environment.

In a coroutine-based environment, which is what XORP uses, this isn't a 
valid comment. Explicit yield points are necessary to allow other tasks 
to run, and 'synchronization' is achieved using state variables of some 
kind. This holds for coroutines built by hand in C++ just as it does in 
any language which implements them -- there is only a single thread of 
execution, so in effect, nothing is ever sleeping.
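
To make that concrete, here is a minimal sketch of cooperative tasks on a 
single thread. It is in Python purely for brevity and is not XORP code; 
the names producer, consumer and run are made up for illustration.

```python
from collections import deque

def producer(state, log):
    # Do some work, then publish the result through a shared state variable.
    log.append("producer: working")
    yield                      # explicit yield point: let other tasks run
    state["ready"] = True
    log.append("producer: done")

def consumer(state, log):
    # 'Synchronize' by checking the state variable; never block or sleep.
    while not state["ready"]:
        log.append("consumer: not ready, yielding")
        yield
    log.append("consumer: proceeding")

def run(tasks):
    # Round-robin over the tasks on one thread until all have finished.
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)         # resume the task up to its next yield
        except StopIteration:
            continue           # task finished; drop it
        queue.append(task)

state = {"ready": False}
log = []
run([consumer(state, log), producer(state, log)])
```

The consumer never sleeps; it simply hands control back until the state 
variable says it may continue.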

The 'synchronization' point, if you like, is when select() finally gets 
called. This is pretty much what the io_service idiom in Boost.Asio is 
doing.
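
A rough sketch of that idea follows: a single-threaded loop in which 
select() is the only place anything waits. It assumes Python's standard 
select and socket modules; event_loop and on_readable are hypothetical 
names, and this is not how XORP's own event loop is written.

```python
import select
import socket

def event_loop(handlers, timeout=1.0):
    # handlers maps a readable socket to a one-shot callback.
    # Runs until no handlers remain or select() times out.
    while handlers:
        readable, _, _ = select.select(list(handlers), [], [], timeout)
        if not readable:
            break                          # nothing became ready in time
        for sock in readable:
            callback = handlers.pop(sock)  # remove before dispatching
            callback(sock)

received = []
a, b = socket.socketpair()
a.sendall(b"hello")                        # make b readable

def on_readable(sock):
    received.append(sock.recv(1024))

event_loop({b: on_readable})
a.close()
b.close()
```

Everything else in the program is a callback dispatched from that loop, 
so there is exactly one point where the process can block.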

Continuations offer language support for the coroutine construct, which 
is something C++ doesn't have; see here and further on in this reply:
    http://en.wikipedia.org/wiki/Coroutine

In the case of the Router Manager, I wouldn't be entirely surprised if 
there were callbacks stacked up waiting for dispatch in the background. 
However, given how serial it is in nature (in terms of process bringup 
and trying to avoid thundering herd problems for OS resources), I'm not 
surprised it errs on the side of conservatism, hence the large timeout 
thresholds.
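
For comparison, serial bringup can also be expressed by chaining each 
step onto the completion of the previous one, with no timer involved. 
The sketch below is an assumption-laden toy, not the Router Manager's 
actual design: Loop, call_soon and start_modules are invented names, and 
the module names are used only as labels.

```python
from collections import deque

class Loop:
    # Hypothetical minimal dispatcher: a FIFO of pending callbacks.
    def __init__(self):
        self.pending = deque()

    def call_soon(self, fn, *args):
        self.pending.append((fn, args))

    def run(self):
        while self.pending:
            fn, args = self.pending.popleft()
            fn(*args)

def start_modules(loop, names, started, on_done):
    # Start modules strictly in order; each completion schedules the next.
    if not names:
        loop.call_soon(on_done)
        return
    first, rest = names[0], names[1:]

    def module_started():
        started.append(first)
        start_modules(loop, rest, started, on_done)

    loop.call_soon(module_started)

loop = Loop()
started = []
events = []
start_modules(loop, ["fea", "rib", "ospf"], started,
              lambda: events.append("commit done"))
loop.run()
```

Ordering is preserved by the chain itself, so a delay between steps adds 
latency without adding any guarantee.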

>   From my poking at the code, I can't see any reason it should
> need to sleep though...other tasks can run just fine after that one
> completes.  If there are others that *must* run first, hopefully they
> are properly chained with callbacks (the commit seems to be done thus).
> I'm going to run with zero timer there and see if any problems
> shake out.  After several hours yesterday, I had seen no problems, but
> saw significant speed-up in 'commit' xorpsh commands which is very
> useful for me.

I buy the argument, but I'm sure you can understand my hands-off / 
kid-gloves position with regard to the Router Manager and taking 
changes for it. It is a large C++ subsystem which I'm not entirely 
familiar with, and when I've made changes to it in the past, mostly when 
porting to Win32, it's been a case of get in, get out, stay focused, get 
it over with, and survive it.

If you experiment with turning those timeouts down, and it works for 
you, that's great, but I really need to have a clear picture of what's 
going on, if I'm to be expected to support it on an ongoing basis.

>
> With regard to re-architecting rtr-mgr:  Networking is asynchronous by
> design and considering that external events (interfaces coming & going,
> link state bouncing, etc) can happen at any time, the code just needs to
> deal properly with async events.

In XORP's case, more engineering time seems to have been burnt up on 
getting the XRL layer written than on these external events you mention. 
The FEA in theory handles all of these events; it is something of a 
kitchen sink. What could be better realized is how these events are 
propagated to the rest of the system, which is why I've been focused on 
looking at XRL.

>   The one thing I'd work towards is more of a 'desired'
> v/s 'actual' config.  Users could always configure any logical
> configuration and the system will try to make this happen, but it will
> also deal properly with 'phantom' things like interfaces that don't
> exist currently.  A different programming language isn't going to help
> any of that I think..and I'd very much like to keep with c/c++.

As you've probably already seen, the Router Manager code is non-trivial, 
and there's a lot of complexity in there to deal with the asynchrony of 
the XRL RPC calls.

I agree that the configuration model needs a serious look for things 
like dynamic interfaces (VPN, wireless, hot-swappable cards etc), and 
it's something which I raised several times as an agenda point during my 
time at ICSI. Unfortunately, the development focus has been in other 
areas, and I haven't been in a position to call the shots on where the 
effort went. I certainly got the impression that this put some folk off 
from trying XORP in the here and now.
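
As a rough illustration of the 'desired' versus 'actual' split for 
dynamic interfaces, consider the sketch below. It is a toy model only: 
reconcile, the interface names and the mtu setting are assumptions for 
the example, not anything taken from XORP.

```python
def reconcile(desired, present):
    # desired: interface name -> configuration the user asked for
    # present: set of interface names that exist right now
    applied = {name: cfg for name, cfg in desired.items() if name in present}
    phantom = sorted(name for name in desired if name not in present)
    return applied, phantom

desired = {"eth0": {"mtu": 1500}, "tun0": {"mtu": 1400}}

# tun0 is configured but does not exist yet: keep it, apply nothing for it.
applied, phantom = reconcile(desired, {"eth0"})

# Later the tunnel appears; the same desired config now applies in full.
applied_later, phantom_later = reconcile(desired, {"eth0", "tun0"})
```

The desired configuration is never rejected or rewritten; it is simply 
re-evaluated whenever the set of present interfaces changes.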

Regarding the use of C/C++ for development: XORP is strongly tied to the 
concept of continuations, even if it doesn't have language support.

Twisted Python at least has the benefit of strong language support for 
continuations, in the form of how it builds on Python generators and the 
'yield' keyword. This allows a call stack frame to be easily tucked away 
and restored at a later point in time, and in an exception safe way.
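
A small sketch using plain Python generators (no Twisted required) shows 
the frame being suspended and resumed, including resumption by way of an 
exception. The function fetch_route and the reply string are made up for 
illustration.

```python
def fetch_route(log):
    # Local variables live in the generator's frame across each yield.
    attempts = 0
    while True:
        attempts += 1
        try:
            reply = yield f"request #{attempts}"   # suspend until resumed
        except TimeoutError:
            log.append(f"attempt {attempts} timed out, retrying")
            continue
        log.append(f"attempt {attempts} got {reply}")
        return reply

log = []
task = fetch_route(log)
first = next(task)                     # run up to the first yield
second = task.throw(TimeoutError())    # resume by raising inside the frame
try:
    task.send("10.0.0.0/8 via eth0")   # resume with a normal reply
except StopIteration as stop:
    result = stop.value                # the generator's return value
```

The counter survives both resumptions without any hand-written state 
object, which is the part that has to be done manually with callbacks in 
C++.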

There have been efforts over the years to try to do this in C++, e.g. 
uC++, Concurrent C++ and others, but none of them have matured 
sufficiently for production use.

What we have in XORP is a compromise, and it's largely tied to the 
semantics of how I/O happens in a UNIX-like system.

cheers
BMS


