[Xorp-hackers] Problems migrating a routing protocol...

Mark Handley M.Handley@cs.ucl.ac.uk
Thu, 16 Dec 2004 08:57:44 +0000


>Some seconds after this, I get the following report in the router manager:
>
>[ 2004/12/15 16:47:46  ERROR xorp_rtrmgr:2116 FINDER +85
>finder_xrl_queue.hh dispatch_cb ] Sent xrl got response 211 Reply timed out
>[ 2004/12/15 16:47:46  ERROR xorp_rtrmgr:2116 FINDER +85
>finder_xrl_queue.hh dispatch_cb ] Sent xrl got response 211 Reply timed out
>[ 2004/12/15 16:47:46 INFO xorp_rib RIB ] Received death event for
>protocol test shutting down -------
>OriginTable: test
>IGP
>next table = Redist:test
>
>It seems to me that the RIB tries to call the callback function I have
>sent in the send_add_interface_route4 call, but for some reason it
>times-out. And finally, the router manager thinks that my routing
>protocol died, but in fact it keeps running as nothing has happened.
>Does anybody have any idea of what could be happening? Why the RIB can
>not call the callback function?

The most likely problem is that the eventloop isn't getting control
frequently enough.  

If your process is single-threaded, then the eventloop needs to be at
the heart of the process.  Normally the main execution loop should
look something like:

    while (!_done) {
        _eventloop.run();
    }

Then everything else is a callback from a registered timer or an event
handler.  You then need to make sure that all event handlers do
return, and that they all return within a second or so.  If you need
an event handler to take longer than that, then there are techniques
we use, but I won't go into detail on those here unless you think
that might be the problem.
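
As a rough illustration of the general idea only (not necessarily the
technique used inside XORP), one common way to keep handlers short is to
split a long job into small slices, each of which re-queues itself and
then returns to the loop.  Everything in the sketch below is
hypothetical: ToyEventLoop is a stand-in written for this example, not
the XORP EventLoop class, and SlicedJob and its methods are invented
names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for an event loop; this is NOT the XORP EventLoop API.
class ToyEventLoop {
public:
    using Task = std::function<void()>;

    // Queue a callback to be run on a later pass through the loop.
    void post(Task t) { _tasks.push_back(std::move(t)); }

    // Run at most one pending callback, then return to the caller.
    void run() {
        if (_tasks.empty())
            return;
        Task t = std::move(_tasks.front());
        _tasks.pop_front();
        t();
    }

    bool idle() const { return _tasks.empty(); }

private:
    std::deque<Task> _tasks;
};

// Hypothetical long-running job, processed in slices of at most `slice`
// items.  Each slice is a short callback that re-posts itself, so other
// callbacks queued on the loop get a turn between slices.
class SlicedJob {
public:
    SlicedJob(ToyEventLoop& loop, std::size_t total, std::size_t slice)
        : _loop(loop), _total(total), _slice(slice) {
        assert(slice > 0);  // a zero-sized slice would never make progress
    }

    void start() { _loop.post([this] { step(); }); }

    std::size_t processed() const { return _processed; }
    std::size_t slices_run() const { return _slices_run; }
    bool done() const { return _processed == _total; }

private:
    void step() {
        const std::size_t end = std::min(_processed + _slice, _total);
        while (_processed < end)
            ++_processed;  // placeholder for the real per-item work
        ++_slices_run;
        if (!done())
            _loop.post([this] { step(); });  // yield, then continue later
    }

    ToyEventLoop& _loop;
    std::size_t _total;
    std::size_t _slice;
    std::size_t _processed = 0;
    std::size_t _slices_run = 0;
};
```

To use it, construct a SlicedJob against the loop, call start(), and keep
calling run() from the main loop as above.  The job must outlive its
queued callbacks, since they capture `this`.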

But it sounds like your code actually has a separate thread for the
XORP IPC handling.  So long as that thread is running, and has a main
execution loop something like the one above, then things *should* work
right.

The error:

>[ 2004/12/15 16:47:46 INFO xorp_rib RIB ] Received death event for
>protocol test shutting down -------

indicates that the XORP finder has decided your process is dead, and
has then told the RIB.  The XRL library code sends keepalives to the
finder, so probably the finder didn't get a keepalive. If your process
is still alive and active, then this is almost always because the
eventloop didn't get executed for more than 30 seconds or so, or the
XrlRouter you created to maintain the link with the finder has been
deleted.
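
To make the failure mode concrete, here is a toy model of a
keepalive-based liveness check.  It is only a sketch under stated
assumptions: LivenessTracker and its methods are invented for this
example and are not the finder's real code, and the timeout is a
parameter because the "30 seconds or so" above is approximate.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical liveness tracker; NOT the real XORP finder logic.
// Times are passed in explicitly so the behaviour is easy to reason about.
class LivenessTracker {
public:
    using Seconds = std::chrono::seconds;

    explicit LivenessTracker(Seconds timeout) : _timeout(timeout) {
        assert(timeout.count() > 0);
    }

    // Record that a keepalive arrived at time `now`.
    void keepalive(Seconds now) { _last_seen = now; }

    // The peer is treated as dead once more than `timeout` has elapsed
    // since the last keepalive, even if the process is still running.
    bool considered_dead(Seconds now) const {
        return now - _last_seen > _timeout;
    }

private:
    Seconds _timeout;
    Seconds _last_seen{0};
};
```

In this model a process whose eventloop stalls simply stops calling
keepalive(), so considered_dead() eventually becomes true even though
the process never exited.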

Cheers,
	Mark