[Xorp-hackers] Fix for PIM task list hang

Bruce Simpson bms at incunabulum.net
Tue Sep 1 06:43:23 PDT 2009


Ben,

Thanks for this change. As of today, I've applied a very small portion 
of it, by introducing debug_msg() calls into the path(s) where you've 
added XLOG warnings.

Ben Greear wrote:
> On some error conditions related to interface removal, the PIM
> callbacks would not handle the next task, and so nothing would ever
> look at the task queue again, effectively hanging the multicast
> routing daemon.
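
To make the failure mode described above concrete, here is a minimal sketch of a callback-driven task queue. It is not XORP code: TaskQueue, run_next(), on_done(), run_demo() and the advance_on_error flag are all hypothetical names, and the tasks are plain functions standing in for RPC calls. It only shows how a completion callback that returns early on error leaves the rest of the queue unprocessed. It says nothing about whether advancing after a failure is the right policy.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for a callback-driven task list; not XORP code.
class TaskQueue {
public:
    // A task returns true on success, false on error
    // (e.g. a call that fails because its interface was removed).
    using Task = std::function<bool()>;

    explicit TaskQueue(bool advance_on_error)
        : advance_on_error_(advance_on_error) {}

    void add(Task task) { pending_.push_back(std::move(task)); }

    // Run the task at the head of the queue, then hand its result
    // to the completion callback.
    void run_next() {
        if (pending_.empty())
            return;
        Task task = std::move(pending_.front());
        pending_.pop_front();
        on_done(task());
    }

    std::size_t completed() const { return completed_; }
    std::size_t failed() const { return failed_; }
    std::size_t pending() const { return pending_.size(); }

private:
    // Completion callback. The only place that re-arms the queue.
    void on_done(bool ok) {
        if (ok) {
            ++completed_;
        } else {
            ++failed_;
            if (!advance_on_error_)
                return;  // early return: nothing calls run_next() again
        }
        run_next();
    }

    bool advance_on_error_;
    std::deque<Task> pending_;
    std::size_t completed_ = 0;
    std::size_t failed_ = 0;
};

// Queue three tasks (success, failure, success) and start processing.
TaskQueue run_demo(bool advance_on_error) {
    TaskQueue q(advance_on_error);
    q.add([] { return true; });
    q.add([] { return false; });  // simulated error
    q.add([] { return true; });
    q.run_next();
    return q;
}
```

With run_demo(false) the third task is left pending after the second one fails, which is the stall described in the quoted text. With run_demo(true) the queue drains, at the cost of treating the error as something that can simply be skipped.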

I think we need to look very carefully at changes which affect the flow 
of RPC calls in and out of PIM, as we are gearing up for significant 
refactoring in that area.

Were you able to pin the task list hang down to a specific PIM RPC call 
or set of events? It could be argued that failure of the Finder still 
shouldn't be regarded as a purely transient failure.

This is especially the case if we're in a situation where we're using 
in-flight shared memory and user-space synchronization mechanisms (e.g. 
futex, umtx) to control access to that shared memory, so I'd err on the 
side of caution and not commit this change in full for now.

thanks,
BMS


