[Xorp-hackers] Fix for PIM task list hang
Bruce Simpson
bms at incunabulum.net
Tue Sep 1 06:43:23 PDT 2009
Ben,
Thanks for this change. As of today, I've applied a very small portion
of it, by introducing debug_msg() calls into the path(s) where you've
added XLOG warnings.
Ben Greear wrote:
> On some error conditions related to interface removal, the PIM
> callbacks would not handle the next task, and so nothing would ever
> look at the task queue again, effectively hanging the multicast
> routing daemon.
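For anyone following along, the failure mode described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual PIM code: TaskQueue, start_next(), on_complete() and pending_after_failure() are hypothetical names, and the completion callback is invoked synchronously here rather than from an event loop.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical stand-in for a task queue driven by completion callbacks.
// None of these names are taken from the XORP sources.
class TaskQueue {
public:
    // A task returns true on success, false on an error such as the
    // interface having been removed underneath it.
    using Task = std::function<bool()>;

    explicit TaskQueue(bool advance_on_error)
        : _advance_on_error(advance_on_error) {}

    void add(Task t) { _tasks.push_back(std::move(t)); }

    // Start the head task, if any. A real daemon would receive the
    // completion callback later; here it is called directly.
    void start_next() {
        if (_tasks.empty())
            return;
        Task t = std::move(_tasks.front());
        _tasks.pop_front();
        on_complete(t());
    }

    std::size_t pending() const { return _tasks.size(); }

private:
    void on_complete(bool ok) {
        if (!ok && !_advance_on_error) {
            // Hanging path: return without starting the next task,
            // so nothing ever looks at the queue again.
            return;
        }
        start_next();
    }

    std::deque<Task> _tasks;
    bool _advance_on_error;
};

// Queue three tasks, the second of which fails, and report how many
// are left stranded once the callbacks stop firing.
std::size_t pending_after_failure(bool advance_on_error) {
    TaskQueue q(advance_on_error);
    q.add([] { return true; });
    q.add([] { return false; });  // e.g. interface removed mid-task
    q.add([] { return true; });
    assert(q.pending() == 3);
    q.start_next();
    return q.pending();
}
```

With advance_on_error set to false, the third task is left stranded in the queue; with it set to true, the queue drains. Whether advancing is the right policy for every kind of error is a separate question.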
I think we need to look very carefully at changes which affect the flow
of RPC calls in and out of PIM, as we are gearing up for significant
refactoring in that area.
Were you able to pin the task list hang down to a specific PIM RPC call
or set of events? It could be argued that failure of the Finder still
shouldn't be regarded as a purely transient failure.
This is especially the case if we're using in-flight shared memory, with
user-space synchronization mechanisms (e.g. futex, umtx) controlling
access to that shared memory, so I'd err on the side of caution and not
commit this change in full for now.
thanks,
BMS