[Xorp-hackers] Crash due to stale cached xrl sender pointer.

Ben Greear greearb at candelatech.com
Thu Oct 29 10:43:55 PDT 2009


On 10/29/2009 10:10 AM, Bruce Simpson wrote:
> Hi Ben,
>
> Not really meant to be spending time on this at the moment, but I
> shall... it is not too far off from what I'm actually doing, and I think
> we probably do need to go over this again ground a bit, given that I am
> effectively rewriting the affected code right now.
>
> Ben Greear wrote:
>>
>> In general, I dislike smart pointers, but in this case, they seem
>> tailor made for the problem.
>
> I would disagree that smart pointers are even necessarily the right
> answer to the issue you've found; in some cases, they can do more harm
> than good.
>
> I got the electronic equivalent of dirty looks, at first, when I had to
> work around a problem in template class Spt using ref_ptr<..>&. However,
> that was an existing, isolated use of ref_ptr within the tree. I'd
> prefer not to use ref_ptr for new code if at all possible.
>
> XrlPFSender is a stereotype of an object which should not be created and
> destroyed trivially. If something in libxipc is tripping over it, it is
> possibly a race condition, or updates not being propagated elsewhere.
>
> In this scenario, the Xrl blob contains a cached pointer to a transport
> channel (XrlPFSender) which has now potentially gone away. Given that
> Xrl instances are like confetti, it would be difficult to track them
> all, and I'm not sure a refcount is the most appropriate way to deal
> with that (see below).

The refcount just keeps the sender object from being destroyed until
all xrls referencing it are cleaned up.  The sender was probably destroyed
because it timed out (I was starting 100 virtual router processes...loads
the system very heavy).

Please note that the sender will be marked in-active, so the XRL will not actually
try to use it, but if the memory is gone, then it can't even check the foo->active()
flag w/out crashing.

It seems a pretty simple use-after-free bug, and the fix seems pretty
trivial to me.


> Caching the transport pointer (XrlPFSender) in the Xrl itself is just
> asking for trouble in situations like this, given that we have no means
> of telling the Xrl 'your resolved sender has gone away' -- it's buried
> in libfooxif.so's copy-on-write BSS segment.
>
> It's a non-trivial issue to fix. Using the ref_ptr seems deceptively
> simple -- we get a handle on the transport, and so the code doesn't blow
> up, but we probably don't fix the underlying issue (unless I'm missing
> something).

Assuming a new sender is created, the Xrl will notice the cached one
is inactive and search for a new one.  Seems like it all works out to me.

> It would be good to get a handle first on who introduced the secondary
> caching mechanism, and why. Most likely this was to avoid any STL
> container traversals when an XRL is actually being sent, but given that
> you've probably run into a race which blows this mechanism up, it needs
> revisiting. (Yes, this has fallen under the axe in the Thrift branch...!)

I think you are over-thinking this one!

Thanks,
Ben


-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



More information about the Xorp-hackers mailing list