[Xorp-hackers] Crash due to stale cached xrl sender pointer.

Ben Greear greearb at candelatech.com
Mon Nov 9 11:39:08 PST 2009


On 11/09/2009 11:00 AM, Bruce Simpson wrote:
> So... back to the original problem.
>
> I'm glad non-reentrancy can be ruled out as a root cause, but that is
> always something that's good to tackle.
>
> It is possible that there's a race when the system is loaded, but the
> bug is all too obvious -- we are using a pointer as a cache, and there
> is no way to invalidate it.
>
> If there has been a race, I would really like to catch it.
>
> My best guess is that the send ran before one of these:
> 1. a FinderClient operation, which would invalidate the resolved sender
> transport, runs before the Xrl send.
> This is most likely.
> 2. a transport layer close notification inside an XrlPFSender.
> But this is unlikely. XrlPFSTCP uses BufferedAsyncFileReader. Only the
> Win32 transport code in that file will dispatch an async close, because
> that's what Winsock does.
>
> ...where XRL is concerned, these are the places I'd look more carefully
> for races.
>
> Sorry to be a nazi about ref_ptr<T>. It's one of those things I wish
> would just go away. There is a need for a refcounted object/pointer
> type. ref_ptr<T> fills that gap now; it existed before Boost did.

If you have something that takes it's place and works, then just
make the conversion and commit the patch.  It just needs a logical
reference pointer...I don't care exactly how it's implemented.

I posted patches to enable the xrl perf tests, so it's easy to run
performance regression tests now.


> There are several places where its use is actually typedef'd away,
> making it harder to tell, at-a-glance, what's going on. One of these is
> in the FinderClient. I'm going to want to get rid of this; ref_ptr<T>
> caused no end of problems for me in OLSR.

It wouldn't hurt my feelings to remove every typedef in the code,
except perhaps for type-defs of function pointers (which are ugly
as sin no matter how you do them).

> I did say that Boost needs to be introduced carefully and incrementally;
> I'll be sticking to that. I took a look at how Boost implements the
> weak_ptr<T> / shared_ptr<T> split. It turns out they both reference an
> embedded sp_counted_base instance, to implement the refcounts. They
> maintain a separate count of holders and watchers. It's thread safe, but
> it'll use atomic ops if you don't want to use threads; and looks pretty
> efficient.

I'll look at boost when someone posts a patch to make it do something.
I'll try to keep an open mind until then :)


Thanks,
Ben


-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



More information about the Xorp-hackers mailing list