[Xorp-hackers] Crash due to stale cached xrl sender pointer.

Ben Greear greearb at candelatech.com
Mon Nov 9 11:27:53 PST 2009


On 11/09/2009 10:57 AM, Bruce Simpson wrote:
> Ben Greear wrote:
>>
>> With regard to larger issues of the IPC, I dislike the current callback
>> logic. I would like to get rid of all the auto-generated template code
>> and use real callback objects. Each xrl call will have a pointer to a
>> callback
>> object and will call that at appropriate times.
>>
>> If/when you get your Thrift stuff in, I might attempt to work on
>> the callback logic.
>
> Can you expand on what a 'real callback object' is?
> The callback() method in XORP is a template method which returns a real
> C++ object.

The template code is impossible to understand.  I'd have a base 'XorpCallBack'
class and have others inherit from that.  In implementing clients/servers,
I'd probably have a few methods like:  foo::handleCallback(CallBackObject& obj)
that would handle lots of different callbacks in one place.

Maybe with a real object of obvious type, we could actually trace
code flow.  As it sits, the opaque templated callbacks make it
impossible for me to really understand a backtrace, for instance.

Anyway, that's just a rant.

A more critical problem is things like task_done() in task.cc (rtr-mgr).

It always assumes that the thing that completed was the front of _tasklist,
but it is not always so (that is one cause of the crash I worked around
with the ref-ptr hack you dislike so much).  That task_done() should
receive a task object so that it knows exactly what to remove from the
_task_list.

In fact, if tasks can be completed asynchronously, then that assumption in
task_list is completely broken.

If things cannot be completed async, then there is no need for all
the callback indirection anyway, and we could greatly simplify logic
flow by removing the callback logic.

> FWIW, Boost.Function is actually implemented in a pretty similar way to
> XORP's callback library:
> http://www.boost.org/doc/libs/1_40_0/doc/html/function.html

That's not helping me feel good about Boost :P

>> I don't think we should worry about any async message support in the
>> transport
>> level. As long as we have useful callbacks, then only the
>> client/servers need
>> to worry about async. For instance, client says 'do-big-work()'.
>> Server immediately responds 'working' (RPC call is now done).
>> Later, server can send more messages to client as the big work completes.
>>
>> XRL doesn't need to know anything about this.
>
> We already do. Asynchrony is just a fact of life in libxipc.
>
> To dispel any confusion: I'm not referring to asynchrony at the level of
> a socket or a UNIX file descriptor. When discussing asynchrony in my
> posts, I am writing about mechanisms already present in the code.
>
> It's been clear from past history, and private exchanges with other
> developers, that scalability has been a concern, particularly with BGP.
> Scalability is something I consider on the worktop now, and to be
> considered, but not necessarily taking action on it right now.

Someone that uses BGP should make a test case, including whatever tools are needed to
fake out a large BGP thing, and we should figure out exactly
where the problem lies.  Without a way to test this, we will have no
idea if the Thrift/Boost thing (or anything else) is better or not.

> In the case of XRL, pack/unpack is probably not the bottleneck;
> allocations probably are.

It's very easy to run under oprofile and/or gprof (now that I posted
patches for gprof support).  You don't have to assume anything.

>> Most of the slowdowns I've seen are stupid things like sleeping for
>> a timeout instead of doing work, probably to work around some race
>> no one felt like debugging.
>
> Were these timeouts actually blocking execution of other pending tasks?
> If so, that's something which needs dealing with.

They were blocking the task I cared about (xorpsh 'commit').  Anyway,
I fixed this already.

I also fixed the other 2-second sleeps on process startup (both patches were
posted some time back).


Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



More information about the Xorp-hackers mailing list