[Xorp-hackers] Runtime diagnostics and callbacks in backtraces

Bruce Simpson bms at incunabulum.net
Tue Nov 17 08:20:03 PST 2009


Ben Greear wrote:
>
> The template code is impossible to understand.  I'd have a base 
> 'XorpCallBack'
> class and have others inherit from that.  In implementing 
> clients/servers,
> I'd probably have a few methods like:  
> foo::handleCallback(CallBackObject& obj)
> that would handle lots of different callbacks in one place.
>
> Maybe with a real object of obvious type, we could actually trace
> code flow.  As it sits, the opaque templated callbacks make it
> impossible for me to really understand a backtrace, for instance.

I've been doing some research on a related topic today, as it's 
something which affects all users (including corporate).

We've had a few folk on xorp-users@ who have installed XORP on
production systems, and run into problems. In this situation, GDB often
isn't available, and crash dumps are difficult to retrieve.

One of the things KDE, another large C++ system, does is to try to 
retrieve crash dumps when an application launched from the KDE Desktop 
crashes. To do this, however, it requires GDB. More information here:
http://techbase.kde.org/Development/Tutorials/Debugging/How_to_create_useful_crash_reports

 * Let's recap on callbacks: The XORP callback library, in libxorp, is a 
set of function templates which bind up C++ function pointers (including 
member function pointers) for deferred invocation. [The Boost.Function 
library operates in a similar way, although the representation of a 
callable object there is more decoupled from its implementation.]
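As a rough illustration of what "binding a member function pointer for
deferred invocation" means, here is a minimal sketch. The names
Callback1, MemberCallback1, make_callback and Counter are hypothetical
and are not the libxorp API; std::shared_ptr stands in for ref_ptr<T>.

```cpp
#include <memory>

// Hypothetical stand-in for a one-argument callback base class.
// NOT the libxorp API; it only illustrates the general shape.
template <class R, class A1>
struct Callback1 {
    virtual ~Callback1() {}
    virtual R dispatch(A1 a1) = 0;
};

// Binds an object pointer and a member function pointer for later use.
template <class R, class O, class A1>
struct MemberCallback1 : Callback1<R, A1> {
    typedef R (O::*Method)(A1);

    MemberCallback1(O* obj, Method method) : obj_(obj), method_(method) {}

    // Deferred invocation: call the bound member on the bound object.
    R dispatch(A1 a1) override { return (obj_->*method_)(a1); }

    O*     obj_;
    Method method_;
};

// Factory function template: the compiler deduces R, O and A1, so the
// caller never spells out the full template type.
template <class R, class O, class A1>
std::shared_ptr<Callback1<R, A1> > make_callback(O* obj, R (O::*method)(A1)) {
    return std::make_shared<MemberCallback1<R, O, A1> >(obj, method);
}

// Hypothetical client class used only to exercise the sketch.
struct Counter {
    int total = 0;
    int add(int n) { total += n; return total; }
};
```

Calling make_callback(&counter, &Counter::add) and later
cb->dispatch(5) runs Counter::add(5) on the bound object. The frame that
shows up in a backtrace is MemberCallback1<int, Counter, int>::dispatch,
which is the kind of opaque name Ben is complaining about.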

I believe part of the problem is GCC's 'deep typedef substitution' 
feature. After GNU C++ 3.x, the ABI changed considerably, and so did the 
runtime. Code generation also changed.

 * Let's recap that C++ templates have parameterized types: Most C++ 
toolchains, including GNU C++ since 3.x, will use the _canonical type 
name_ when producing debugging information.
 * Whilst this leads to complete and correct error messages for the 
compiler, they aren't particularly easy to read, so tools (e.g. stlfilt, 
gdb-stl-utils) abound to parse C++ compiler error output.

These aren't really useful to us here; our problem is that of producing 
meaningful system diagnostics, when the router is deployed in the field. 
One answer to this might be to add explicit backtrace support which we 
can potentially ship in a production build.

Unfortunately, the typedef substitution issue can't really be dealt with
here: producing a backtrace with arguments requires a lot more debugging
information, and all we're likely to get out of backtraces is the call
stack, not the contents of the stack frames.

So I've investigated GLIBC's backtrace(), libunwind, and libdwarf. [1]

backtrace() itself is part of GLIBC since 2.1, however, it doesn't 
demangle C++ symbols. There is sample code out there to wrap backtrace() 
and backtrace_symbols() with abi::__cxa_demangle() to produce meaningful 
C++ backtraces.
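A minimal sketch of that wrapping, assuming glibc's <execinfo.h> and the
GNU C++ runtime's <cxxabi.h>. The helper names demangle, demangle_frame
and capture_backtrace are made up for illustration, and the parsing
assumes the "module(symbol+offset) [address]" line format that glibc's
backtrace_symbols() normally produces.

```cpp
#include <execinfo.h>   // backtrace(), backtrace_symbols() (glibc)
#include <cxxabi.h>     // abi::__cxa_demangle() (GNU C++ runtime)
#include <cstdlib>
#include <string>
#include <vector>

// Demangle one symbol. Names that do not start with "_Z" are returned
// unchanged, since they are not Itanium-ABI mangled C++ names.
std::string demangle(const char* name) {
    if (name[0] != '_' || name[1] != 'Z')
        return name;
    int status = 0;
    char* out = abi::__cxa_demangle(name, nullptr, nullptr, &status);
    std::string result = (status == 0 && out != nullptr) ? out : name;
    std::free(out);  // __cxa_demangle allocates the result with malloc()
    return result;
}

// Rewrite one backtrace_symbols() line, assumed to look like
// "module(symbol+offset) [address]", with the symbol demangled.
std::string demangle_frame(const std::string& line) {
    std::string::size_type open = line.find('(');
    std::string::size_type plus = line.find('+', open);
    if (open == std::string::npos || plus == std::string::npos ||
        plus <= open + 1)
        return line;  // no symbol name present; leave the line alone
    std::string symbol = line.substr(open + 1, plus - open - 1);
    return line.substr(0, open + 1) + demangle(symbol.c_str()) +
           line.substr(plus);
}

// Capture the current call stack as demangled, printable lines.
std::vector<std::string> capture_backtrace(int max_frames = 64) {
    std::vector<void*> addrs(max_frames);
    int n = backtrace(addrs.data(), max_frames);
    std::vector<std::string> frames;
    char** symbols = backtrace_symbols(addrs.data(), n);
    if (symbols == nullptr)
        return frames;
    for (int i = 0; i < n; ++i)
        frames.push_back(demangle_frame(symbols[i]));
    std::free(symbols);  // one free() releases the whole array
    return frames;
}
```

Two caveats. Symbol names for functions in the executable itself
generally only appear if it is linked with -rdynamic. And
backtrace_symbols() calls malloc(), so this sketch is not safe to run
from inside a SIGSEGV handler; backtrace_symbols_fd() avoids the
allocation, but then the demangling has to happen elsewhere.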

GNU C++ will emit DWARF entries for typedef'd types at the point of 
their use, based on some quick experiments with the 'dwarfdump' utility 
on the 'call_xrl' binary.

Let's walk through:
 * The XrlRecvCallback type gets its own DW_TAG_typedef tag in the DWARF 
segments, and any functions which reference it, appear to do so as a 
DW_TAG_formal_parameter pointing to the typedef, NOT the canonical type.

 * However, the backtrace is going to contain a reference to the
canonical type, NOT its typedef, due to how template name mangling
works, which allows the linker to do its job with the code which is
generated as the result of instantiating a template.

 * If it were possible to introduce an alias to the mangled symbol for 
the template expansion which *contained* the typedef, that would give us 
a hint, but debugging tools are probably still going to have to take 
their best guess using a heuristic.

 * Because the templatized callback object's dispatch() member could be
called from conceivably anywhere, mapping the callback object's 'this'
pointer back to a typedef is probably still going to require manual
inspection, though.

This is a very callback-specific problem.

As you quite rightly pointed out originally, it's something which could
potentially be solved with a first-class object, i.e. instead of this:
    typedef XorpCallback2<const XrlCmdError, const XrlArgs&,
                          XrlArgs*>::RefPtr XrlRecvCallback;
try to do something like this:
    class XrlRecvCallback
        : public XorpCallback2<const XrlCmdError, const XrlArgs&,
                               XrlArgs*>::RefPtr {};

... this may work, because RefPtr is also a typedef alias for the 
ref_ptr<T> object, where T is XorpCallback2 in the above example. (in 
template meta-programming, typedef is assignment when working with types.)
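To make the difference concrete, here is a small sketch comparing the
runtime type name of a typedef alias with that of a derived class. The
ref_ptr and Callback2 templates here are simplified, hypothetical
stand-ins (not the libxorp definitions), and AliasCallback,
NamedCallback and type_name are names invented for the example.

```cpp
#include <cxxabi.h>     // abi::__cxa_demangle() (GNU C++ runtime)
#include <cstdlib>
#include <string>
#include <typeinfo>

// Simplified, hypothetical stand-ins for ref_ptr<T> and XorpCallback2.
template <class T>
struct ref_ptr {
    T* ptr = nullptr;
};

template <class R, class A1, class A2>
struct Callback2 {
    typedef ref_ptr<Callback2> RefPtr;  // member typedef, as in the post
};

// Approach 1: a typedef is only another spelling of the canonical type.
typedef Callback2<int, const char*, long>::RefPtr AliasCallback;

// Approach 2: a derived class is a distinct type with its own name.
struct NamedCallback : public Callback2<int, const char*, long>::RefPtr {};

// Return the demangled name of T as recorded in its type_info.
template <class T>
std::string type_name() {
    int status = 0;
    char* out = abi::__cxa_demangle(typeid(T).name(), nullptr, nullptr,
                                    &status);
    std::string result =
        (status == 0 && out != nullptr) ? out : typeid(T).name();
    std::free(out);
    return result;
}
```

type_name<AliasCallback>() yields the expanded ref_ptr<Callback2<...>>
spelling, with no trace of "AliasCallback", whereas
type_name<NamedCallback>() yields "NamedCallback". One catch with the
derived-class approach: constructors are not inherited in C++03, so the
derived class would presumably need its own converting constructors
before the existing callback factories could return it.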

This would, however, prevent the linker from doing any coalescing of
function fragments for the instantiation of template XorpCallback2
above, which would lead to classic template bloat. [2] Also, the syntax
is still really ugly.

I guess what we'd love GNU C++ to do is to let us provide some sort of 
hint for the type name of a callable object, to make things more 
human-readable.

This is a bit like what we want, but XrlRecvCallback is not just a 
specialization, but a typedef alias of a *member* of a specialization: 
    http://wiki.dwarfstd.org/index.php?title=C%2B%2B0x:_Template_Aliases

Of course, the name XrlRecvCallback is deceptive, because it's actually 
a ref_ptr<T> to the callback itself. As you know, I'm pretty opposed to 
obscuring the use of a refcounted object pointer, because of the pain 
it's caused me during development.

Fully specialized templates are legal C++, but I don't know if they are 
legal as template aliases in C++0x.

If we had something like this:

    template<> using XorpRecvCallback =
        XorpCallback2<const XrlCmdError, const XrlArgs&, XrlArgs*>;

...and then substituted this for the original use of
XorpRecvCallback:
    shared_ptr<XorpRecvCallback>

...that might work for me.
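For reference, a sketch of how this looks in the syntax that C++11
eventually standardized. An alias for a fully specified type is written
as a plain alias-declaration with no "template<>" prefix; alias
templates themselves cannot be specialized. Callback2, RecvCallback,
RecvCallbackPtr and LengthPlus are hypothetical names for the example,
and std::shared_ptr stands in for ref_ptr<T>.

```cpp
#include <cstring>
#include <memory>
#include <type_traits>

// Simplified, hypothetical stand-in for XorpCallback2.
template <class R, class A1, class A2>
struct Callback2 {
    virtual ~Callback2() {}
    virtual R dispatch(A1 a1, A2 a2) = 0;
};

// C++11 alias-declaration naming one fully specified instantiation.
using RecvCallback = Callback2<int, const char*, long>;

// The reference-counted pointer stays visible at the point of use.
using RecvCallbackPtr = std::shared_ptr<RecvCallback>;

// An alias introduces no new type: it is still the canonical type, so
// mangled names and backtraces are unchanged by it.
static_assert(
    std::is_same<RecvCallback, Callback2<int, const char*, long> >::value,
    "an alias names the same type as the instantiation it refers to");

// A concrete callback, used only to exercise the alias.
struct LengthPlus : RecvCallback {
    int dispatch(const char* s, long n) override {
        return static_cast<int>(std::strlen(s)) + static_cast<int>(n);
    }
};
```

This addresses the readability of the source and keeps the smart
pointer explicit, but as the static_assert shows, it does not by itself
change what appears in a backtrace.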

cheers,
BMS

[1] P.S. I reckon a backtrace dumper for production builds could be 
knocked up really quickly.
[2] In practice this might not be as big an issue as one might think, 
because the linker may end up having to put multiple weak symbols for 
template instantiations into each dynamic object, where we're using 
shared libraries. See the recent auto_ptr<T> patch I posted for the XRL 
client stubs, which has a similar problem.


