[Xorp-hackers] Runtime diagnostics and callbacks in backtraces
Bruce Simpson
bms at incunabulum.net
Tue Nov 17 08:20:03 PST 2009
Ben Greear wrote:
>
> The template code is impossible to understand. I'd have a base
> 'XorpCallBack'
> class and have others inherit from that. In implementing
> clients/servers,
> I'd probably have a few methods like:
> foo::handleCallback(CallBackObject& obj)
> that would handle lots of different callbacks in one place.
>
> Maybe with a real object of obvious type, we could actually trace
> code flow. As it sits, the opaque templated callbacks make it
> impossible for me to really understand a backtrace, for instance.
I've been doing some research on a related topic today, as it's
something which affects all users (including corporate).
We've had a few folk on xorp-users@ who have installed XORP on
production systems, and run into problems. In this situation, GDB often
isn't accessible, or crash dumps are difficult to retrieve.
One of the things KDE, another large C++ system, does is to try to
retrieve crash dumps when an application launched from the KDE Desktop
crashes. To do this, however, it requires GDB. More information here:
http://techbase.kde.org/Development/Tutorials/Debugging/How_to_create_useful_crash_reports
* Let's recap on callbacks: The XORP callback library, in libxorp, is a
set of function templates which bind up C++ function pointers (including
member function pointers) for deferred invocation. [The Boost.Function
library operates in a similar way, although the representation of a
callable object there is more decoupled from its implementation.]
I believe part of the problem is GCC's 'deep typedef substitution'
feature. After GNU C++ 3.x, the ABI changed considerably, and so did the
runtime. Code generation also changed.
* Let's recap that C++ templates have parameterized types: Most C++
toolchains, including GNU C++ since 3.x, will use the _canonical type
name_ when producing debugging information.
* Whilst this leads to complete and correct error messages for the
compiler, they aren't particularly easy to read, so tools (e.g. stlfilt,
gdb-stl-utils) abound to parse C++ compiler error output.
These aren't really useful to us here; our problem is that of producing
meaningful system diagnostics, when the router is deployed in the field.
One answer to this might be to add explicit backtrace support which we
can potentially ship in a production build.
Unfortunately, the typedef substitution issue can't really be dealt with
here: producing a backtrace with arguments requires a lot more debugging
information, and all we're likely to get out of backtraces is the call
stack, but not the contents of the stack frames.
So I've investigated GLIBC's backtrace(), libunwind, and libdwarf. [1]
backtrace() itself is part of GLIBC since 2.1, however, it doesn't
demangle C++ symbols. There is sample code out there to wrap backtrace()
and backtrace_symbols() with abi::__cxa_demangle() to produce meaningful
C++ backtraces.
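As a rough illustration of that wrapping approach, here is a minimal sketch, assuming glibc and the GNU C++ runtime. The helper names (demangle_name, demangle_frame, capture_backtrace) are made up for this example and are not part of XORP, and the "module(symbol+0xoffset) [address]" frame layout is what glibc prints on Linux; other platforms format frames differently.

```cpp
#include <execinfo.h>  // backtrace(), backtrace_symbols(): glibc-specific
#include <cxxabi.h>    // abi::__cxa_demangle(): GNU C++ runtime
#include <cstdlib>
#include <string>
#include <vector>

// Demangle a single symbol name; return it unchanged if demangling fails.
std::string demangle_name(const std::string& mangled)
{
    int status = 0;
    char* out = abi::__cxa_demangle(mangled.c_str(), NULL, NULL, &status);
    if (status != 0 || out == NULL)
        return mangled;
    std::string result(out);
    std::free(out);  // __cxa_demangle() allocates its result with malloc()
    return result;
}

// glibc prints frames as "module(symbol+0xoffset) [address]".
// Replace the mangled symbol between '(' and '+' with its demangled form.
std::string demangle_frame(const std::string& line)
{
    const std::string::size_type open = line.find('(');
    if (open == std::string::npos)
        return line;
    const std::string::size_type plus = line.find('+', open);
    if (plus == std::string::npos || plus == open + 1)
        return line;  // no symbol name recorded for this frame
    const std::string mangled = line.substr(open + 1, plus - open - 1);
    return line.substr(0, open + 1) + demangle_name(mangled) + line.substr(plus);
}

// Capture the current call stack as demangled, human-readable lines.
std::vector<std::string> capture_backtrace(int max_frames = 64)
{
    std::vector<std::string> frames;
    if (max_frames <= 0)
        return frames;
    std::vector<void*> addrs(max_frames);
    const int n = backtrace(&addrs[0], max_frames);
    char** symbols = backtrace_symbols(&addrs[0], n);
    if (symbols == NULL)
        return frames;
    for (int i = 0; i < n; ++i)
        frames.push_back(demangle_frame(symbols[i]));
    std::free(symbols);  // a single free() releases the whole array
    return frames;
}
```

Two caveats with this sketch: symbol names for functions in the executable itself usually only show up if it was linked with -rdynamic, and backtrace_symbols() calls malloc(), so inside a crash signal handler backtrace_symbols_fd() is the safer choice (at the cost of demangling later, offline).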
GNU C++ will emit DWARF entries for typedef'd types at the point of
their use, based on some quick experiments with the 'dwarfdump' utility
on the 'call_xrl' binary.
Let's walk through:
* The XrlRecvCallback type gets its own DW_TAG_typedef tag in the DWARF
segments, and any functions which reference it, appear to do so as a
DW_TAG_formal_parameter pointing to the typedef, NOT the canonical type.
* However, the backtrace is going to contain a reference to the
canonical type, NOT its typedef, due to how template name mangling
works, which allows the linker to do its job with the code which is
generated as the result of instantiating a template.
* If it were possible to introduce an alias to the mangled symbol for
the template expansion which *contained* the typedef, that would give us
a hint, but debugging tools are probably still going to have to take
their best guess using a heuristic.
* Because the templatized callback object's dispatch() member could be
called from conceivably anywhere, mapping the callback object's 'this'
pointer back to a typedef is probably still going to require manual
inspection, though.
This is a very callback-specific problem.
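The points above can be checked with a small sketch, assuming GNU C++. Callback2 and RecvCallback here are hypothetical stand-ins, not the libxorp types: the typedef and the canonical type share one std::type_info, so the mangled name (and anything derived from it, such as a backtrace line) only ever carries the template spelling.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>
#include <typeinfo>

// Hypothetical stand-in for a callback template; NOT the libxorp one.
template <class R, class A1, class A2>
struct Callback2 {};

// The typedef is only an alias: it does not introduce a new type.
typedef Callback2<int, const char*, long> RecvCallback;

// Demangle the implementation-defined name stored in a type_info.
std::string pretty_type_name(const std::type_info& ti)
{
    int status = 0;
    char* out = abi::__cxa_demangle(ti.name(), NULL, NULL, &status);
    if (status != 0 || out == NULL)
        return ti.name();
    std::string result(out);
    std::free(out);
    return result;
}
```

Calling pretty_type_name(typeid(RecvCallback)) yields the Callback2<...> spelling; the name RecvCallback does not survive into the symbol at all.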
As you quite rightly pointed out originally, it's something which could
potentially be solved with a first-class object, i.e. instead of this:
typedef XorpCallback2<const XrlCmdError, const XrlArgs&,
XrlArgs*>::RefPtr XrlRecvCallback;
try to do something like this:
class XrlRecvCallback : public XorpCallback2<const XrlCmdError,
const XrlArgs&, XrlArgs*>::RefPtr {};
... this may work, because RefPtr is also a typedef alias for the
ref_ptr<T> object, where T is XorpCallback2 in the above example. (in
template meta-programming, typedef is assignment when working with types.)
This would, however, prevent the linker from doing any coalescing of
function fragments for the instantiation of template XorpCallback2
above, which would lead to classic template bloat. [2] Also, the syntax
is still really ugly.
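To show the effect of deriving a named class, here is a minimal sketch, assuming GNU C++. RefPtr, Callback2 and NamedRecvCallback are simplified, hypothetical stand-ins rather than the libxorp classes. Unlike a typedef, the derived class is a distinct type with its own mangled name.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>
#include <typeinfo>

// Hypothetical stand-ins; NOT the libxorp ref_ptr<T> or XorpCallback2.
template <class R, class A1, class A2>
struct Callback2 {};

template <class T>
struct RefPtr {
    T* ptr;
    RefPtr() : ptr(0) {}
    explicit RefPtr(T* p) : ptr(p) {}
};

typedef RefPtr<Callback2<int, const char*, long> > RecvCallbackBase;

// A real class: gets its own name in symbols and debugging information.
// Constructors are not inherited in C++03, so each has to be forwarded.
struct NamedRecvCallback : public RecvCallbackBase {
    NamedRecvCallback() {}
    explicit NamedRecvCallback(Callback2<int, const char*, long>* p)
        : RecvCallbackBase(p) {}
};

// Demangle the implementation-defined name stored in a type_info.
std::string pretty_type_name(const std::type_info& ti)
{
    int status = 0;
    char* out = abi::__cxa_demangle(ti.name(), NULL, NULL, &status);
    if (status != 0 || out == NULL)
        return ti.name();
    std::string result(out);
    std::free(out);
    return result;
}
```

The readable name only appears in frames for functions that mention NamedRecvCallback in their own signature; member functions inherited from the template base are still emitted under the template's canonical name.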
I guess what we'd love GNU C++ to do is to let us provide some sort of
hint for the type name of a callable object, to make things more
human-readable.
This is a bit like what we want, but XrlRecvCallback is not just a
specialization, but a typedef alias of a *member* of a specialization:
http://wiki.dwarfstd.org/index.php?title=C%2B%2B0x:_Template_Aliases
Of course, the name XrlRecvCallback is deceptive, because it's actually
a ref_ptr<T> to the callback itself. As you know, I'm pretty opposed to
obscuring the use of a refcounted object pointer, because of the pain
it's caused me during development.
Fully specialized templates are legal C++, but I don't know if they are
legal as template aliases in C++0x.
If we had something like this:
template<> using XorpRecvCallback = XorpCallback2<const XrlCmdError, const XrlArgs&, XrlArgs*>;
...and then substituted this for the original use of
XorpRecvCallback:
shared_ptr<XorpRecvCallback>
...that might work for me.
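For reference, a minimal sketch of how this looks in C++11 as eventually standardized, using hypothetical stand-in names (Callback2, RecvCallback, make_recv_callback) rather than the libxorp types. An alias for a fully specialized template is written as a plain alias-declaration without the template<> prefix, and it behaves exactly like a typedef: it names the same type, so it would not by itself change what appears in mangled symbols.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical stand-in for a callback template; NOT the libxorp one.
template <class R, class A1, class A2>
struct Callback2 {};

// C++11 alias-declaration: equivalent to a typedef of the specialization.
using RecvCallback = Callback2<int, const char*, long>;

// The reference-counted pointer is spelled out rather than hidden.
using RecvCallbackPtr = std::shared_ptr<RecvCallback>;

// The alias is transparent: both spellings denote the same type.
static_assert(std::is_same<RecvCallback,
                           Callback2<int, const char*, long>>::value,
              "alias and canonical type are identical");
static_assert(std::is_same<RecvCallbackPtr,
                           std::shared_ptr<Callback2<int, const char*, long>>>::value,
              "shared_ptr of the alias is shared_ptr of the canonical type");

// Hypothetical factory, only here to show the alias in use.
RecvCallbackPtr make_recv_callback()
{
    return std::make_shared<RecvCallback>();
}
```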
cheers,
BMS
[1] P.S. I reckon a backtrace dumper for production builds could be
knocked up really quickly.
[2] In practice this might not be as big an issue as one might think,
because the linker may end up having to put multiple weak symbols for
template instantiations into each dynamic object, where we're using
shared libraries. See the recent auto_ptr<T> patch I posted for the XRL
client stubs, which has a similar problem.