[Xorp-hackers] Runtime diagnostics and callbacks in backtraces

Fri Nov 20 07:35:00 PST 2009

Hi Ben,

To summarise: whilst DEBUG_CALLBACKS already takes appropriate steps to 
maintain the information we need in an efficient way, it unconditionally 
prints this information, which is not useful for production deployment.

I guess what we really need is a means of selectively printing and 
extracting this information, in situations where we may want to unwind 
the stack to discover where the callback was actually instantiated and 
invoked.

That said, it's a more general problem than XORP's, and I'd be very 
surprised if there weren't already projects out there which had dealt 
with this in a reasonable way.

I'd be interested to hear what others come up with.

I did start on a xorp_backtrace() patch this week, however, it is a 
total distraction from what I'm currently meant to be doing.

Ben Greear wrote:
>
> I'd like to know what triggered a chain of events leading up to
> an assert/crash/whatever.  It's probably a timer (which could
> store it's __LINE__ and __FILE__ (and/or other debug info)
> on creation, or it's the result of some request from outside (XRL).

This is a reasonable need.

Correlating what's happening in one process, with what's happening in 
another, would be non-trivial in the case where the processes are 
distributed.

I think the best we're going to get out of this is function names for 
now, though.

In the case of an XRL target, we really don't need to do any additional 
work in the target; the method names are named after the RPC calls 
involved already.

JT has strong feelings that the logging framework could do with being 
changed, to support uses like this.

This is really a job for something like libunwind. We're still going to 
depend on a valid runtime C/C++ heap for this. In an ideal world, we 
could take a snapshot of the call stack when callback() is instantiated. 
Then, if we encounter a problem during the callback dispatch, we can 
then unwind the stack, use DWARF debug information to identify if we hit 
the dispatch() method of a callback class, and look for the stack frame 
which instantiated the callback object itself.

Re the backtrace patch this week: The holdup there was that libexecinfo 
was producing differently formatted output from GLIBC's implementation 
of backtrace_symbols(), which makes it pretty useless in both cases 
unless I then parse the output of both equivalent routines. I emailed 
the maintainers/authors, but received no response as yet.

libunwind seems much more together, however, it doesn't compile nor work 
on the BSDs -- although it seems extremely portable between architectures.

[DEBUG_CALLBACKS]
>
> If you can't turn it on for production, then it's worthless for getting
> bug reports out of the field.  If we're only storing strings in objects,
> and only writing them out to logs when a problem actually occurs, it 
> shouldn't
> be _too_ much overhead.

We _definitely_ want to avoid any allocations on this path, this rules 
out placing std::string in the callback object. DEBUG_CALLBACKS already 
does this.

When DEBUG_CALLBACKS is defined, callback() becomes a macro, which is 
what we need in order to see the correct information at the point where 
the callback is instantiated.

I did a few quick experiments to identify what the cost of this is 
likely to be. I assumed that the runtime cost is likely to be 
sizeof(const char*) * 2 plus sizeof(int32_t) at a minimum.

So, GCC seems to generate string literals in the .rodata segment for 
__FILE__ and __func__. The __LINE__ seems to evaluate to an int32_t, 
*NOT* size_t.
Both __FILE__ and __LINE__ are handled entirely by the preprocessor; 
implementing __func__ needs help from the compiler.

__func__ is special, because it gets its own symbol name in .rodata, 
consisting of the C++ ABI mangled name with a __func__ suffix. When GCC 
sees these constructs, it substitutes the const char[] itself by value. 
This normally evaluates to a const char*, but I should point out it's by 
value -- sizeof(__func__) returns the length of the const char[].

__FILE__ and __LINE__ on their own are probably not going to lend 
themselves well to automated traceback, though.
However, C99 also adds __func__ support to the preprocessor. G++ was 
fine with this in -ansi, -pedantic, -std=c++98, -std=c++0x modes.

One of the quirks of C99 __func__ is that it only returns the name of 
the function or method, NOT the class containing it.

__PRETTY_FUNCTION__ is probably what we want, as it preserves the C++ 
ABI name. However, it demangles the name for us when producing the 
string literal in .rodata, and it's wholly GCC specific. The symbols are 
produced in a similar way to __func__, the suffix changes to 
__PRETTY_FUNCTION__.

That said, it's probably 'good enough' for tracing where a callback was 
sourced, without tedious manual cross-referencing with filenames and 
line numbers in source navigation tools, and without getting into the 
technical specifics of preserving the stack frame where the callback() 
was instantiated.

That's about all I've got time for on the subject of callbacks and 
backtraces, for the moment.

cheers,
BMS
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: foo.cc
Url: http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20091120/f7def5b7/attachment.ksh