[Xorp-hackers] Runtime execution profiling

Bruce Simpson bms at incunabulum.net
Sat Nov 28 23:47:58 PST 2009


Bruce Simpson wrote:
> KCacheGrind can't use gprof's output, but it *can* use the Google 
> Performance Tools (pprof) CPU profiler output.
>   

I've run a few experiments with test_xrl_sender, and I got far more 
useful data out of callgrind straight away.

The Google CPU profiler relies on two things: the SIGPROF handler, and 
RDTSC. I am running on a multi-core system, where the TSC is not 
guaranteed to be in sync across all cores, so I am likely to get skewed 
results.
To be fair, the Google profiler is probably more useful for measuring 
algorithm performance in compute-intensive (e.g. scientific) programs, 
and its approach fits that kind of workload better.

The other advantage of cachegrind is that there is no need to cook the 
output. Using pprof with KCacheGrind requires you to run pprof to 
extract symbols and convert its traces; cachegrind already does this 
for you.

With instruction fetch only, valgrind is roughly 8 times slower than 
native (AMD Phenom X3 8750). With L2/L3 cache simulation, it is roughly 
16 times slower, based on the throughput counts which 
test_xrl_receiver keeps.
Those counts may themselves be skewed, although SystemClock defaults to 
CLOCK_MONOTONIC, which on FreeBSD is an uncooked cycle timer counting 
from boot.
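
The slowdown figures above are simply the ratio of native throughput to
throughput under valgrind. A minimal sketch of that arithmetic follows;
the function name is hypothetical, and the rates in the usage note are
made-up numbers, not figures from test_xrl_receiver.

```c
#include <assert.h>

/* Slowdown factor = native throughput / instrumented throughput.
 * Both rates must be in the same unit (e.g. XRLs per second) and
 * measured over comparable intervals. */
double slowdown_factor(double native_rate, double instrumented_rate)
{
    assert(native_rate > 0.0 && instrumented_rate > 0.0);
    return native_rate / instrumented_rate;
}
```

For example, slowdown_factor(80000.0, 10000.0) gives 8.0 for invented
rates of 80000 and 10000 XRLs per second.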

As a small experiment, I replaced CLOCK_MONOTONIC with 
CLOCK_MONOTONIC_FAST, and measured a small but statistically 
significant increase in performance. The kernel context switch overhead 
to read the clock is still there.
phk's kernel timecounter code will, in some circumstances, spin 
briefly to give an accurate time reading if the generation number on 
the clock we're reading changes just as we're about to return to 
userland. Of course, this isn't where the real meat is; it's just a 
microbenchmark.

I am just getting to grips with FreeBSD's native hardware-based 
profiler, pmc, and I'll post more about that shortly -- I am getting 
some excellent samples out of it. It's similar to oprofile.

cheers,
BMS


