[Xorp-hackers] Selector overhead; SelectorList bugs

Bruce Simpson bms at incunabulum.net
Mon Nov 30 18:51:35 PST 2009


Just thinking aloud here about the implications of the performance 
profiling experiments I've been doing.

There's room for improvement here. If someone stepped up to rewriting 
the Selector to use poll(), that would give us more flexibility with I/O 
ordering, but remain portable to the UNIX platforms we support.

This is an ideal task for a student, as it's something which can be 
adapted quickly from the existing code.

For the fact meat, read on.

Being practical about it  -- there's often no real benefit in switching 
muxer, it's usually a micro-optimization. But what studying the muxer's 
behaviour might shed light on is, though, to what extent Selector and 
SelectorList can slow things down on the hot path.

To derive a baseline for measuring the muxer behaviour, I'd suggest 
grabbing Poller_bench from dkftpbench, and running it under oprofile or 
pmcstat:
    http://www.kegel.com/dkftpbench/

As you noticed recently, the potential for race conditions does exist in 
libxipc. The fact that class Xrl, is caching XrlPFSender*, with no way 
to invalidate its reference cleanly, is just plain wrong. Thanks for 
finding this bug, and the condition with SelectorList.

Even if we fix the reference problem, we have no way to guarantee that 
the Finder I/O, or transport layer close notification, which would cause 
the XrlPFSender* to be invalidated in the Xrl, would be serviced before 
the client tries to do I/O to it.
The problem is that the UNIX select() mux just has no notion of relative 
priority between the I/O channels it muxes, other than the file 
descriptor number. [1]

This problem doesn't exist with poll() as implemented in BSD; the order 
of the 'struct pollfd' in the 'fds' argument to poll() is the order in 
which the kernel will service fds for pending I/O notification. Linux's 
implementation of poll() seems to do the same thing; Windows does it out 
of the box [2] although with some limitations.

So a relative priority for I/O *can* be implemented, in a UNIX portable 
way. This depends however on undocumented, implementation-defined 
behaviour of poll().

Another consideration is allocations and copies on the hot path: both 
select() and poll() assume what you pass to them is mutable. This means 
allocating and making a copy. Allocations can be elided, if you lazy 
allocate; call vector::reserve() as you take new fd's for monitoring. If 
the allocator isn't dumb, round this up to log2(n), if that's what 
malloc() likes. [3]

I was considering ASIO, in the not so distant future, as a portable C++ 
alternative to replace much of what has been invented in libxorp. 
Remember, Boost didn't exist when the project started. ASIO, however, 
has no notion of relative priority either. [4]

What was briefly interesting was that Niels Provos' libevent [5] does 
try to implement an I/O priority scheme. I stared at poll.c for a bit to 
read all the C. It looks to me as though its priority scheme only works 
in the sense that it will dispatch your callbacks in the order you give 
it, only if the events happen to come in in that order; there are no 
kernel API guarantees, even in how it wraps poll. So, it falls down for 
the same reason as ASIO does.

One of the things which ASIO's io_service class does is, to wrap the 
platform specifics of I/O multiplexing under its facade. Out of the box, 
it will use kqueue on the BSDs. Dan Kegel's C10K page [6] is pretty 
clear that kqueue pretty much kills everything else stone dead on 
performance in benchmarks. [7]

Jonathan Lemon points out in his original paper on kqueue [8], that you 
can chain kqueues in a hierarchy. This gives some scope for managing 
relative priority of I/O events according to classes. kqueue, as 
implemented, will scan for events in the same order in which you passed 
the kevent list to the kernel. But hey, kqueue's just for BSDs.

I couldn't find any priority scheme in epoll. It looks as though its 
internal RB-tree is keyed by fd, game over; no relative I/O priority.

cheers,
BMS

P.S. Given that I've had a lot of activity on trunk, it's likely that 
I'll re-branch the Thrift work, to pick up on a lot of what's gone in 
there. The first milestone for that branch is to get the Finder working, 
that can happen relatively quickly. The dirty work of changing XIF to 
use Thrift in its code generator, is mostly done.

[1] http://www.opengroup.org/onlinepubs/007908775/xsh/select.html
[2] Win32's WaitForMultipleObjects() is explicitly documented to favour 
first-handle-wins.
[3] Studying the interaction between libstdc++'s 'operator new()' and 
libc's 'malloc()' is worthwhile here.
[4] There are examples out there which add ASIO handlers to a priority 
queue; this is NOT THE SAME THING as prioritizing the I/O notification 
at its source.
[5] http://www.monkey.org/~provos/libevent/
[6] http://www.kegel.com/c10k.html
[7] http://www.monkey.org/~provos/libevent/libevent-benchmark2.jpg 
(outperforms Linux epoll)
[8] http://people.freebsd.org/~jlemon/papers/kqueue.pdf



More information about the Xorp-hackers mailing list