[Xorp-hackers] Selector overhead; SelectorList bugs
Bruce Simpson
bms at incunabulum.net
Mon Nov 30 18:51:35 PST 2009
Just thinking aloud here about the implications of the performance
profiling experiments I've been doing.
There's room for improvement here. If someone stepped up to rewriting
the Selector to use poll(), that would give us more flexibility with I/O
ordering, but remain portable to the UNIX platforms we support.
This is an ideal task for a student, as it's something which can be
adapted quickly from the existing code.
For the fact meat, read on.
Being practical about it -- there's often no real benefit in switching
muxer, it's usually a micro-optimization. But what studying the muxer's
behaviour might shed light on is, though, to what extent Selector and
SelectorList can slow things down on the hot path.
To derive a baseline for measuring the muxer behaviour, I'd suggest
grabbing Poller_bench from dkftpbench, and running it under oprofile or
pmcstat:
http://www.kegel.com/dkftpbench/
As you noticed recently, the potential for race conditions does exist in
libxipc. The fact that class Xrl, is caching XrlPFSender*, with no way
to invalidate its reference cleanly, is just plain wrong. Thanks for
finding this bug, and the condition with SelectorList.
Even if we fix the reference problem, we have no way to guarantee that
the Finder I/O, or transport layer close notification, which would cause
the XrlPFSender* to be invalidated in the Xrl, would be serviced before
the client tries to do I/O to it.
The problem is that the UNIX select() mux just has no notion of relative
priority between the I/O channels it muxes, other than the file
descriptor number. [1]
This problem doesn't exist with poll() as implemented in BSD; the order
of the 'struct pollfd' in the 'fds' argument to poll() is the order in
which the kernel will service fds for pending I/O notification. Linux's
implementation of poll() seems to do the same thing; Windows does it out
of the box [2] although with some limitations.
So a relative priority for I/O *can* be implemented, in a UNIX portable
way. This depends however on undocumented, implementation-defined
behaviour of poll().
Another consideration is allocations and copies on the hot path: both
select() and poll() assume what you pass to them is mutable. This means
allocating and making a copy. Allocations can be elided, if you lazy
allocate; call vector::reserve() as you take new fd's for monitoring. If
the allocator isn't dumb, round this up to log2(n), if that's what
malloc() likes. [3]
I was considering ASIO, in the not so distant future, as a portable C++
alternative to replace much of what has been invented in libxorp.
Remember, Boost didn't exist when the project started. ASIO, however,
has no notion of relative priority either. [4]
What was briefly interesting was that Niels Provos' libevent [5] does
try to implement an I/O priority scheme. I stared at poll.c for a bit to
read all the C. It looks to me as though its priority scheme only works
in the sense that it will dispatch your callbacks in the order you give
it, only if the events happen to come in in that order; there are no
kernel API guarantees, even in how it wraps poll. So, it falls down for
the same reason as ASIO does.
One of the things which ASIO's io_service class does is, to wrap the
platform specifics of I/O multiplexing under its facade. Out of the box,
it will use kqueue on the BSDs. Dan Kegel's C10K page [6] is pretty
clear that kqueue pretty much kills everything else stone dead on
performance in benchmarks. [7]
Jonathan Lemon points out in his original paper on kqueue [8], that you
can chain kqueues in a hierarchy. This gives some scope for managing
relative priority of I/O events according to classes. kqueue, as
implemented, will scan for events in the same order in which you passed
the kevent list to the kernel. But hey, kqueue's just for BSDs.
I couldn't find any priority scheme in epoll. It looks as though its
internal RB-tree is keyed by fd, game over; no relative I/O priority.
cheers,
BMS
P.S. Given that I've had a lot of activity on trunk, it's likely that
I'll re-branch the Thrift work, to pick up on a lot of what's gone in
there. The first milestone for that branch is to get the Finder working,
that can happen relatively quickly. The dirty work of changing XIF to
use Thrift in its code generator, is mostly done.
[1] http://www.opengroup.org/onlinepubs/007908775/xsh/select.html
[2] Win32's WaitForMultipleObjects() is explicitly documented to favour
first-handle-wins.
[3] Studying the interaction between libstdc++'s 'operator new()' and
libc's 'malloc()' is worthwhile here.
[4] There are examples out there which add ASIO handlers to a priority
queue; this is NOT THE SAME THING as prioritizing the I/O notification
at its source.
[5] http://www.monkey.org/~provos/libevent/
[6] http://www.kegel.com/c10k.html
[7] http://www.monkey.org/~provos/libevent/libevent-benchmark2.jpg
(outperforms Linux epoll)
[8] http://people.freebsd.org/~jlemon/papers/kqueue.pdf
More information about the Xorp-hackers
mailing list