[Xorp-hackers] XRL performance: UNIX domain sockets vs TCP sockets
Bruce Simpson
bms at incunabulum.net
Mon Nov 30 16:25:42 PST 2009
Hi Ben,
If you've been following my commits over the past few days, you'll have
noticed that most of what I've been checking in is aimed at making the
code easier to ship as a production package, e.g. for deployment in a
Linux distribution.
Some of the issues folk raised on xorp-users@ over the past few months
have now been dealt with, notably the naming collision between XORP's
libraries and libraries belonging to other packages.
When I first pulled the SCons change into SVN trunk, it was largely
derived from JT's work in corporate SVN to reduce the overall runtime
size using shared libraries.
We've done a lot of work on reducing the size of this system, which
happens to be implemented in C++. STL pretty much warrants using -O1 at
a minimum with gcc, to give its RTL tree optimizer a shot at eliding
unused STL methods. [1]
The directory layout we've been using in the public SVN branch has been
incidental to how XORP was traditionally run for testing purposes. The
layout the commercial product uses (on 'scons install') is closer to
what e.g. an RPM, DEB or other packaging system prefers.
Up until now, I've been trying to preserve the existing layout, out of a
desire not to violate the Principle of Least Astonishment (POLA) for folk
who may have been working with the code for some time.
I haven't seen much traffic from such folk, so now is the time to make
changes.
Ben Greear wrote:
>>
>> UNIX domain sockets can be used in XRL as it stands, by passing 'env
>> XORP_PF=x', without any patches.
>
> If these are faster, why not use them by default?
The XORP_PF environment variable just tells one wrapper class,
XrlStdRouter, to prefer the use of one transport over another. It
doesn't currently affect the Finder, or services which may already be
running.
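
To illustrate the kind of preference switch described above, here is a
minimal sketch in Python. It is not the XrlStdRouter code. Only the
value 'x' comes from this thread; the TCP fallback, the function name and
the returned labels are assumptions made for illustration.

```python
import os


def preferred_transport(environ=os.environ):
    """Pick a transport label from the XORP_PF environment variable.

    Only 'x' (UNIX domain sockets) is taken from the thread. Falling
    back to 'tcp' for anything else is an assumption of this sketch.
    """
    if environ.get("XORP_PF", "") == "x":
        return "unix"
    return "tcp"


if __name__ == "__main__":
    print(preferred_transport())
```

Note that a switch like this only changes what one process prefers; as
said above, it does nothing for the Finder or for services which are
already running.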
For 1.7, it might make sense to use UNIX domain stream sockets by
default, and put some EnumVariable() glue into SConstruct to set the
default at compile time. I've done this for 'optimize', 'debug', and
'profile', as you probably already saw. Let's call this one 'transport'.
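
A rough sketch of what that glue might look like follows. It is an
SConstruct fragment, so it is meant to be run by scons rather than by
plain Python. The value names 'unix' and 'tcp', the help text, and the
preprocessor symbol are all assumptions, not what is in the tree.

```python
# SConstruct fragment: Variables, EnumVariable, Environment, Help and
# ARGUMENTS are provided by SCons when it executes this file.
opts = Variables(None, ARGUMENTS)
opts.Add(EnumVariable(
    'transport',                     # option name proposed in this mail
    'Default XRL transport',         # help text (assumed wording)
    'unix',                          # assumed default value
    allowed_values=('unix', 'tcp'),  # assumed value names
))

env = Environment(variables=opts)
Help(opts.GenerateHelpText(env))

# Hypothetical symbol: the C++ side would have to test for
# XRL_TRANSPORT_UNIX or XRL_TRANSPORT_TCP to pick its default.
env.Append(CPPDEFINES=['XRL_TRANSPORT_' + env['transport'].upper()])
```

With something like this in place, 'scons transport=tcp' would select
the TCP default, and 'scons -h' would list the option alongside the
existing ones.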
The main reason the system doesn't default to this is that it removes
the ability to split the router components over a set of nodes, named by
IP address. This was envisaged from the outset as a necessary feature
for XORP as a network research tool.
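
The addressing difference can be shown with a small sketch, assuming a
POSIX system: a UNIX domain stream socket is named by a filesystem path
on the local node, whereas a TCP socket is named by an (IP address,
port) pair. The helper names below are made up for illustration and have
nothing to do with the XRL code itself.

```python
import os
import socket
import tempfile
import threading


def echo_once(server):
    """Accept one connection and echo back whatever arrives first."""
    conn, _ = server.accept()
    with conn:
        conn.sendall(conn.recv(1024))


def round_trip(family, address, payload=b"ping"):
    """Send payload over a stream socket of the given family and
    return the echoed reply."""
    server = socket.socket(family, socket.SOCK_STREAM)
    server.bind(address)
    server.listen(1)
    worker = threading.Thread(target=echo_once, args=(server,))
    worker.start()

    client = socket.socket(family, socket.SOCK_STREAM)
    # getsockname() gives the real address, e.g. the port the kernel
    # picked when we asked TCP for port 0.
    client.connect(server.getsockname())
    client.sendall(payload)
    reply = client.recv(1024)

    client.close()
    worker.join()
    server.close()
    if family == socket.AF_UNIX:
        os.unlink(address)  # remove the socket file we created
    return reply


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.sock")
    print(round_trip(socket.AF_UNIX, path))              # named by path
    print(round_trip(socket.AF_INET, ("127.0.0.1", 0)))  # named by IP and port
```

The path-named endpoint is only reachable from the same node, which is
exactly the trade-off under discussion.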
I'd argue that this isn't a requirement now, for these reasons:
* XORP is getting small enough to embed in a single embedded system
image now, out of the box.
In this use case, there is no need to distribute the router components.
* The research use case is very different from the production router
use case.
Production use is going to be limited to deploying XORP on one node.
If the router itself needs to be distributed, for research use, it's
reasonable for users to ask for this at compile time, and take the
cpu/ram hit which this entails.
The system, as currently implemented, is actually quite oriented towards
the distributed case, and we make a number of compromises on performance
to make this possible. It's certainly possible to make it faster, but it
would take more development work.
* Virtualization is now a commodity technology. Let's think of several
alternatives, and split them up into two categories:
  VMs which virtualize nodes by IP:
    FreeBSD jail -- virtualizes the userland only, shares network stack
      with other instances.
    User Mode Linux -- uses tun/tap, much like QEMU. Explicit addresses
      needed at each end of the tun/tap.
  VMs which don't virtualize nodes by IP:
    Xen -- paravirtualizes a kernel specifically built for it; every
      instance a pseudo-VM.
    VMware -- every instance is a virtual hardware machine
    VirtualBox -- ditto
    FreeBSD vimage -- builds on FreeBSD jail. Virtualizes the userland,
      *and* the kernel network stack, but runs in same kernel.
So in some cases, distributing router components by IP isn't even
necessary or desirable; it all depends on what the interconnect is.
The fact that knowledge of the endpoints is needed, to distribute the
router components, is in itself problematic. You end up building tools
to wrap the invocation of each component.
Currently, VINI has to do this in order to drive XORP in network
simulation. [2]
Distributing the system is probably better achieved using a mechanism
designed for that purpose, e.g. AMQP.
Only the transport library, libxorp_ipc [3], then needs to know about
how endpoint addresses, for system components, are actually allocated
and managed. Providing each component knows where to find the rendezvous
point (the Finder), the rest can be automated. How the interconnect is
implemented is more or less hidden, inside the transport library.
It's worth bearing in mind that AMQP, as a tool and methodology, wasn't
realized outside of minds in investment banking for many years, and only
now is it being promoted as a model for building distributed systems
without undue implementation pain.
>
> This might fix the security problems of having some xorpsh connect
> from an outside box too...
In a default production build, I'm all for that.
Actually, whilst it might insulate xorpsh and xorp_rtrmgr somewhat, it
still doesn't deal with the case of the Finder protocol, which is
hard-wired to using a human-readable ASCII text protocol over a known
TCP port, 19999 (this port can be overridden, and the address used by
the bind() call can be overridden also).
Recall that I had to fix a potential remote DoS in there, due to failure
to sanity check input from the network.
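
As a generic illustration of sanity checking input on a line-oriented
ASCII protocol, here is a minimal sketch. It is not the actual Finder
fix, whose details aren't restated in this mail. The function name, the
4096-byte limit and the newline framing are all assumptions.

```python
class LineTooLong(ValueError):
    """Raised when a peer sends too many bytes without a newline."""


def split_lines(buffer, max_line=4096):
    """Split complete newline-terminated lines out of a byte buffer.

    Returns (lines, rest), where rest is the trailing partial line.
    Raises LineTooLong if any line, complete or partial, exceeds
    max_line bytes, and UnicodeDecodeError if a line is not ASCII.
    """
    lines = []
    while True:
        end = buffer.find(b"\n")
        if end < 0:
            break
        if end > max_line:
            raise LineTooLong("line of %d bytes exceeds limit" % end)
        lines.append(buffer[:end].decode("ascii"))
        buffer = buffer[end + 1:]
    # Refuse to keep buffering an unterminated line indefinitely.
    if len(buffer) > max_line:
        raise LineTooLong("unterminated line exceeds limit")
    return lines, buffer
```

Bounding the unterminated tail is the important part: without it, a peer
that never sends a newline can make the reader grow its buffer without
limit.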
The argument there for using a textual protocol is that it is then
easier to debug XRL method calls. I'd argue that this isn't necessarily
the case.
If you take Thrift for example, there are TProtocol mix-ins that can be
used to get a human-readable trace of what any of the binary protocols
are doing. In some cases, e.g. TJSONProtocol, the output is human
readable anyway -- it's JavaScript Object Notation.
What's more of a problem, to my mind, is the greedy buffer use by
BufferedAsyncReader. It is mostly used by the Finder protocol. The
allocations used there do look like a 'hot path' in KCacheGrind.
cheers,
BMS
[1] Using -O as a minimum is accepted wisdom for other compilers, if
using STL. Of course, if you want to be sure that only minimal
optimization is performed on STL template instantiations, it's easy
enough to take the options out of the 'straw man' case I posted. They
change for each GCC release, though.
[2] http://www.vini-veritas.net/
[3] 'libxorp_ipc' is the new name for 'libxipc'. This is not Newspeak.