[Xorp-hackers] XRL performance: UNIX domain sockets vs TCP sockets

Mon Nov 30 16:25:42 PST 2009

Hi Ben,

If you've been following my commits over the past few days, you'll have 
noticed that most of what I've been checking in has been aimed at 
making the code easier to ship as a production package, e.g. for 
deployment in a Linux distribution.

Some of the issues folk raised on xorp-users@ over the past few months 
have now been dealt with, e.g. the naming collision between XORP's 
libraries and libraries belonging to other packages.

When I first pulled the SCons change into SVN trunk, it was largely 
derived from JT's work in corporate SVN to reduce the overall runtime 
size by using shared libraries.

We've done a lot of work on reducing the size of this system, which 
happens to be implemented in C++. STL pretty much warrants using at 
least -O1 with gcc, to give its tree and RTL optimizers a chance to 
elide unused STL methods. [1]

The directory layout we've been using in the public SVN branch is an 
artifact of how XORP was traditionally run for testing purposes. The 
layout the commercial product uses (on 'scons install') is closer to 
what a packaging system such as RPM or DEB prefers.

Until now, I've tried to preserve the old layout, so as not to violate 
the Principle of Least Astonishment (POLA) for folk who may have been 
working with the code for some time.

I haven't seen much traffic from such folk, so now is the time to make 
changes.

Ben Greear wrote:
>>
>> UNIX domain sockets can be used in XRL as it stands, by passing 'env
>> XORP_PF=x', without any patches.
>
> If these are faster, why not use them by default?

The XORP_PF environment variable just tells one wrapper class, 
XrlStdRouter, to prefer the use of one transport over another. It 
doesn't currently affect the Finder, or services which may already be 
running.
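A minimal Python sketch of what a per-process preference like this amounts to. Only the value 'x' (from 'env XORP_PF=x' above) comes from this thread; the function name, the labels and the TCP fallback are assumptions, and this is not XrlStdRouter's code. Because the variable is read by the process being started, a Finder or service that is already running is unaffected.

```python
import os

# Hypothetical labels for the two transports discussed in this thread.
UNIX_STREAM = "unix"
TCP = "tcp"


def preferred_transport(environ=None):
    """Return the transport this process should prefer.

    Assumption: XORP_PF=x selects UNIX domain stream sockets; any other
    value, or no value at all, falls back to TCP.
    """
    env = os.environ if environ is None else environ
    if env.get("XORP_PF") == "x":
        return UNIX_STREAM
    return TCP


if __name__ == "__main__":
    # Evaluated once, in this process only; peers keep their own choice.
    print(preferred_transport())
```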

For 1.7, it might make sense to use UNIX domain stream sockets by 
default, and put some EnumVariable() glue into SConstruct to set the 
default at compile time. I've done this for 'optimize', 'debug', and 
'profile', as you probably already saw. Let's call this one 'transport'.
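In the SConstruct itself this would be an EnumVariable('transport', ...) declaration next to 'optimize', 'debug' and 'profile'. So that it runs without SCons installed, the Python sketch below only mimics the validation such an option performs; the allowed values, the default and the XORP_DEFAULT_TRANSPORT define name are all hypothetical.

```python
# Hypothetical allowed values and default for a 'transport' build option.
ALLOWED_TRANSPORTS = ("unix", "tcp")
DEFAULT_TRANSPORT = "unix"  # assumption: the default proposed for 1.7


def transport_define(value=None):
    """Validate a 'transport=...' build setting and return a preprocessor
    define that bakes the compile-time default into the binaries.

    Values are compared case-insensitively and lower-cased, roughly what
    EnumVariable(..., ignorecase=2) does in SCons.
    """
    chosen = (value or DEFAULT_TRANSPORT).lower()
    if chosen not in ALLOWED_TRANSPORTS:
        raise ValueError("transport must be one of %s, not %r"
                         % (", ".join(ALLOWED_TRANSPORTS), value))
    return ("XORP_DEFAULT_TRANSPORT", '"%s"' % chosen)


if __name__ == "__main__":
    print(transport_define("TCP"))  # ('XORP_DEFAULT_TRANSPORT', '"tcp"')
```

A runtime override such as XORP_PF would still take precedence over whatever default is compiled in.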

The main reason the system doesn't use UNIX domain sockets by default 
is that doing so removes the ability to split the router components 
over a set of nodes addressed by IP: a UNIX domain socket is named by a 
filesystem path and can only be reached from the same host. Splitting 
components across nodes was envisaged from the outset as a necessary 
feature for XORP as a network research tool.

I'd argue that this isn't a requirement now, for these reasons:
 * XORP is getting small enough to fit in a single embedded system 
   image, out of the box.
    In this use case, there is no need to distribute the router 
    components.

 * The research use case is very different from the production router 
   use case.
    Production use is going to be limited to deploying XORP on one node.
    If the router itself needs to be distributed for research use, it's 
    reasonable for users to ask for this at compile time, and accept the 
    CPU/RAM cost this entails.
    The system as currently implemented is actually heavily oriented 
    towards distribution, and we make a number of performance compromises 
    to make it possible. It could certainly be made faster, but that 
    would take more development work.

 * Virtualization is now a commodity technology. Let's consider several 
   alternatives, split into two categories:
    VMs which virtualize nodes by IP:
     FreeBSD jail -- virtualizes the userland only; shares the network 
       stack with other instances.
     User Mode Linux -- uses tun/tap, much like QEMU. Explicit addresses 
       are needed at each end of the tun/tap.

    VMs which don't virtualize nodes by IP:
     Xen -- paravirtualizes a kernel specifically built for it; every 
       instance is a pseudo-VM.
     VMware -- every instance is a virtual hardware machine.
     VirtualBox -- ditto.
     FreeBSD vimage -- builds on FreeBSD jail. Virtualizes the userland 
       *and* the kernel network stack, but runs in the same kernel.

So in some cases, distributing router components by IP isn't even 
necessary or desirable; it all depends on what the interconnect is.
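The addressing difference behind the "by IP" distinction shows up in a small, self-contained Python experiment: the same echo exchange runs over an AF_UNIX stream socket, whose endpoint is a filesystem path reachable only on the local host, and over TCP on loopback, whose endpoint is an (address, port) pair another node could reach. It times the round trips for both but claims no particular result; numbers vary by kernel and load, and this is plain sockets, not XRL.

```python
import os
import socket
import tempfile
import threading
import time


def _echo(listener):
    """Accept one connection and echo everything back until EOF."""
    conn, _ = listener.accept()
    with conn:
        if conn.family == socket.AF_INET:
            conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)


def round_trips(family, address, count=1000, payload=b"x" * 64):
    """Ping-pong `payload` `count` times over a stream socket bound to
    `address`; return (all_echoes_matched, elapsed_seconds)."""
    listener = socket.socket(family, socket.SOCK_STREAM)
    listener.bind(address)
    listener.listen(1)
    # A filesystem path for AF_UNIX, a (host, port) pair for AF_INET.
    endpoint = listener.getsockname()
    server = threading.Thread(target=_echo, args=(listener,), daemon=True)
    server.start()

    client = socket.socket(family, socket.SOCK_STREAM)
    client.connect(endpoint)
    if family == socket.AF_INET:
        client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    ok = True
    start = time.perf_counter()
    for _ in range(count):
        client.sendall(payload)
        got = b""
        while len(got) < len(payload):
            chunk = client.recv(len(payload) - len(got))
            if not chunk:
                break
            got += chunk
        ok = ok and got == payload
    elapsed = time.perf_counter() - start
    client.close()
    server.join()
    listener.close()
    return ok, elapsed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print("unix:", round_trips(socket.AF_UNIX, os.path.join(tmp, "xrl.sock")))
    print("tcp: ", round_trips(socket.AF_INET, ("127.0.0.1", 0)))
```

TCP_NODELAY is set on the TCP side so that Nagle's algorithm doesn't skew the comparison for small request/response exchanges.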

Needing to know the endpoints in order to distribute the router 
components is itself problematic: you end up building tools to wrap 
the invocation of each component.

Currently, VINI has to do this in order to drive XORP in network 
simulation. [2]

Distributing the system is probably better achieved using a mechanism 
designed for that purpose, e.g. AMQP.

Only the transport library, libxorp_ipc [3], would then need to know 
how endpoint addresses for system components are allocated and 
managed. Provided each component knows where to find the rendezvous 
point (the Finder), the rest can be automated, and the implementation 
of the interconnect stays more or less hidden inside the transport 
library.
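A toy Python sketch of that division of labour, with hypothetical class names and endpoint formats (it is not the Finder protocol): components only export and look up target names, here 'fea' and 'rib', while the transport object alone decides whether an endpoint is a UNIX socket path or a TCP address, and publishes it through the rendezvous point.

```python
class ToyFinder:
    """Rendezvous point: maps target names to transport endpoints."""

    def __init__(self):
        self._targets = {}

    def register(self, target, endpoint):
        if target in self._targets:
            raise ValueError("target %r is already registered" % target)
        self._targets[target] = endpoint

    def resolve(self, target):
        return self._targets.get(target)


class ToyTransport:
    """Stands in for the transport library: the only code that decides
    how endpoints are allocated. Components see target names only."""

    def __init__(self, finder, local_only=True, base_dir="/tmp/xorp"):
        self._finder = finder
        self._local_only = local_only
        self._base_dir = base_dir    # hypothetical socket directory
        self._next_port = 20000      # hypothetical TCP port range

    def export(self, target):
        """Allocate an endpoint for `target` and publish it via the Finder."""
        if self._local_only:
            endpoint = ("unix", "%s/%s.sock" % (self._base_dir, target))
        else:
            # A real distributed setup would use a routable address here.
            self._next_port += 1
            endpoint = ("tcp", ("127.0.0.1", self._next_port))
        self._finder.register(target, endpoint)
        return endpoint

    def lookup(self, target):
        """Find a peer's endpoint; callers never construct addresses."""
        endpoint = self._finder.resolve(target)
        if endpoint is None:
            raise LookupError("no such target: %r" % target)
        return endpoint
```

Switching local_only is the only change needed to move a component from a host-local endpoint to a network-reachable one; the calling code is identical either way.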

It's worth bearing in mind that AMQP, as a tool and methodology, was 
confined to investment banking for many years; only now is it being 
promoted as a model for building distributed systems without undue 
implementation pain.

>
> This might fix the security problems of having some xorpsh connect
> from an outside box too...

In a default production build, I'm all for that.

That said, whilst it might insulate xorpsh and xorp_rtrmgr somewhat, 
it still doesn't deal with the Finder protocol, which is hard-wired to 
human-readable ASCII text over a well-known TCP port, 19999 (both the 
port and the address passed to bind() can be overridden).
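As a sketch of what a production default could look like, the Finder's listener might bind to loopback and validate any override before calling bind(). The default port 19999 comes from the text; the loopback default is a suggestion, and the environment variable names XORP_FINDER_SERVER_ADDRESS and XORP_FINDER_SERVER_PORT are assumptions here, to be checked against libxorp_ipc.

```python
import os
import socket

FINDER_DEFAULT_PORT = 19999           # from the text
FINDER_DEFAULT_ADDRESS = "127.0.0.1"  # assumption: loopback-only default


def finder_bind_address(environ=None):
    """Return the (address, port) the Finder should bind() to.

    Assumption: overrides come from XORP_FINDER_SERVER_ADDRESS and
    XORP_FINDER_SERVER_PORT. A bad port is rejected rather than guessed.
    """
    env = os.environ if environ is None else environ
    addr = env.get("XORP_FINDER_SERVER_ADDRESS", FINDER_DEFAULT_ADDRESS)
    port_text = env.get("XORP_FINDER_SERVER_PORT", str(FINDER_DEFAULT_PORT))
    if not port_text.isdigit() or not 0 < int(port_text) < 65536:
        raise ValueError("invalid Finder port: %r" % port_text)
    socket.inet_aton(addr)  # raises OSError for a malformed IPv4 address
    return addr, int(port_text)


def open_finder_listener(environ=None):
    """Create the listening TCP socket at the validated address."""
    addr, port = finder_bind_address(environ)
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((addr, port))
    sock.listen(16)
    return sock
```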

Recall that I had to fix a potential remote DoS in the Finder code, 
caused by a failure to sanity-check input from the network.
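The general shape of that kind of fix, sketched in Python for a hypothetical 'MSG <seqno> <length>' header line rather than the Finder's real grammar: bound the line length, insist on ASCII, check the field count and types, and cap any length field before it is used to size a buffer. The limits are illustrative.

```python
MAX_LINE = 1024        # hypothetical limits
MAX_BODY = 64 * 1024


class ProtocolError(Exception):
    """Raised for any header that fails validation."""


def parse_header(line):
    """Parse a hypothetical b'MSG <seqno> <length>\\n' header line,
    rejecting anything malformed or oversized."""
    if len(line) > MAX_LINE or not line.endswith(b"\n"):
        raise ProtocolError("header too long or unterminated")
    try:
        text = line[:-1].decode("ascii")
    except UnicodeDecodeError:
        raise ProtocolError("non-ASCII header") from None
    fields = text.split(" ")
    if len(fields) != 3 or fields[0] != "MSG":
        raise ProtocolError("bad header %r" % text)
    seq_text, len_text = fields[1], fields[2]
    if not (seq_text.isdigit() and len_text.isdigit()):
        raise ProtocolError("non-numeric field in %r" % text)
    length = int(len_text)
    if length > MAX_BODY:
        raise ProtocolError("body length %d exceeds %d" % (length, MAX_BODY))
    return int(seq_text), length
```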

The argument for the Finder using a textual protocol is that it makes 
XRL method calls easier to debug. I'd argue that this isn't 
necessarily the case.

Take Thrift, for example: there are TProtocol mix-ins that can produce 
a human-readable trace of what any of the binary protocols are doing. 
In some cases, e.g. TJSONProtocol, the encoding is human-readable 
anyway, since it's JavaScript Object Notation (JSON).
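To illustrate the point without depending on Thrift, here is a Python sketch with a hypothetical length-prefixed binary encoding of a method call, a decoder that turns the bytes back into a one-line readable trace, and a JSON encoding of the same call for comparison. The wire format, method name and argument names are made up.

```python
import json
import struct


def encode_binary(method, args):
    """Hypothetical framing: u16 length + UTF-8 name, u16 arg count, then
    a u16-length-prefixed key and value for each argument."""
    out = bytearray()

    def put_str(text):
        raw = text.encode("utf-8")
        out.extend(struct.pack("!H", len(raw)))
        out.extend(raw)

    put_str(method)
    out.extend(struct.pack("!H", len(args)))
    for key, value in args.items():
        put_str(key)
        put_str(value)
    return bytes(out)


def trace_binary(data):
    """Decode the binary form into a human-readable one-line trace."""
    pos = 0

    def get_u16():
        nonlocal pos
        (value,) = struct.unpack_from("!H", data, pos)
        pos += 2
        return value

    def get_str():
        nonlocal pos
        n = get_u16()
        if pos + n > len(data):
            raise ValueError("truncated string at offset %d" % pos)
        text = data[pos:pos + n].decode("utf-8")
        pos += n
        return text

    method = get_str()
    count = get_u16()
    args = [(get_str(), get_str()) for _ in range(count)]
    return "%s(%s)" % (method, ", ".join("%s=%s" % kv for kv in args))


def encode_json(method, args):
    """The same call as JSON: readable on the wire, no decoder needed."""
    return json.dumps({"method": method, "args": args}, sort_keys=True)
```

The binary form stays compact on the wire; readability is recovered by running the trace decoder only when debugging.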

A bigger problem, to my mind, is the greedy buffer use in 
BufferedAsyncReader, which is used mostly by the Finder protocol. Its 
allocations do show up as a 'hot path' in KCacheGrind.
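One less allocation-hungry pattern, sketched in Python under the assumption that messages are newline-terminated (this is not BufferedAsyncReader's interface): keep a single buffer per connection, cap how much a peer can make it hold, and consume complete messages from it in place rather than building a fresh buffer on each read.

```python
class BoundedLineReader:
    """Accumulates stream bytes in one reusable bytearray with a hard cap,
    handing out complete newline-terminated lines."""

    def __init__(self, limit=4096):
        self._buf = bytearray()
        self._limit = limit

    def feed(self, data):
        """Append newly received bytes; return any complete lines."""
        if len(self._buf) + len(data) > self._limit:
            raise OverflowError("peer exceeded %d buffered bytes" % self._limit)
        self._buf += data
        lines = []
        start = 0
        while True:
            nl = self._buf.find(b"\n", start)
            if nl < 0:
                break
            lines.append(bytes(self._buf[start:nl]))
            start = nl + 1
        # Drop consumed bytes in place; the same buffer object is reused.
        del self._buf[:start]
        return lines
```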

cheers,
BMS

[1] Using at least -O is accepted wisdom for other compilers too, when 
using STL. Of course, if you want to be sure that only minimal 
optimization is performed on STL template instantiations, it's easy 
enough to take the individual options out of the 'straw man' case I 
posted. They change with each GCC release, though.
[2] http://www.vini-veritas.net/
[3] 'libxorp_ipc' is the new name for 'libxipc'. This is not Newspeak.