[Xorp-hackers] Taking benefit of parallel processing in BGP

Wed Dec 16 03:46:31 PST 2009

Simon van der Linden wrote:
> I'm a computer engineering student from the UCLouvain (Belgium) and in 
> the context of my master's thesis (a full-time 4-month project), I look 
> at ways to improve the performance of routers' control planes by using 
> multicore architectures.
>   

Welcome!

XORP is basically designed around a select()-based reactor which invokes 
deferred procedure calls.
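
To make that concrete, here is a minimal sketch of the pattern, assuming 
POSIX select() and C++11. MiniReactor, add_reader, defer and run_once are 
names made up for this illustration; this is not the libxorp EventLoop 
API, only the general shape of such a loop.

```cpp
#include <sys/select.h>
#include <unistd.h>

#include <cassert>
#include <deque>
#include <functional>
#include <map>
#include <utility>
#include <vector>

// Hypothetical class for illustration; not taken from libxorp.
class MiniReactor {
public:
    using Callback = std::function<void()>;

    // Register a callback to run whenever fd becomes readable.
    void add_reader(int fd, Callback cb) {
        assert(fd >= 0 && fd < FD_SETSIZE);  // select() cannot watch more
        readers_[fd] = std::move(cb);
    }

    void remove_reader(int fd) { readers_.erase(fd); }

    // Queue a deferred call; it runs at the start of the next pass.
    void defer(Callback cb) { deferred_.push_back(std::move(cb)); }

    // One pass of the loop: run queued deferred calls, then wait up to
    // timeout_ms for I/O. Returns the number of callbacks invoked.
    int run_once(int timeout_ms) {
        int ran = 0;

        // Swap the queue out so calls deferred by a callback run next pass.
        std::deque<Callback> pending;
        pending.swap(deferred_);
        for (auto& cb : pending) { cb(); ++ran; }

        fd_set rfds;
        FD_ZERO(&rfds);
        int maxfd = -1;
        for (const auto& kv : readers_) {
            FD_SET(kv.first, &rfds);
            if (kv.first > maxfd) maxfd = kv.first;
        }

        timeval tv;
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;
        int n = select(maxfd + 1, &rfds, nullptr, nullptr, &tv);
        if (n <= 0) return ran;  // timeout, or an error such as EINTR

        // Collect ready fds first: a callback may add or remove readers.
        std::vector<int> ready;
        for (const auto& kv : readers_)
            if (FD_ISSET(kv.first, &rfds)) ready.push_back(kv.first);

        for (int fd : ready) {
            auto it = readers_.find(fd);
            if (it == readers_.end()) continue;  // removed by earlier callback
            Callback cb = it->second;            // copy: cb may remove itself
            cb();
            ++ran;
        }
        return ran;
    }

private:
    std::map<int, Callback> readers_;
    std::deque<Callback> deferred_;
};
```

Everything runs on one thread, so no callback ever needs a lock. That is 
the property you give up, or have to replace, once you add threads.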

It could conceivably benefit from native coroutine language support; 
please read on.

> ...
> At first, I thought I could use threads inside the BGP process, but 
> following a few readings and a discussion with Adam Greenhalgh from the 
> UCL, I gave up: the underlying libraries are not thread safe, and making 
> them thread-safe would be enough to complete a master's thesis :-)
>   

Actually, it isn't out of the question -- it just involves working from 
the ground up, and this probably means gaining a lot of low-level 
familiarity with libxipc and libxorp.

Crossing process boundaries is complex and expensive. Threads, however, 
come with the cost that more careful design consideration is needed.

Making libxorp thread-safe would not be a vastly difficult task. libxipc 
on the other hand would require careful thought; please see my posts on 
this subject over the last month or two here.
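
As one illustration of the kind of building block involved, here is a 
sketch of a mutex-guarded deferred-call queue, assuming C++11 <mutex>. 
The idea is that worker threads post callbacks and the single event loop 
thread drains them. LockedCallQueue, post and drain are hypothetical 
names; nothing here is taken from libxorp, and a real conversion would 
involve far more than this.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <utility>

// Hypothetical class for illustration; not part of libxorp.
class LockedCallQueue {
public:
    using Call = std::function<void()>;

    // Safe to call from any thread.
    void post(Call c) {
        std::lock_guard<std::mutex> lock(mu_);
        calls_.push_back(std::move(c));
    }

    // Meant for the event loop thread: run everything queued so far.
    // Returns the number of calls that were run.
    std::size_t drain() {
        std::deque<Call> batch;
        {
            std::lock_guard<std::mutex> lock(mu_);
            batch.swap(calls_);
        }
        // The lock is released here, so a call may post() more work
        // without deadlocking; that work runs on the next drain().
        for (auto& c : batch) c();
        return batch.size();
    }

private:
    std::mutex mu_;
    std::deque<Call> calls_;
};
```

The callbacks themselves still run on one thread, so existing 
single-threaded code behind them would not need locks of its own.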

This is the book to get:

http://www.amazon.com/Programming-POSIX-Threads-David-Butenhof/dp/0201633922

Another approach is to consider native C++ shared memory approaches. 
Boost.Interprocess has this capability; it supplies STL-like container 
implementations which can be shared across multiple address spaces.

One of the things I had slated for the current round of changes was a 
switch to Boost.ASIO, which is conceptually similar to stuff that's 
already in libxorp, and was designed to be thread-safe from the ground 
up.

I investigated this, and it would have meant a lot of changes to the 
current code base. You could use Boost.ASIO as a way of building a 
thread-safe XORP process.

Unfortunately, the changes to migrate to Thrift as the underlying RPC 
implementation for XRL are not yet complete. Thrift itself has been 
ported on top of ASIO, and can integrate with its event loop.

It is worth looking at uC++, and Apple's Grand Central Dispatch, as a 
source of ideas (although GCD really requires a compiler with Blocks 
support, e.g. LLVM, which isn't up to C++0x yet):-
    http://plg.uwaterloo.ca/~usystem/uC++.html
    http://en.wikipedia.org/wiki/Grand_Central_Dispatch
    http://llvm.org/

A native uC++ front-end for LLVM itself would be VERY interesting. The 
language itself is currently implemented as a cfront-like translator for 
GNU C++.  Thrift recapitulates Java JVM monitors in its C++ library 
implementation to a degree...

It's also worth looking at Chris Kohlhoff's blog for some insightful 
input about Boost.ASIO:-
    http://blog.think-async.com/

> So, the other approach is to use multiple BGP processes instead, and to 
> load-balance according to the prefixes, among which the decision process 
> is independent.
>
> I'll add a dispatcher process before the pool of BGP processes.

To be honest, I don't believe this approach is likely to scale well. 
Crossing process boundaries is complex and expensive.

Most of the contention is on the BGP-RIB-FEA path. To some degree, this 
can be mitigated by XRL batching, something which is in commercial XORP, 
but not community XORP [yet].
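
The general idea behind batching, sketched below under assumptions, is 
to collect several route updates and push them through one IPC call 
instead of one call per route, so the fixed per-call cost is amortised. 
RouteUpdate, UpdateBatcher and the Sender callback are hypothetical 
names; this does not show how XRL batching is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not the real XRL interfaces.
struct RouteUpdate {
    std::string prefix;   // e.g. "10.0.0.0/8"
    std::string nexthop;  // e.g. "192.0.2.1"
};

class UpdateBatcher {
public:
    // Stand-in for "one IPC call carrying many updates".
    using Sender = std::function<void(const std::vector<RouteUpdate>&)>;

    UpdateBatcher(std::size_t max_batch, Sender send)
        : max_batch_(max_batch), send_(std::move(send)) {
        assert(max_batch_ > 0);
    }

    // Queue one update; flush automatically once the batch is full.
    void add(RouteUpdate u) {
        pending_.push_back(std::move(u));
        if (pending_.size() >= max_batch_) flush();
    }

    // Send whatever is pending as a single call; no-op when empty.
    void flush() {
        if (pending_.empty()) return;
        std::vector<RouteUpdate> out;
        out.swap(pending_);
        send_(out);
    }

private:
    std::size_t max_batch_;
    Sender send_;
    std::vector<RouteUpdate> pending_;
};
```

With a batch size of 3, seven add() calls followed by one flush() make 
three sends rather than seven. A real implementation would also want a 
timer to flush partial batches so that updates are not held back 
indefinitely.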

XRL RPC is still relatively expensive, and we pay a heavy price in 
libxipc for the Finder protocol being different.

Using another layer of XRL, or even other IPC, to achieve parallelism 
within the existing architecture is likely to be prohibitively 
expensive, and would probably wipe out the benefits of scheduling 
across multiple cores.

Hope this helps,
BMS