[Xorp-hackers] xorp-ospf performance issues: busy-spins & packet floods.

Ben Greear greearb at candelatech.com
Fri Oct 19 22:19:37 PDT 2007

Pavlin Radoslavov wrote:
> Ben Greear <greearb at candelatech.com> wrote:
>> So, after I got fea to quit crashing, I was able to do some tests with
>> dynamic interfaces & OSPF.
>> With 15 routers, the system load goes to ~20 and ospf cannot seem to
>> get out of init.
>> I notice that ospf and fea are taking large amounts of CPU during
>> and after xorpsh activity.
>> I did an strace on xorp_ospf2 and it looks like it is sitting in
>> a loop doing selects but not actually reading/writing the descriptors
>> for much of the time.
>> If I remember correctly, this was the same problem that xorpsh showed
>> before Pavlin put the fix in to sleep for 10ms.  Maybe a similar
>> fix is needed for ospf2?
> The xorpsh fix is semi-hackish and xorpsh-specific so it shouldn't
> be applied to OSPF or FEA.
It seems the current eventloop logic is too easy to use incorrectly, and
when that happens it busy-spins.
>> Even then, it seems that we should not need to busy-spin even at 10ms.
>> We should be able to set a longer timeout and wait on select to tell
>> us when we have messages and/or can send data.
> The eventloop is used to handle different events (I/O, timers,
> tasks) and in the process it is calling select(2) probably more than
> necessary. E.g., it needs to find the event with the highest
> priority and the logic for doing that probably could be optimized to
> reduce some of the system calls.
> This is probably the reason why you see the FEA calling select(2)
> several times, but this is just speculation.
I've had good luck in other applications doing something like:

while (1) {
    // Clear the fd sets, set the timeout to a large value, maxdesc to 0.
    // Let each module recursively set the fds it's interested in
    // (and lower 'sleep_for' if it has a timer due sooner).
    MainObjectCollection::instance().setFds(&input_set, &output_set,
                                            maxdesc, sleep_for, now);
    // Call select() on the fds with the appropriate 'sleep_for' timeout.
    // When done, we have either hit a timeout (i.e., a timer fired)
    // or we have IO ready.  Pass in 'now' so that objects don't have to
    // be doing system calls to get the time so often.
    now = getCurTime();
    MainObjectCollection::instance().tick(&input_set, &output_set,
                                          &exc_set, now);
}
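
A minimal C++ sketch of one pass of such a loop, under stated assumptions:
the Module interface, FdReader, nowMs() and runOnce() are hypothetical names
made up for illustration and are not XORP's EventLoop API. The point it
shows is that select() is called once per pass, blocks until a descriptor
is ready or the shortest requested timeout expires, and is followed
immediately by the reads.

```cpp
#include <sys/select.h>
#include <unistd.h>
#include <algorithm>
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical module interface for illustration; not XORP's EventLoop API.
struct Module {
    virtual ~Module() {}
    // Register descriptors of interest; lower sleep_ms if a timer is due sooner.
    virtual void setFds(fd_set* in, fd_set* out, int& maxdesc,
                        long& sleep_ms, long now_ms) = 0;
    // Handle whatever select() reported as ready, plus any expired timers.
    virtual void tick(fd_set* in, fd_set* out, long now_ms) = 0;
};

// Monotonic time in milliseconds, fetched once per pass and handed to modules.
long nowMs() {
    using namespace std::chrono;
    return static_cast<long>(duration_cast<milliseconds>(
        steady_clock::now().time_since_epoch()).count());
}

// Example module: wants to read from one descriptor and has no timers.
struct FdReader : Module {
    int fd;
    long bytes_read;
    explicit FdReader(int f) : fd(f), bytes_read(0) {}
    void setFds(fd_set* in, fd_set*, int& maxdesc, long&, long) override {
        assert(fd >= 0 && fd < FD_SETSIZE);
        FD_SET(fd, in);
        maxdesc = std::max(maxdesc, fd);
    }
    void tick(fd_set* in, fd_set*, long) override {
        if (!FD_ISSET(fd, in)) return;   // only read when select() said so
        char buf[256];
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) bytes_read += n;
    }
};

// One pass of the loop. Returns select()'s result:
// >0 descriptors ready, 0 timeout, -1 error.
int runOnce(std::vector<Module*>& modules, long max_sleep_ms) {
    fd_set in, out;
    FD_ZERO(&in);
    FD_ZERO(&out);
    int maxdesc = -1;
    long sleep_ms = max_sleep_ms;        // "set timeout large"
    long now = nowMs();
    for (Module* m : modules) m->setFds(&in, &out, maxdesc, sleep_ms, now);

    if (sleep_ms < 0) sleep_ms = 0;
    timeval tv;
    tv.tv_sec = sleep_ms / 1000;
    tv.tv_usec = (sleep_ms % 1000) * 1000;
    int rv = select(maxdesc + 1, &in, &out, nullptr, &tv);
    if (rv < 0) {            // e.g. EINTR: the sets are unspecified, so clear them
        FD_ZERO(&in);
        FD_ZERO(&out);
    }
    now = nowMs();           // one clock read per pass, shared by all modules
    for (Module* m : modules) m->tick(&in, &out, now);
    return rv;
}
```

Calling runOnce(modules, 1000) in a loop with an idle descriptor would wake
about once per second rather than spinning; a module with a pending timer
would lower sleep_ms in its setFds() so that the wakeup lands on the timer.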

Even if you keep to your current architecture, I can't see any reason to
call select if you are not going to read the descriptors immediately
after the select.  If you just want to sleep and not read fds, just pass
null pointers to select() and it will sleep appropriately (within ~10ms
accuracy or so).
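
As a sketch of that last point, select() with no descriptor sets simply
waits out its timeout. The helper names sleepWithSelect() and steadyNowMs()
below are made up for illustration; only select() itself is the real call.

```cpp
#include <sys/select.h>
#include <cassert>
#include <chrono>

// Sleep for roughly `ms` milliseconds using select() with no descriptor sets.
// Returns select()'s result: 0 when the timeout expired, -1 on error
// (for example EINTR if a signal arrived first).
int sleepWithSelect(long ms) {
    assert(ms >= 0);
    timeval tv;
    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    // nfds = 0 and three null sets: nothing to watch, so only the timeout applies.
    return select(0, nullptr, nullptr, nullptr, &tv);
}

// Monotonic clock in milliseconds, used here only to measure the sleep.
long steadyNowMs() {
    using namespace std::chrono;
    return static_cast<long>(duration_cast<milliseconds>(
        steady_clock::now().time_since_epoch()).count());
}
```

The actual sleep length depends on the kernel's timer granularity, so it
should be treated as a lower bound with some slack rather than an exact
interval.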


Ben Greear <greearb at candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com
