[Xorp-hackers] RFC: Use one socket per interface for receiving packets in the FEA.

Ben Greear greearb at candelatech.com
Thu Mar 6 09:34:10 PST 2008


Pavlin Radoslavov wrote:
> Ben,
>
> Now that you have (I presume) a working solution, can you get some
> numbers on the performance increase you get with one socket per
> interface?
> I agree that once you have a large number of interfaces and a large
> number of virtual XORP instances, the number of unnecessary packet
> deliveries increases as O(V*I), but I would still like to see what
> the actual CPU savings are.
>   
It's hard to quantify directly, in part because the hash lookup also
improves the fea's packet reception logic (it looks up the vif by
interface index for every received packet).
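
To make the idea concrete, here is roughly the shape of the hash-based
lookup (illustrative only; Vif and VifMap are stand-in names, not the
actual XORP classes, and std::unordered_map is just for brevity):

#include <cstdint>
#include <unordered_map>

struct Vif {
    uint32_t pif_index;   // kernel interface index
    // ... other vif state ...
};

class VifMap {
public:
    void add(Vif* vif) { _by_pif_index[vif->pif_index] = vif; }
    void remove(uint32_t pif_index) { _by_pif_index.erase(pif_index); }

    // Called for every received packet, instead of walking a list of vifs.
    Vif* find_by_pif_index(uint32_t pif_index) const {
        auto it = _by_pif_index.find(pif_index);
        return (it != _by_pif_index.end()) ? it->second : nullptr;
    }

private:
    std::unordered_map<uint32_t, Vif*> _by_pif_index;
};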

I have been running a 20-node (one xorp per node) scenario, and more
recently a 30-node one.
Before the per-interface socket and hashing fixes, the system load at
'idle' was around 4.00 on my quad-core system with the 20-node scenario.

After the hashing and per-interface socket fixes, the load is about 0.10
on the same system with the larger 30-node scenario.  Please note that
without the hashing optimization, the 30-node scenario will not even
start, because the fea takes too long.

I don't have numbers for hashing without the per-interface socket patch,
but if you are really interested, I'll disable the per-interface patch
and run some tests.
> An even more interesting question would be to test those numbers
> with and without the pif_index->vif mapping optimization.
> Given your profiling, which indicates that the pif_index search
> uses a lot of CPU, I wouldn't be surprised if, with the
> pif_index->vif mapping in place, the CPU savings from the one
> socket per interface solution are reduced.
>   
I am certain you are correct, given the vif lookup in the fea's packet
receive logic.
>
> Anyway, for the rest of the email I will assume that the savings are
> large enough to justify the extra modifications/complexity.
>
> It seems that your code will work only if the system supports
> SO_BINDTODEVICE (i.e., only Linux), which bothers me quite a bit.
> The alternative (OS-independent) solution would be to open a socket
> per IP address per interface. The argument for doing something like
> this is that typically the number of interfaces (both physical and
> virtual, like tunnels) and the number of IP addresses are of the
> same order of magnitude (though I'd be interested to hear real-world
> examples where this is not the case).
>   
To be honest, I don't know that much about how raw IP sockets work.  I
do know that SO_BINDTODEVICE works on Linux, but I am not certain
whether binding to IP addresses also works.  I am also not sure how to
detect IP address changes in the fea so that I can properly re-bind
when an address changes.

Most of the logic should be the same whether we use SO_BINDTODEVICE or
a local IP binding.  We could change the code to #ifdef on an internal
USE_PER_IF_SOCKETS define instead of on SO_BINDTODEVICE, and then set
that define locally for testing (see the sketch below).  A Windows user
could try binding to IP addresses and see whether that works, but I
don't have a testbed with anything other than Linux systems in it.
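
Roughly what I have in mind (untested sketch; USE_PER_IF_SOCKETS and
open_proto_socket() are names I made up for illustration, not what is
in the tree):

#include <cstring>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

#define USE_PER_IF_SOCKETS 1

// Open one raw socket for 'proto', scoped to a single interface either
// by SO_BINDTODEVICE (Linux) or by binding to one of the interface's
// local addresses.
static int open_proto_socket(int proto, const char* ifname,
                             const struct in_addr& if_addr)
{
    int fd = socket(AF_INET, SOCK_RAW, proto);
    if (fd < 0)
        return -1;

#if USE_PER_IF_SOCKETS
#ifdef SO_BINDTODEVICE
    // Linux-only: only deliver packets received on this interface.
    if (setsockopt(fd, SOL_SOCKET, SO_BINDTODEVICE,
                   ifname, strlen(ifname) + 1) < 0) {
        close(fd);
        return -1;
    }
#else
    // Possible portable fallback: bind to one of the interface's local
    // addresses.  Whether this filters received packets the way we need
    // is exactly the open question above, and it would require re-binding
    // whenever the address changes.
    struct sockaddr_in sin;
    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr = if_addr;
    if (bind(fd, (struct sockaddr*)&sin, sizeof(sin)) < 0) {
        close(fd);
        return -1;
    }
#endif
#endif
    return fd;
}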
> Another issue I see is with handling the special multicast routing
> socket (it must have protocol type IGMP) and the handling of the
> regular IGMP socket for IGMP control traffic.
> On systems like Linux, if you open two IGMP sockets and use one of
> them as the special multicast routing socket and the other one for
> regular IGMP control traffic, certain IGMP messages won't arrive on
> the regular IGMP socket. This is the reason the MFEA has extra
> logic to handle this situation, so that a single IGMP socket is used
> for both purposes.
> However, if we have multiple IGMP sockets (one of them for multicast
> routing purposes and the rest of them using SO_BINDTODEVICE to bind
> to a specific interface), then I don't know whether we will still
> have problems with the delivery of IGMP control traffic.
> This is something that requires careful testing to find the answer.
>   
OK, I don't know much about this area either; I haven't done any
multicast routing testing, just OSPF.
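
Just so I follow what you are describing, my (possibly wrong)
understanding of the two sockets is roughly this -- an untested sketch
assuming the standard Linux <linux/mroute.h> API:

#include <netinet/in.h>
#include <sys/socket.h>
#include <linux/mroute.h>   // MRT_INIT

int setup_igmp_sockets()
{
    // The "special" multicast routing socket: a raw IGMP socket with
    // MRT_INIT set, which the kernel associates with multicast forwarding.
    int mrt_fd = socket(AF_INET, SOCK_RAW, IPPROTO_IGMP);
    int on = 1;
    if (mrt_fd < 0 ||
        setsockopt(mrt_fd, IPPROTO_IP, MRT_INIT, &on, sizeof(on)) < 0)
        return -1;

    // A second raw IGMP socket, intended for regular IGMP control traffic.
    // Per your note, on Linux some IGMP messages will not be delivered here
    // while the MRT socket above exists, which is why the MFEA currently
    // uses a single socket for both purposes.  Whether per-interface
    // (SO_BINDTODEVICE) IGMP sockets change that needs testing.
    int igmp_fd = socket(AF_INET, SOCK_RAW, IPPROTO_IGMP);
    (void)igmp_fd;

    return mrt_fd;
}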
> It seems that some of your changes might step on some of Bruce's
> OLSR-related changes, so from this aspect it also requires careful
> coordination.
>
> That said, I think it would be premature to just take your patch and
> commit it now, because it would create more problems than it solves.
>
> However, I don't want those changes to be lost in email.
> Hence, could you create a Bugzilla entry and add your patch to it so
> it can be easily located later?
> Also, please add two versions of your patch: one against the current
> tree (i.e., the patch as you sent it to the list), and another that
> contains only the socket-related delta, because your patch includes
> changes for some earlier, unrelated issues.
>   

OK, that sounds fair enough.  I'll send these patches in a day or two,
once I have finished my testing and have teased out a patch for just
the per-interface socket binding.

Thanks for the review.

Ben

> Thanks,
> Pavlin
>   


-- 
Ben Greear <greearb at candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com



