[Xorp-hackers] Using Netlink to lookup forwarding entries in Linux kernel
Pavlin Radoslavov
pavlin@icir.org
Tue, 21 Oct 2003 12:54:16 -0700
I was testing the FEA's adding/deleting of unicast
forwarding entries in the kernel, and I found the following problems
when using the Netlink mechanism with the Linux kernel:
* The kernel doesn't appear to support looking up a subnet
address from user space.
Example:
If we install a route entry for 10.30.0.0/16 in the
kernel, and we send a request to the kernel to look up subnet
10.30.0.0/16, we would expect the kernel to return a
netlink message that contains the previously installed
information. E.g., the returned info should contain at least the
subnet mask length of 16. Instead, the kernel returns the result
of looking up address 10.30.0.0/32, which could be different from
entry 10.30.0.0/16 (e.g., it could be based on the info of a more
specific entry such as 10.30.0.0/24).
BTW, the returned 10.30.0.0/32 entry is "cloned" (more on this
subject later).
Only if we fetch the whole forwarding table from the kernel
does the information for each entry match the information that
was installed.
After reading the source code for iproute2 (which contains the
"ip" utility and is presumably the reference example of how to
use the Netlink interface), I found that the way it supports a
command like "ip route list exact 10.30.0.0/24", which looks up
the exact network routing entry, is to:
1. Get the whole forwarding table
2. Go through the list of all entries, and select the one that
exactly matches the request (if such entry exists).
Obviously, the overhead of always fetching the whole forwarding
table and then filtering/selecting in user space may become
considerable if the forwarding table grows large.
However, I couldn't find any other solution to the problem (no
documentation, and reading the source code and experimenting with
the Netlink interface were fruitless), hence I had to use the same
mechanism inside the FEA.
On the upside, looking up a specific subnet address is currently
needed only for debugging purposes, hence I don't expect that we
should worry much about the overhead of reading the whole
forwarding table.
* When we look up a host address from user space, the returned
result is actually a "cloned" entry inside the kernel. E.g., if we
have installed 10.30.0.0/16 in the kernel, and we look up
destination address 10.30.0.10, the kernel will internally create
a cloned entry for 10.30.0.10/32, and will return that result.
However, if we delete entry 10.30.0.0/16, it looks like the
cloned 10.30.0.10/32 entry remains in the kernel for up to 2
seconds or so, and only then is automatically deleted.
Hence, if I try to perform the following test:
1. Install routing entry for 10.30.0.0/16
2. Test that the kernel returns a valid route for destination
10.30.0.10
3. Delete routing entry for 10.30.0.0/16
4. Test that the kernel does NOT have a valid route for
destination 10.30.0.10 (assuming no other matching entries were
installed previously in the kernel).
the test will fail at step 4, because the kernel will return the
obsolete cloned entry for 10.30.0.10/32.
Only if I wait at least 2-3 seconds between step 3 and step 4
does the test succeed.
For now I don't have a reasonable solution to the problem except
to explicitly modify the above test so that in step 4 we always
wait 3 seconds before sending the request to the FEA.
However, looking up a specific host address is currently used only
for debugging purposes, so I expect that the above behavior
will create problems only in our test scripts (solvable by the
"sleep 3" hack).
FWIW, in *BSD the routing sockets interface doesn't appear to have
the above problems.
Any comments or suggestions on whether we should handle those
problems differently?
Pavlin