[Xorp-hackers] xrl calls from xorp_ospf to xorp_fea start failing when generating lots of calls.

Ben Greear greearb at candelatech.com
Tue Mar 13 10:35:33 PDT 2012


On 03/13/2012 07:17 AM, Markus Zehnder wrote:
> Hi there,
>
> In my setup I have a problem in ospf/ospf.cc Ospf<A>::transmit().
> Around line 333, the call to _io->send() returns 'false'.
>
>
> Setup: two linux boxes (virtual) with xorp 1.8.5, configured to run OSPF, both boxes in the same area 0.0.0.0. The two machines
> are connected with two links on different interfaces. The neighbor state of the interfaces is "FULL".
> OSPF is configured to distribute the static routes.
>
> Scenario to trigger above problem:
> - Modify the configfile of one box and add somewhat around 100 static routes.
> - Reload the configfile with "load <file>" in xorpsh.
> Result:
> - Only about 60 to 70 routes get distributed on the first try. The rest come later (with the retransmit, when it works; see #109).

Thanks for the bug and the patch.  I'm going to look over that now.

> Debugging:
> - ospf is trying to send LSA Updates, one packet for each route and interface. The packets are passed to xorp_fea as
>    mentioned at the top with the transmit() method. ..but that fails after 120..140 packets.
>
> Can anybody give me a hint on how that is intended to work? Is there some queue overflowing in xorp_fea? Should
> xorp_fea run with higher priority than xorp_ospf?

Probably whatever is sending needs to queue and retry sends
that fail (and preserve ordering, most likely..so after first failure,
start queueing or otherwise stop sending pkts, set a short timer, and try
again soon).  XRL is tricky code to deal with, so it's usually easier
to hack around its deficiencies than try to fix the core code to be
more flexible.  But, I'd welcome patches that make the core more
robust!
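
A minimal sketch of that queue-and-retry idea, in case it helps.  This is
not XORP code: RetrySendQueue, Packet and the SendFn callback are
hypothetical names, and the callback only stands in for whatever currently
calls _io->send().  The timer is left to the caller, since how it gets
armed depends on the event loop in use.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical packet type; not XORP's actual representation.
using Packet = std::vector<unsigned char>;

// Ordered send queue: once one send fails, every later packet waits
// behind it, so packets never overtake each other.
class RetrySendQueue {
public:
    // Stand-in for a call such as _io->send(); returns false when the
    // lower layer cannot accept the packet right now.
    using SendFn = std::function<bool(const Packet&)>;

    explicit RetrySendQueue(SendFn send) : send_(std::move(send)) {
        assert(send_);
    }

    // Send immediately if nothing is pending; otherwise (or if the send
    // fails) queue the packet. Returns true if the caller should arm a
    // short retry timer.
    bool transmit(Packet pkt) {
        if (pending_.empty() && send_(pkt))
            return false;
        pending_.push_back(std::move(pkt));
        return true;
    }

    // Call when the retry timer fires. Sends queued packets oldest
    // first and stops at the first failure. Returns true if packets
    // remain and the timer should be re-armed.
    bool flush() {
        while (!pending_.empty()) {
            if (!send_(pending_.front()))
                return true;
            pending_.pop_front();
        }
        return false;
    }

    std::size_t pending() const { return pending_.size(); }

private:
    SendFn send_;
    std::deque<Packet> pending_;
};
```

The queue here is unbounded, which is a simplification; real code would
probably want a cap and some policy for what to do when it is hit.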

> I guess I do not yet understand the XRL stuff well enough...
>
> Don't tell me that my scenario is not realistic. I try to reconfigure xorp with a reload of the config file because that
> should be faster than a complete restart. And I would like to know the limits: when does it work, and under which
> conditions will it fail?

If anyone ever tries to ignore a bug by complaining about unrealistic scenarios,
then yell at them loudly :)

But, that said..you may have to be the one to fix it...

Thanks,
Ben


-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


