[Xorp-users] Problems with Linux kernel and OSPF ???

Aidan Walton awalton at wires3.net
Tue Dec 4 14:54:36 PST 2007


Hi,
The adjacency runs over a wireless link between the routers. It can,
very possibly, drop in and out, but as far as I can see this did not
happen and to be honest in the 9 months I have had this system up I have
never seen the wireless link drop, but packet corruption could be a
possibility and this may be less easy to diagnose. It is a high power
5.8GHz connection, here in the UK this is a licensed band (and yes I
have a license). So I don't think interference is the likely cause,
though I wouldn't rule this out. If I look at the logs from the same
period I see nothing to indicate the interface flapped; I would see the
wireless dis-associate and re-associate and cypher exchange and this did
not happen. But as I say there could be a period of high BER on the
links. I thought ospf would handle this reasonably gracefully? I have to
say heavy BER was not evident when I came to repair the network, or at
least I didn't detect it and in the past I have run ospf over another
one of my wireless links with stations 10km apart with the wireless link
almost non-functional, dropping packets left right and centre and
re-associating over and over, but xorp's ospf never complained!

I was beginning to suspect that this was related to my adsl link on the
suspect router, as this is a dynamic interface and I have this defined
independently of xorp. If this interface flaps then the default route
associated with the adsl ppp session is withdrawn. The default from the
adsl line is not propagated into ospf though, instead I use a static
default with a higher metric pointed at the loopback and inject this
into ospf instead. Then the flaps of the adsl line do not cause churn in
the ospf domain. I was starting to think that the addition and removal
of the default from the adsl line was affecting the kernel table and
this was upsetting xorp's ospf. However this morning when this happened
the adsl line was stable. As far as my logs look it suddenly decided to
stop functioning with no correlated events from other system processes.
The only thing in the logs at the same time is iptables dropping DoS
attacks, but this is normal, unfortunately far too normal.

show ospf4 neighbour simply stated 'full'; there is only one neighbour
defined on this router. I didn't look this time at show interfaces, but
from memory of the last time this happened this also was normal.
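
Since XORP tracks the carrier state of each interface, it may be worth recording what the kernel itself reports at the moment of failure. A minimal sketch, assuming a Linux system that exposes the usual sysfs attributes under /sys/class/net; the function name and the interface name in the usage comment are illustrative only:

```python
from pathlib import Path


def link_state(ifname, sysfs_root="/sys/class/net"):
    """Return the kernel's carrier and operstate values for one interface.

    A value of None means the attribute could not be read (for example the
    interface does not exist, or it is administratively down, in which case
    reading "carrier" fails).
    """
    base = Path(sysfs_root) / ifname
    result = {}
    for attr in ("carrier", "operstate"):
        try:
            result[attr] = (base / attr).read_text().strip()
        except OSError:
            result[attr] = None
    return result


# Example use on the router (interface name is a placeholder):
#   print(link_state("ath0"))
```

A carrier value of "1" with the adjacency still silent would point away from the link itself and towards the routing processes.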

The problem is that these routers are mounted 10m high up telegraph
poles. If I lose connectivity it requires a ladder and a climbing
harness to get at them, this is not to mention my upset customers who,
as is normal with customers, do not delay in telling me they have lost
their Internet links.

I suppose what I'm trying to understand is how to be best prepared for
next time, logging, processes and checks during the failure period to
grab as much useful info before I am forced to restart xorp and get my
customers up and running again. This is a very short period I have to
say. I have a small group of business units supported on this router and
all hell breaks loose if this happens during working hours.
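
One way to make the most of that short window would be a small script that dumps the relevant state to disk in one go, so that xorp can be restarted straight afterwards. The following is only a sketch under assumptions: the command list is a guess at what is useful and installed, and it assumes xorpsh accepts -c to run a single operational command non-interactively.

```python
import subprocess
import time
from pathlib import Path

# Assumed command list; adjust to what is actually installed on the router.
DEFAULT_COMMANDS = {
    "kernel_routes": ["ip", "route", "show"],
    "link_stats": ["ip", "-s", "link", "show"],
    "addresses": ["ip", "addr", "show"],
    "processes": ["ps", "axww"],
    # Assumption: xorpsh supports -c for one-shot operational commands.
    "ospf_neighbor": ["xorpsh", "-c", "show ospf4 neighbor"],
    "xorp_interfaces": ["xorpsh", "-c", "show interfaces"],
}


def capture(commands, outdir, timeout=20):
    """Run each command and save its output under a timestamped directory.

    A command that is missing or hangs is recorded as such instead of
    aborting the whole snapshot. Returns the directory that was written.
    """
    target = Path(outdir) / time.strftime("%Y%m%d-%H%M%S")
    target.mkdir(parents=True, exist_ok=True)
    for name, argv in commands.items():
        try:
            proc = subprocess.run(
                argv, capture_output=True, text=True, timeout=timeout
            )
            body = proc.stdout + proc.stderr
            body += f"\n[exit status {proc.returncode}]\n"
        except (OSError, subprocess.TimeoutExpired) as exc:
            body = f"[could not run {argv!r}: {exc}]\n"
        (target / f"{name}.txt").write_text(body)
    return target


# Example use during a failure (output path is a placeholder):
#   capture(DEFAULT_COMMANDS, "/var/tmp/xorp-snapshots")
```

The per-command timeout is there so that a hung xorpsh does not delay the restart.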

How can I get the maximum logging info from the xorp processes?

Anything I can do in order that you can help me, will be dutifully
carried out. What next, any suggestions?
Thanks
Aidan


On Tue, 2007-12-04 at 12:19 -0800, Atanu Ghosh wrote:

> Hi,
> 
> The scenario that you describe would be perfectly normal if the
> connectivity between the "suspect" router and the "adjacent" router is
> lost. Although I would expect the "show ospf4 neighbor" to show the
> state of the adjacency to be "Down" not "Full". When an OSPF router
> loses its adjacencies the LSA database will slowly time out; however,
> the routes will be withdrawn as soon as the adjacencies are lost.
> 
> We will require more information to diagnose the problem. The next time
> the problem occurs, the output of "show interfaces" and "show ospf4
> neighbor" would be very useful.
> 
> XORP tracks the state of interfaces in particular the carrier state. If
> OSPF believes that the Ethernet has been disconnected it will stop
> attempting to send hello packets. Is it possible that there is a problem
> with an interface or cable between the two routers?
> 
> 	   Atanu.
> 
> >>>>> "Aidan" == Aidan Walton <awalton at wires3.net> writes:
> 
>     Aidan>    Hi All, I am using xorp in a production environment,
>     Aidan> admittedly a small one. I operate a local WISP and xorp is
>     Aidan> running on my wireless nodes. I have a very simple
>     Aidan> configuration and really I could probably get away with
>     Aidan> static routing throughout the entire network, but I wanted to
>     Aidan> try xorp and see just how stable it was. However as I expand
>     Aidan> the network I am having second thoughts. It is not good at
>     Aidan> all when a network goes up in smoke and I can't explain why
>     Aidan> or predict when and what the causes are.  The network has
>     Aidan> been in operation 24x7 for around 9 months. I am running on a
>     Aidan> Linux kernel 2.6.18-4 and for the vast majority of the time I
>     Aidan> have no issues. However now for the fourth time I see the
>     Aidan> same problem: Suddenly the Linux kernel and the xorp rib
>     Aidan> become detached. Normally all routes in the kernel match
>     Aidan> those that xorp is generating, receiving and electing as
>     Aidan> active. I am running OSPF and the neighbour states remain
>     Aidan> 'full' throughout but if I am not mistaken I see ospf hellos
>     Aidan> only in one direction (i.e. nothing being transmitted from the
>     Aidan> router I suspect). The lsdb of OSPF on the suspect and
>     Aidan> adjacent routers contain all the routes but they are aging
>     Aidan> out slowly on the adjacent router. When I look at the kernel
>     Aidan> routes those from OSPF have already vanished.  I can see the
>     Aidan> ospf process running on the offending router, and again I can
>     Aidan> see the ospf lsdb intact and correct. When I restart xorp the
>     Aidan> system recovers and the routes appear in the kernel again. I
>     Aidan> suspect a problem with ospf. I tried enabling traceoptions on
>     Aidan> the ospf process, but in fact I needed to restart all the
>     Aidan> xorp processes before this actually became active. I now have
>     Aidan> this running so if/when it happens again I might be able to
>     Aidan> offer some more information.  Does anyone have any experience
>     Aidan> of ospf being unstable? Any suggestions how I might more
>     Aidan> effectively capture some logs from this event. I do not see
>     Aidan> any options for logging the fea process. Is there anything I
>     Aidan> can enable to help diagnose the issue?  Many thanks, and of
>     Aidan> course cheers for the code in the first place.  Aidan