[Xorp-users] Problems with Linux kernel and OSPF ???

Atanu Ghosh atanu at ICSI.Berkeley.EDU
Wed Dec 5 01:00:53 PST 2007


Hi,

It would be good to see the output of the following before and after the
problem occurs:
1) $ netstat -nr
2) Xorp> show interfaces
3) Xorp> show route table ipv4 unicast final
4) Xorp> show ospf4 neighbor detail
5) Xorp> show ospf4 database detail
6) $ print_lsas -S save.lsas
The print_lsas program can be found in the ospf/tools directory. The
program stores the LSA database in a form that can be replayed.
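
To make the before/after capture quick when customers are waiting, the
list above could be wrapped in a small script. This is only a sketch, not
something shipped with XORP: it assumes that xorpsh accepts a command via
"-c" on your build, that print_lsas and xorpsh are on the PATH, and the
names build_commands, run and snapshot are made up for illustration.

```python
import subprocess
import time
from pathlib import Path

# Operational-mode commands 2) to 5) from the list above.
XORP_COMMANDS = {
    "interfaces": "show interfaces",
    "routes": "show route table ipv4 unicast final",
    "neighbors": "show ospf4 neighbor detail",
    "database": "show ospf4 database detail",
}


def build_commands(lsa_file="save.lsas"):
    """Return a name -> argv mapping for all six diagnostic commands."""
    commands = {"netstat": ["netstat", "-nr"]}
    for name, command in XORP_COMMANDS.items():
        # Assumption: xorpsh runs a single command given with -c.
        commands[name] = ["xorpsh", "-c", command]
    # Assumption: print_lsas (from ospf/tools) is on the PATH.
    commands["print_lsas"] = ["print_lsas", "-S", lsa_file]
    return commands


def run(argv, timeout=30):
    """Run one command and return its output; report failures as text."""
    try:
        done = subprocess.run(
            argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=timeout,
        )
        return done.stdout
    except (OSError, subprocess.TimeoutExpired) as exc:
        return f"FAILED {argv!r}: {exc}\n"


def snapshot(label, outdir="ospf-snapshots", runner=run):
    """Save the output of every command under outdir/<timestamp>-<label>/."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(outdir) / f"{stamp}-{label}"
    target.mkdir(parents=True, exist_ok=True)
    lsa_file = str(target / "save.lsas")
    for name, argv in build_commands(lsa_file).items():
        (target / f"{name}.txt").write_text(runner(argv))
    return target
```

Calling snapshot("before") while the network is healthy and
snapshot("after") as soon as the problem is noticed leaves two directories
that can be compared with diff. A command that is missing or hangs is
recorded as a FAILED line rather than stopping the capture.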

You can also enable tracing in ospf:
        traceoptions {
            flag {
                all {
                    disable: false
                }
            }
        }

This should show routes being added and deleted.

The latest code in CVS has a "clear ospf4 database" command. Once the
problem occurs, it would be interesting to know whether running this
command solves the problem.

It might also be interesting to keep the "ip mon" command running to
track routes being added and deleted.
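
Route events are much easier to correlate with the XORP trace output if
each one carries a timestamp. A minimal sketch, assuming the iproute2
"ip monitor route" command is available; the function names and the log
file name are hypothetical:

```python
import subprocess
import time


def stamp(line, now=None):
    """Prefix one line of monitor output with a local timestamp."""
    when = time.localtime(now)
    return time.strftime("%Y-%m-%d %H:%M:%S", when) + " " + line.rstrip("\n")


def watch_routes(logfile="ip-monitor.log", argv=("ip", "monitor", "route")):
    """Append timestamped route events to logfile until interrupted."""
    with subprocess.Popen(
        list(argv), stdout=subprocess.PIPE, text=True
    ) as proc, open(logfile, "a") as log:
        for line in proc.stdout:
            log.write(stamp(line) + "\n")
            # Flush every event so the log survives a sudden restart.
            log.flush()
```

Running watch_routes() in the background (for example under nohup) keeps
a record of every kernel route addition and deletion, including the ADSL
default route coming and going.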

Would it be possible at some off-peak time to flap the ADSL link to see
if this replicates the problem? I know that you have stated that there
were no ADSL issues when the problem occurred, but I do wonder if we are
seeing some issue related to dynamic interfaces.

       Atanu.

>>>>> "Aidan" == Aidan Walton <awalton at wires3.net> writes:

    Aidan>    Hi, The adjacency runs over a wireless link between the
    Aidan> routers. It can, very possibly, drop in and out, but as far
    Aidan> as I can see this did not happen and to be honest in the 9
    Aidan> months I have had this system up I have never seen the
    Aidan> wireless link drop, but packet corruption could be a
    Aidan> possibility and this may be less easy to diagnose. It is a
    Aidan> high power 5.8GHz connection, here in the UK this is a
    Aidan> licensed band (and yes I have a license). So I don't think
    Aidan> interference is the likely cause, though I wouldn't rule this
    Aidan> out. If I look at the logs from the same period I see
    Aidan> nothing to indicate the interface flapped, I would see the
    Aidan> wireless dis-associate and re-associate and cypher exchange
    Aidan> and this did not happen. But as I say there could be a period
    Aidan> of high BER on the links. I thought ospf would handle this
    Aidan> reasonably gracefully? I have to say heavy BER was not
    Aidan> evident when I came to repair the network, or at least I
    Aidan> didn't detect it and in the past I have run ospf over another
    Aidan> one of my wireless links with stations 10km apart with the
    Aidan> wireless link almost non-functional, dropping packets left
    Aidan> right and centre and re-associating over and over, but xorp's
    Aidan> ospf never complained!  I was beginning to suspect that this
    Aidan> was related to my adsl link on the suspect router, as this is
    Aidan> a dynamic interface and I have this defined independently of
    Aidan> xorp. If this interface flaps then the default route
    Aidan> associated with the adsl ppp session is withdrawn. The
    Aidan> default from the adsl line is not propagated into ospf
    Aidan> though, instead I use a static default with a higher metric
    Aidan> pointed at the loopback and inject this into ospf
    Aidan> instead. Then the flaps of the adsl line do not cause churn
    Aidan> in the ospf domain. I was starting to think that the addition
    Aidan> and removal of the default from the adsl line was affecting
    Aidan> the kernel table and this was upsetting xorp's ospf. However
    Aidan> this morning when this happened the adsl line was stable. As
    Aidan> far as my logs look it suddenly decided to stop functioning
    Aidan> with no correlated events from other system processes. The
    Aidan> only thing in the logs at the same time is iptables dropping
    Aidan> DoS attacks, but this is normal, unfortunately far too normal.
    Aidan> show ospf4 neighbour simply stated 'full'; there is only one
    Aidan> neighbour defined on this router. I didn't look this time at
    Aidan> show interfaces, but from memory of the last time this
    Aidan> happened this also was normal.  The problem is that these
    Aidan> routers are mounted 10m high up telegraph poles. If I lose
    Aidan> connectivity it requires a ladder and a climbing harness to
    Aidan> get at them, this is not to mention my upset customers who,
    Aidan> as is normal with customers, do not delay in telling me they
    Aidan> have lost their Internet links.  I suppose what I'm trying to
    Aidan> understand is how to be best prepared for next time, logging,
    Aidan> processes and checks during the failure period to grab as
    Aidan> much useful info before I am forced to restart xorp and get
    Aidan> my customers up and running again. This is a very short
    Aidan> period I have to say. I have a small group of business units
    Aidan> supported on this router and all hell breaks loose if this
    Aidan> happens during working hours.  How can I get the maximum
    Aidan> logging info from the xorp processes?  Anything I can do in
    Aidan> order that you can help me, will be dutifully carried
    Aidan> out. What next, any suggestions?  Thanks Aidan I will On Tue,
    Aidan> 2007-12-04 at 12:19 -0800, Atanu Ghosh wrote:

    Atanu> Hi,

    Atanu> The scenario that you describe would be perfectly normal if
    Atanu> the connectivity between the "suspect" router and the
    Atanu> "adjacent" router is lost. Although I would expect the "show
    Atanu> ospf4 neighbor" to show the state of the adjacency to be
    Atanu> "Down" not "Full". When an OSPF router loses its adjacencies
    Atanu> the LSA database will slowly time out; however, the routes
    Atanu> will be withdrawn as soon as the adjacencies are lost.

    Atanu> We will require more information to diagnose the problem. Next
    Atanu> time the problem occurs, the output of "show interfaces" and
    Atanu> "show ospf4 neighbor" would be very useful.

    Atanu> XORP tracks the state of interfaces, in particular the carrier
    Atanu> state. If OSPF believes that the Ethernet has been
    Atanu> disconnected it will stop attempting to send hello
    Atanu> packets. Is it possible that there is a problem with an
    Atanu> interface or cable between the two routers?

    Atanu> Atanu.

>>>>> "Aidan" == Aidan Walton <awalton at wires3.net> writes:

    Aidan> Hi All, I am using xorp in a production environment,
    Aidan> admittedly a small one. I operate a local WISP and xorp is
    Aidan> running on my wireless nodes. I have a very simple
    Aidan> configuration and really I could probably get away with
    Aidan> static routing throughout the entire network, but I wanted to
    Aidan> try xorp and see just how stable it was. However as I expand
    Aidan> the network I am having second thoughts. It is not good at
    Aidan> all when a network goes up in smoke and I can't explain why
    Aidan> or predict when and what the causes are.  The network has
    Aidan> been in operation 24x7 for around 9 months. I am running on a
    Aidan> Linux kernel 2.6.18-4 and for the vast majority of the time I
    Aidan> have no issues. However now for the fourth time I see the
    Aidan> same problem: Suddenly the Linux kernel and the xorp rib
    Aidan> become detached. Normally all routes in the kernel match
    Aidan> those that xorp is generating, receiving and electing as
    Aidan> active. I am running OSPF and the neighbour states remain
    Aidan> 'full' throughout but if I am not mistaken I see ospf hellos
    Aidan> only in one direction (i.e. nothing being transmitted from the
    Aidan> router I suspect). The lsdb of OSPF on the suspect and
    Aidan> adjacent routers contain all the routes but they are aging
    Aidan> out slowly on the adjacent router. When I look at the kernel
    Aidan> routes those from OSPF have already vanished.  I can see the
    Aidan> ospf process running on the offending router, and again I can
    Aidan> see the ospf lsdb intact and correct. When I restart xorp the
    Aidan> system recovers and the routes appear in the kernel again. I
    Aidan> suspect a problem with ospf. I tried enabling traceoptions on
    Aidan> the ospf process, but in fact I needed to restart all the
    Aidan> xorp processes before this actually became active. I now have
    Aidan> this running so if/when it happens again I might be able to
    Aidan> offer some more information.  Does anyone have any experience
    Aidan> of ospf being unstable? Any suggestions how I might more
    Aidan> effectively capture some logs from this event? I do not see
    Aidan> any options for logging the fea process. Is there anything I
    Aidan> can enable to help diagnose the issue?  Many thanks, and of
    Aidan> course cheers for the code in the first place.  Aidan


