[Xorp-users] OSPF terminated with signal 6.

Atanu Ghosh atanu at ICSI.Berkeley.EDU
Fri Mar 21 09:08:47 PDT 2008


Hi,

Let me explain in *general* terms how this code works.

Each area in OSPF contains an LSA database. For the purposes of this
discussion there are two kinds of LSAs: ones that the router generated
and ones that it received. Each LSA contains a single timer. If the LSA
was received from another router then the timer is set to MaxAge
(default 1 hour), so if the LSA has not been refreshed after MaxAge it
will be removed. If the router is generating an LSA itself the timer
is set to LSRefreshTime (default 30 minutes).
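
As a rough illustration of the timer choice described above, here is a
minimal sketch. The helper name lsa_timer_seconds and its parameters
are made up for this example and are not XORP code. It also assumes
that a received LSA's timer accounts for the age the LSA already
carries, which is my reading of the protocol rather than something
stated above.

```cpp
#include <cstdint>

// Default values in seconds, as given in RFC 2328 Appendix B.
constexpr uint32_t kMaxAge = 3600;         // 1 hour
constexpr uint32_t kLSRefreshTime = 1800;  // 30 minutes

// Hypothetical helper (not a XORP function): seconds until the LSA's
// single timer should fire.
//   self_originated - true if this router generated the LSA
//   current_age     - LS age carried by the LSA when it is stored
//                     (assumption: a received LSA expires when its age
//                     reaches MaxAge, so the remaining time is used)
inline uint32_t lsa_timer_seconds(bool self_originated, uint32_t current_age) {
    if (self_originated)
        return kLSRefreshTime;  // refresh our own LSA periodically
    return current_age >= kMaxAge ? 0 : kMaxAge - current_age;
}
```

A self-originated LSA is refreshed after 30 minutes, while a received
LSA that arrives with age 0 is removed after an hour unless a newer
copy replaces it first.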

The LSA database is stored in an STL vector, in effect an array; a
number of requirements, such as database exchange, made this a
reasonable choice. There is a single method to add an LSA to the
database and a single method to remove an LSA from the database. When
an LSA is removed from the database the timer associated with it
should be unconditionally cancelled. My previous fix closed a hole
where, under special circumstances, the timer would not be cancelled.
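
To make the invariant concrete, here is a small sketch of a
vector-backed database with one way in and one way out. All of the
names (Timer, Lsa, LsaDatabase) are hypothetical stand-ins and not the
real XORP classes. Emptying a slot on removal instead of erasing it is
an assumption made here so that the indexes of other LSAs stay stable.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for an event-loop timer.
struct Timer {
    bool scheduled = false;
    void schedule() { scheduled = true; }
    void cancel() { scheduled = false; }
};

// Hypothetical stand-in for an LSA: one LSA, one timer.
struct Lsa {
    unsigned link_state_id = 0;
    Timer timer;
};

class LsaDatabase {
public:
    // The only way in: prime the timer and store the LSA, reusing an
    // empty slot if one exists. Returns the index the LSA was stored at.
    std::size_t add(std::shared_ptr<Lsa> lsa) {
        lsa->timer.schedule();
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            if (!slots_[i]) {
                slots_[i] = std::move(lsa);
                return i;
            }
        }
        slots_.push_back(std::move(lsa));
        return slots_.size() - 1;
    }

    // The only way out: the timer is cancelled before anything else
    // happens, so no later code path can leave it armed.
    bool remove(std::size_t index) {
        if (index >= slots_.size() || !slots_[index])
            return false;
        slots_[index]->timer.cancel();
        slots_[index].reset();  // empty the slot, keep other indexes valid
        return true;
    }

    // Returns the LSA stored at index, or nullptr if the slot is empty.
    const Lsa* at(std::size_t index) const {
        return index < slots_.size() ? slots_[index].get() : nullptr;
    }

private:
    std::vector<std::shared_ptr<Lsa>> slots_;
};
```

With a single removal method, cancelling the timer in that one place is
enough to guarantee that a timer never outlives its database entry.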

As a sanity check, when a timer is primed it is given the LSA's index
in the database. What you are seeing is the MaxAge timer firing on an
LSA that is no longer in the database, which should not be possible.
The first test verifies that the LSA is still in the database, which it
isn't, and the second test verifies that the LSA is at the correct
position, which obviously it isn't.

In order to keep OSPF running your best bet is to just print a warning
and return from this method when the LSA is not found in the database.
All the other code in this method is basically there to remove the LSA
from the database and notify the neighbours that this LSA should be
removed. The code that notifies the neighbours also contains sanity
checks that will probably be tripped.
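
The shape of that workaround, reduced to a self-contained sketch, might
look like the following. The types and the names find_index and
on_maxage are invented for illustration and do not match
area_router.cc. Treating an index mismatch as a warning too is an extra
assumption of this sketch, not something I have verified to be safe.

```cpp
#include <cstddef>
#include <iostream>
#include <memory>
#include <vector>

// Hypothetical stand-ins, not XORP types.
struct Lsa {
    unsigned link_state_id = 0;
};
using LsaRef = std::shared_ptr<Lsa>;

enum class MaxAgeResult { Flushed, NotInDatabase, IndexMismatch };

// Linear search for the LSA; on success stores its position in index.
inline bool find_index(const std::vector<LsaRef>& db, const LsaRef& lsa,
                       std::size_t& index) {
    for (std::size_t i = 0; i < db.size(); ++i) {
        if (db[i] == lsa) {
            index = i;
            return true;
        }
    }
    return false;
}

// Called when the MaxAge timer fires. primed_index is the index the
// timer was given when it was primed.
inline MaxAgeResult on_maxage(std::vector<LsaRef>& db, const LsaRef& lsa,
                              std::size_t primed_index) {
    std::size_t found = 0;
    if (!find_index(db, lsa, found)) {
        // First test: warn and return instead of aborting the process.
        std::cerr << "warning: LSA " << lsa->link_state_id
                  << " not in database\n";
        return MaxAgeResult::NotInDatabase;
    }
    if (found != primed_index) {
        // Second test: the LSA is present but not where the timer expected.
        std::cerr << "warning: indexes don't match " << found
                  << " != " << primed_index << "\n";
        return MaxAgeResult::IndexMismatch;
    }
    // Normal path: drop the LSA. A real router would also notify its
    // neighbours here; that part is omitted.
    db[found].reset();
    return MaxAgeResult::Flushed;
}
```

Returning early skips both the removal and the neighbour notification,
which is the point: neither makes sense for an LSA that is not in the
database.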

An LSA with an AGE of MaxAge has to be handled specially, so the
problem probably lies in this special handling in another part of the
code. The way to remove an LSA from the OSPF database is to send it
out with an AGE of MaxAge; there may be a problem with the handling of
an incoming LSA with its AGE set to MaxAge.
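
For reference, one of the special cases for an incoming MaxAge LSA
comes from RFC 2328 section 13, step 4, and can be sketched as below.
The function classify_incoming and the Action enum are hypothetical
names for this example. This is only an illustration of the rule, not a
claim about where the bug is.

```cpp
#include <cstdint>

constexpr uint16_t kMaxAge = 3600;  // seconds, RFC 2328 Appendix B

enum class Action { AckAndDiscard, ContinueProcessing };

// Hypothetical decision helper modelled on RFC 2328 section 13, step 4:
// a received LSA whose age is MaxAge, for which we hold no copy, and
// while no neighbour is in state Exchange or Loading, is acknowledged
// and discarded rather than installed in the database.
inline Action classify_incoming(uint16_t ls_age, bool have_copy,
                                bool neighbour_exchanging_or_loading) {
    if (ls_age == kMaxAge && !have_copy && !neighbour_exchanging_or_loading)
        return Action::AckAndDiscard;
    return Action::ContinueProcessing;
}
```

Any MaxAge LSA that does not match this case continues through the
normal flooding procedure, and that is the path where an LSA can leave
the database while its timer is still pending if a step is missed.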

When I find something I'll send you a patch to try.

   Atanu.

>>>>> "Ben" == Ben Greear <greearb at candelatech.com> writes:

    Ben> Atanu Ghosh wrote:
    >> Hi,
    >> Could you try this fix?

    Ben> Ok, I tried again, and this time I'm quite sure I have your
    Ben> patch applied properly.  I still get the assert.

    Ben> I modified the code to just print a warning and then goto the bottom
    Ben> of the method.  I'm not sure if my patch is right or not, but it doesn't
    Ben> help much.  See farther down for log messages and a new assert relating
    Ben> to index mismatch.

    Ben> [area_router.cc]
    Ben> @@ -2702,8 +2702,10 @@

    Ben> XLOG_ASSERT(!lsar->external());

    Ben> -    if (!find_lsa(lsar, index))
    Ben> -       XLOG_FATAL("LSA not in database: %s", cstring(*lsar));
    Ben> +    if (!find_lsa(lsar, index)) {
    Ben> +       XLOG_WARNING("LSA not in database: %s", cstring(*lsar));
    Ben> +       goto out;
    Ben> +    }

    Ben> if (i != index)
    Ben> XLOG_FATAL("Indexes don't match %u != %u %s",  XORP_UINT_CAST(i),
    Ben> @@ -2726,6 +2728,7 @@
    Ben> #endif
    Ben> publish_all(lsar);

    Ben> +  out:
    Ben> // Clear the timer otherwise there is a circular dependency.
    Ben> // The LSA contains a XorpTimer that points back to the LSA.
    Ben> lsar->get_timer().clear();


    Ben> I'm not sure if this is due to my patch above, or if there are also index
    Ben> issues.

    Ben> [ 2008/03/20 17:26:34  WARNING xorp_ospfv2:26077 OSPF area_router.cc:2706 maxage_reached ] LSA not in database: Network-LSA:
    Ben> LS age 3600 Options  0x2 DC: 0 EA: 0 N/P: 0 MC: 0 E: 1 LS type 0x2 Link State ID 10.25.28.28 Advertising Router 127.1.0.28 LS sequence number 0x80000001 LS checksum 0xda51 length 32
    Ben> Network Mask 0xffffff00
    Ben> Attached Router 127.1.0.28
    Ben> Attached Router 127.1.0.25
    Ben> [ 2008/03/20 17:26:34  FATAL xorp_ospfv2:26077 OSPF area_router.cc:2713 maxage_reached ] Indexes don't match 167 != 141 Network-LSA:
    Ben> LS age   90 Options  0x2 DC: 0 EA: 0 N/P: 0 MC: 0 E: 1 LS type 0x2 Link State ID 10.24.28.28 Advertising Router 127.1.0.28 LS sequence number 0x80000001 LS checksum 0xd855 length 32
    Ben> Network Mask 0xffffff00
    Ben> Attached Router 127.1.0.28
    Ben> Attached Router 127.1.0.24

    Ben> This appears to happen when lots of xorp nodes are being joined together
    Ben> (ie, interfaces added connecting directly to other xorp routers).

    Ben> Let me know if I can get you more debugging info.

    Ben> Thanks,
    Ben> Ben

    Ben> -- 
    Ben> Ben Greear <greearb at candelatech.com>
    Ben> Candela Technologies Inc  http://www.candelatech.com


