[Bulk] [Xorp-hackers] OSPF Failures

Mike Horn caddisconsulting@yahoo.com
Wed, 8 Feb 2006 09:28:47 -0700


Hi Tim,
 
This looks similar to, but is not the same as, XORP bug 383. I would suggest
opening a new bug in Bugzilla so the XORP team can investigate.
 
If these are the only routers running OSPF on this network, there might be a
race condition in DR election. That might explain why it happens when you
enable all four at the same time rather than one at a time. I only have two
routers running XORP here, but I'll see if I can reproduce the issue.
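
For context, here is a simplified sketch of the DR tie-break described in RFC 2328 section 9.4: among routers with a non-zero priority, the highest priority wins and ties go to the highest Router ID. It deliberately ignores the BDR step and the rule that a working DR is not preempted, and the Candidate type, elect_dr() and the router IDs are hypothetical. It only shows that two routers which have heard Hellos from different subsets of neighbours compute different answers; the protocol's Wait timer is meant to prevent elections on such a partial view, so this is not a confirmed cause. (In Tim's log, the network-LSA, LS type 0x2, originated by 10.0.0.1 indicates that 10.0.0.1 considered itself DR on the 10.1.0.13 segment at that moment.)

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One candidate as learned from Hello packets (hypothetical values).
struct Candidate {
    uint32_t router_id;  // OSPF Router ID as a 32-bit number
    uint8_t priority;    // Router Priority; 0 means "never become DR"
};

// Build a Router ID from dotted-quad octets, e.g. rid(10, 0, 0, 1).
constexpr uint32_t rid(uint32_t a, uint32_t b, uint32_t c, uint32_t d) {
    return (a << 24) | (b << 16) | (c << 8) | d;
}

// Simplified DR choice: highest priority, then highest Router ID.
// Returns 0 (0.0.0.0, "no DR") if nobody is eligible.
// Ignores the BDR calculation and DR non-preemption from RFC 2328 9.4.
uint32_t elect_dr(const std::vector<Candidate>& seen) {
    const Candidate* best = nullptr;
    for (const Candidate& c : seen) {
        if (c.priority == 0) continue;  // ineligible to become DR
        if (best == nullptr || c.priority > best->priority ||
            (c.priority == best->priority && c.router_id > best->router_id)) {
            best = &c;
        }
    }
    return best ? best->router_id : 0;
}
```

With equal priorities, a router that has so far only heard from 10.0.0.1 and 10.0.0.2 picks 10.0.0.2, while one that has heard from all four picks 10.0.0.4, which is why bringing the routers up one at a time changes the order in which these decisions are made.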
 
-mike

  _____  

From: xorp-hackers-admin@icir.org [mailto:xorp-hackers-admin@icir.org] On
Behalf Of Tim Durack
Sent: Tuesday, February 07, 2006 7:53 PM
To: xorp-hackers@xorp.org
Subject: [Bulk] [Xorp-hackers] OSPF Failures


I've been having ongoing problems with OSPF dying on my simple, fully meshed,
four-router test bed.

If I bring all four routers up at the same time, OSPF quickly dies. If I
bring them up one at a time, they sometimes stay up and stable.

Here is the log leading up to the failure:

[ 2006/02/07 21:11:40  INFO xorp_rtrmgr:7256 RTRMGR +2228 task.cc run_task ]
No more tasks to run
[ 2006/02/07 21:11:40 TRACE xorp_rtrmgr RTRMGR ] apply_config_change_done:
status: 1 response:  target: xorpsh-7257-xen1 
[ 2006/02/07 21:12:45 TRACE xorp_ospfv2 OSPF ] Ack for LSA not in
retransmission list.
LS age    1 Options  0x2 DC: 0 EA: 0 N/P: 0 MC: 0 E: 1 LS type 0x1 Link
State ID 10.0.0.1 Advertising Router 10.0.0.1 LS sequence number 0x80000001
LS checksum 0xcde5 length 72
Link State Acknowledgement Packet:
        Version 2
        Type 5
        Router ID 10.1.0.2
        Area ID 0.0.0.0
        Auth Type 0

        LS age    1 Options  0x2 DC: 0 EA: 0 N/P: 0 MC: 0 E: 1 LS type 0x1
Link State ID 10.0.0.1 Advertising Router 10.0.0.1 LS
sequence number 0x80000001 LS checksum 0xcde5 length 72
[ 2006/02/07 21:14:16 TRACE xorp_ospfv2 OSPF ] Ack for LSA not in
retransmission list.
LS age   91 Options  0x2 DC: 0 EA: 0 N/P: 0 MC: 0 E: 1 LS type 0x2 Link
State ID 10.1.0.13 Advertising Router 10.0.0.1 LS sequence number 0x80000001
LS checksum 0xdc45 length 32
Link State Acknowledgement Packet:
        Version 2
        Type 5
        Router ID 10.0.0.2
        Area ID 0.0.0.0
        Auth Type 0

        LS age   91 Options  0x2 DC: 0 EA: 0 N/P: 0 MC: 0 E: 1 LS type 0x2
Link State ID 10.1.0.13 Advertising Router 10.0.0.1 LS sequence number
0x80000001 LS checksum 0xdc45 length 32
[ 2006/02/07 21:14:21  FATAL xorp_ospfv2:7261 OSPF +471 routing_table.cc
add_entry ] Assertion (0 == _entries.count(area)) failed 
[ 2006/02/07 21:14:21  ERROR xorp_rtrmgr:7256 RTRMGR +736 module_manager.cc
done_cb ] Command "/usr/local/xorp/ospf/xorp_ospfv2": terminated with signal
6.
[ 2006/02/07 21:14:21  INFO xorp_rtrmgr:7256 RTRMGR +286 module_manager.cc
module_exited ] Module abnormally killed: ospf4 
[ 2006/02/07 21:14:21 INFO xorp_rib RIB ] Received death event for protocol
ospfv2 shutting down -------
OriginTable: ospf
IGP
next table = Redist:ospf
[ 2006/02/07 21:14:21 INFO xorp_rib RIB ] Received death event for protocol
ospfv2 shutting down ------- 
OriginTable: ospf
IGP
next table = Redist:ospf
[ 2006/02/07 21:14:21 INFO xorp_rib RIB ] Received death event for protocol
ospfv2 shutting down -------
OriginTable: ospf
IGP
next table = Redist:ospf
[ 2006/02/07 21:14:21 INFO xorp_rib RIB ] Received death event for protocol
ospfv2 shutting down -------
OriginTable: ospf
IGP
next table = Redist:ospf
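
The "Ack for LSA not in retransmission list" TRACE lines resemble the "questionable acknowledgment" case in RFC 2328 section 13.7, where an ack is removed from the retransmission list only if it is for the same LSA instance, and is otherwise just logged. If XORP follows that rule, these lines may be benign rather than the cause of the crash. The sketch below is a hypothetical model of that check, not XORP's code: RetransmissionList, LsaKey and on_ack() are illustrative names. It keys an LSA the way the log prints it, by (LS type, Link State ID, Advertising Router), and simplifies "same instance" to matching sequence number and checksum (RFC 2328 section 13.1 also considers LS age).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>

// Identity of an LSA: (LS type, Link State ID, Advertising Router).
struct LsaKey {
    uint8_t ls_type;
    uint32_t link_state_id;
    uint32_t adv_router;
    bool operator<(const LsaKey& o) const {
        return std::tie(ls_type, link_state_id, adv_router) <
               std::tie(o.ls_type, o.link_state_id, o.adv_router);
    }
};

// Simplified instance identity (LS age deliberately ignored).
struct LsaInstance {
    uint32_t seq;
    uint16_t checksum;
    bool operator==(const LsaInstance& o) const {
        return seq == o.seq && checksum == o.checksum;
    }
};

enum class AckResult { Removed, NotInList };

// Hypothetical per-neighbour retransmission list.
class RetransmissionList {
public:
    void add(const LsaKey& k, const LsaInstance& i) { list_[k] = i; }
    std::size_t size() const { return list_.size(); }

    // Remove the entry only if the ack matches the instance being
    // retransmitted; otherwise report it so the caller can log it.
    AckResult on_ack(const LsaKey& k, const LsaInstance& i) {
        auto it = list_.find(k);
        if (it == list_.end() || !(it->second == i)) return AckResult::NotInList;
        list_.erase(it);
        return AckResult::Removed;
    }

private:
    std::map<LsaKey, LsaInstance> list_;
};
```

In this model a second ack for an instance that was already acknowledged, for example one that crossed with a retransmission, takes the NotInList path, which is one plausible, harmless way to produce the TRACE line.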

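The fatal line is an assertion in add_entry() in routing_table.cc that no entry already exists for the area. An assertion failure normally ends in abort(), which raises SIGABRT (signal 6 on Linux and the BSDs), consistent with the rtrmgr reporting "terminated with signal 6". Below is a hypothetical reduction of that invariant, not XORP's code: PerAreaTable, Entry and both add functions are illustrative. A second add for an area that was never removed (for instance, if a route computation for the area began again before the previous one was torn down) would trip the strict version; confirming that this is what happens here would need the XORP source or a core dump.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

using AreaId = uint32_t;  // 32-bit OSPF area ID, e.g. 0.0.0.0 == 0

// Hypothetical per-area payload.
struct Entry {
    uint32_t cost;
};

class PerAreaTable {
public:
    // Mirrors the logged invariant: adding an existing area is a bug.
    // With NDEBUG defined the assert is compiled out and this overwrites.
    void add_entry_strict(AreaId area, const Entry& e) {
        assert(0 == entries_.count(area));
        entries_[area] = e;
    }

    // Non-aborting variant: refuse the duplicate and tell the caller.
    bool add_entry_checked(AreaId area, const Entry& e) {
        return entries_.emplace(area, e).second;
    }

    void remove_entry(AreaId area) { entries_.erase(area); }
    std::size_t size() const { return entries_.size(); }

private:
    std::map<AreaId, Entry> entries_;
};
```

The checked variant is only one design option; an assertion is a reasonable way to catch a logic error early, and the interesting question for the bug report is why the area was added twice.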

I've attached all four router configs in case it helps. 

Any ideas?

Tim:>

