[Xorp-users] BGP crash

Mark Handley M.Handley at cs.ucl.ac.uk
Thu Dec 6 14:00:30 PST 2007


Thanks - that should be sufficient to build a test suite case.

The crash is one where BGP is sending a route to a peer (not sure
which yet) and detects internally that it's already sent a version of
the same route without (internally) withdrawing that route first.
This triggers an internal sanity check because this should never
happen, and BGP aborts to preserve the evidence of the bug.  In
general these cases are hard to debug, but this one shouldn't be too
bad as you've given us a lot to go on.
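
As a rough illustration of the kind of check involved (a minimal sketch,
not XORP's actual code: the class and method names below are hypothetical,
and only the name add_route is borrowed from the log line quoted further
down), a cache of announced routes can refuse a second add for a prefix
that was never withdrawn:

```python
class DuplicateAddError(Exception):
    """Raised when a route is added twice without a withdrawal in between."""


class RouteCache:
    """Hypothetical stand-in for a cache of routes already sent downstream."""

    def __init__(self):
        # prefix -> attributes of the version currently announced
        self._announced = {}

    def add_route(self, prefix, attrs):
        # Invariant: a prefix must be withdrawn before a new version is added.
        if prefix in self._announced:
            raise DuplicateAddError(f"{prefix} added twice without a delete")
        self._announced[prefix] = attrs

    def delete_route(self, prefix):
        # Withdraw the route; raises KeyError if it was never announced.
        del self._announced[prefix]

    def replace_route(self, prefix, attrs):
        # A replacement is an explicit withdraw-then-add, so it stays legal.
        self.delete_route(prefix)
        self.add_route(prefix, attrs)


cache = RouteCache()
cache.add_route("2.0.0.0/24", {"nexthop": "10.10.10.40"})
# Legal: the old version is withdrawn before the new one is added.
cache.replace_route("2.0.0.0/24", {"nexthop": "10.10.20.40"})
# A further add_route("2.0.0.0/24", ...) at this point would raise
# DuplicateAddError, which is the analogue of the abort described above.
```

In this sketch the error is an exception rather than a process abort; the
point is only that the second add is treated as a bug in the caller rather
than silently overwriting the earlier version.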

Thank you!

 - Mark

On Dec 6, 2007 11:22 AM, Arsi Antila <bbb999 at zerodistance.fi> wrote:
> The BGP crash was tested on both Linux/CentOS with XORP 1.4 and Linux/Debian
> with the Debian XORP package 1.5~cvs.20070824-1, and it seems to occur in
> both environments. The key to getting BGP to crash seems to be a combination
> of a policy rule (even a simple one), a network like the one described below
> (or similar), and flapping BGP peers.
>
> The test network has the DUT (device under test, XORP) with eth1 connected to
> both Router1 (AS 1) and Router2 (AS 2). The DUT's eth2 port is connected to
> Router3 (AS 1) and Router4 (AS 2).
>
> The test sequence that crashes XORP BGP is: start routers 1-4, start XORP,
> take Router2 down, bring Router2 back up. After this XORP crashes.
>
> Here is XORP output for the crash:
>
>
> [ 2007/12/05 13:31:18 INFO xorp_bgp BGP ] Peer-{10.10.20.20(179) 10.10.20.40(179)} in state ESTABLISHED(6) received Notification Packet: Cease(6)
> [ 2007/12/05 13:31:18 INFO xorp_bgp BGP ] Peer-{10.10.10.20(179) 10.10.10.30(179)} in state ESTABLISHED(6) received Notification Packet: Cease(6)
> [ 2007/12/05 13:31:19 INFO xorp_bgp BGP ] Peer-{10.10.10.20(179) 10.10.10.40(179)} in state ESTABLISHED(6) received Notification Packet: Cease(6)
> [ 2007/12/05 13:31:19 INFO xorp_bgp BGP ] Peer-{10.10.20.20(179) 10.10.20.30(179)} in state ESTABLISHED(6) received Notification Packet: Cease(6)
> [ 2007/12/05 13:31:34  FATAL xorp_bgp:10028 BGP +83 route_table_cache.cc add_route ] Internal fatal error: unreachable code reached
> [ 2007/12/05 13:31:34  ERROR xorp_rtrmgr:10024 RTRMGR +747 module_manager.cc done_cb ] Command "/usr/local/xorp/bgp/xorp_bgp": terminated with signal 6.
> [ 2007/12/05 13:31:34  INFO xorp_rtrmgr:10024 RTRMGR +294 module_manager.cc module_exited ] Module abnormally killed: bgp
> [ 2007/12/05 13:31:34 INFO xorp_rib RIB ] Received death event for protocol bgp shutting down -------
> OriginTable: ebgp
> EGP
> next table = Merged:(ebgp)+(ibgp)
> [ 2007/12/05 13:31:34 INFO xorp_rib RIB ] Received death event for protocol bgp shutting down -------
> OriginTable: ebgp
> EGP
> next table = Merged:(ebgp)+(ibgp)
> [ 2007/12/05 13:31:34 INFO xorp_rib RIB ] Received death event for protocol bgp shutting down -------
> OriginTable: ebgp
> EGP
> next table = Merged:(ebgp)+(ibgp)
> [ 2007/12/05 13:31:34 INFO xorp_rib RIB ] Received death event for protocol bgp shutting down -------
> OriginTable: ebgp
> EGP
> next table = Merged:(ebgp)+(ibgp)
>
>
>
> Here is the configuration:
>
> interfaces {
>     restore-original-config-on-shutdown: false
>
>     interface eth1 {
>         description: "router interface"
>         disable: false
>         default-system-config
>     }
>
>     interface eth2 {
>         description: "router interface"
>         disable: false
>         default-system-config
>     }
>
> }
>
> fea {
>     unicast-forwarding4 {
>         disable: false
>     }
> }
>
> policy {
>     policy-statement block {
>         term bgp_65400 {
>             from {
>                 protocol: "bgp"
>             }
>             then {
>                 accept
>             }
>         }
>     }
> }
>
> protocols {
>     bgp {
>         bgp-id: 10.100.100.2
>         local-as: 65000
>         peer 10.10.10.30 {
>             local-ip: 10.10.10.20
>             as: 65300
>             next-hop: 10.10.10.20
>         }
>         peer 10.10.10.40 {
>             local-ip: 10.10.10.20
>             as: 65400
>             next-hop: 10.10.10.20
>         }
>         peer 10.10.20.30 {
>             local-ip: 10.10.20.20
>             as: 65300
>             next-hop: 10.10.20.20
>         }
>         peer 10.10.20.40 {
>             local-ip: 10.10.20.20
>             as: 65400
>             next-hop: 10.10.20.20
>         }
>         export: "block"
>     }
> }
>
>
>
> An example of 'show bgp peers' and 'show bgp routes' from another test, just
> before a crash. All routes are marked as best.
>
> root@ipca> show bgp peers
> Peer 1: local 10.10.10.20/179 remote 10.10.10.30/179
> Peer 2: local 10.10.10.20/179 remote 10.10.10.40/179
> Peer 3: local 10.10.20.20/179 remote 10.10.20.30/179
> Peer 4: local 10.10.20.20/179 remote 10.10.20.40/179
> root@ipca> show bgp routes
> Status Codes: * valid route, > best route
> Origin Codes: i IGP, e EGP, ? incomplete
>
>    Prefix                Nexthop                    Peer            AS Path
>    ------                -------                    ----            -------
> *> 2.0.0.0/24            10.10.20.40                10.100.100.24  65400 i
> *> 1.0.0.0/24            10.10.20.30                10.100.100.23  65300 i
> *> 2.0.0.0/24            10.10.10.40                10.100.100.14  65400 i
> *> 1.0.0.0/24            10.10.10.30                10.100.100.13  65300 i
> *> 1.0.1.0/24            10.10.10.30                10.100.100.13  65300 i
> *> 1.0.2.0/24            10.10.10.30                10.100.100.13  65300 i
> *> 1.0.3.0/24            10.10.10.30                10.100.100.13  65300 i
> *> 1.0.4.0/24            10.10.10.30                10.100.100.13  65300 i
> *> 1.0.5.0/24            10.10.10.30                10.100.100.13  65300 i
> *> 1.0.6.0/24            10.10.10.30                10.100.100.13  65300 i
> *> 1.0.7.0/24            10.10.10.30                10.100.100.13  65300 i
> *> 1.0.8.0/24            10.10.10.30                10.100.100.13  65300 i
> *> 1.0.9.0/24            10.10.10.30                10.100.100.13  65300 i
> *> 2.0.1.0/24            10.10.20.40                10.100.100.24  65400 i
> *> 2.0.2.0/24            10.10.20.40                10.100.100.24  65400 i
> *> 2.0.3.0/24            10.10.20.40                10.100.100.24  65400 i
> *> 2.0.4.0/24            10.10.20.40                10.100.100.24  65400 i
> *> 2.0.5.0/24            10.10.20.40                10.100.100.24  65400 i
> *> 2.0.6.0/24            10.10.20.40                10.100.100.24  65400 i
> *> 2.0.7.0/24            10.10.20.40                10.100.100.24  65400 i
> *> 2.0.8.0/24            10.10.20.40                10.100.100.24  65400 i
> *> 2.0.9.0/24            10.10.20.40                10.100.100.24  65400 i
> *> 1.0.1.0/24            10.10.20.30                10.100.100.23  65300 i
> *> 1.0.2.0/24            10.10.20.30                10.100.100.23  65300 i
> *> 1.0.3.0/24            10.10.20.30                10.100.100.23  65300 i
> *> 1.0.4.0/24            10.10.20.30                10.100.100.23  65300 i
> *> 1.0.5.0/24            10.10.20.30                10.100.100.23  65300 i
> *> 1.0.6.0/24            10.10.20.30                10.100.100.23  65300 i
> *> 1.0.7.0/24            10.10.20.30                10.100.100.23  65300 i
> *> 1.0.8.0/24            10.10.20.30                10.100.100.23  65300 i
> *> 1.0.9.0/24            10.10.20.30                10.100.100.23  65300 i
> *> 2.0.1.0/24            10.10.10.40                10.100.100.14  65400 i
> *> 2.0.2.0/24            10.10.10.40                10.100.100.14  65400 i
> *> 2.0.3.0/24            10.10.10.40                10.100.100.14  65400 i
> *> 2.0.4.0/24            10.10.10.40                10.100.100.14  65400 i
> *> 2.0.5.0/24            10.10.10.40                10.100.100.14  65400 i
> *> 2.0.6.0/24            10.10.10.40                10.100.100.14  65400 i
> *> 2.0.7.0/24            10.10.10.40                10.100.100.14  65400 i
> *> 2.0.8.0/24            10.10.10.40                10.100.100.14  65400 i
> *> 2.0.9.0/24            10.10.10.40                10.100.100.14  65400 i
>
>
>
> Regards,
> Arsi
>
> On Mon, Dec 03, 2007 at 05:43:49PM +0000, Mark Handley wrote:
>
> > Xorp should not crash; I don't think this is a known issue.  Can you
> > clarify - which Xorp process crashes?  The subject implies BGP, but I
> > just want to be sure.
> >
> > Also I'm not clear on the scenario - BGP doesn't advertise ASs to
> > interfaces - it advertises them via BGP connections which are only
> > loosely connected to interfaces (if you choose an interface IP address
> > for the connection endpoint).  Do you mean that BGP has peerings
> > configured using the local IP addresses of the three ethernets in your
> > scenario?
> >
> > Which AS is the router that crashes in?
> >
> > Your text says 5 routers, but I'm not sure where the 5th one is - the
> > minimum needed to implement something like you describe is 4 (One each
> > for AS1, AS2, AS3 and the router that crashes).  Where's the 5th one?
> >
> > Also could you send the policy config you used to prevent route redistribution?
> >
> > If we understand the scenario, we can build a test suite case to tickle
> > this problem, but right now I don't really know how to do this.
> >
> >  - Mark
> >
> > On Dec 3, 2007 10:21 AM, Arsi Antila <bbb999 at zerodistance.fi> wrote:
> > > Is the following a known problem in XORP?
> > >
> > > Note: this was shown to me by someone else. I didn't test this myself,
> > > so some of the details may be incorrect.
> > >
> > > XORP crashes when the same set of BGP routes is advertised from two
> > > different routers connected to the same interface and the winning route
> > > changes. Tested with VLANs, if-aliases and plain interfaces. Results do
> > > not vary.
> > >
> > >
> > > For example, configuration of the network is as follows:
> > >
> > > - device under test (Linux/Debian, XORP) and five simulated routers
> > >
> > > - AS 1 is advertised to ports eth1 and eth2
> > >
> > > - AS 2 is advertised to ports eth1 and eth2
> > >
> > > - AS 3 is advertised to port eth3
> > >
> > > - Policy rules so that AS 2 routes are not advertised to AS 1
> > >
> > > BGP process dies when one of the routers in AS 2 goes down and then up
> > > so that the primary route in AS 2 changes.
> > >
> > >
> > > Regards,
> > > A.A.
> > >
> > > _______________________________________________
> > > Xorp-users mailing list
> > > Xorp-users at xorp.org
> > > http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-users
> > >
>


