[Xorp-users] Patch for making it harder to crash rib when removing interfaces.

Pavlin Radoslavov pavlin at icir.org
Thu Sep 27 14:04:52 PDT 2007


Ben Greear <greearb at candelatech.com> wrote:

> Attached is a patch that helps keep from referencing stale Vif pointers in
> routes when
> deleting interfaces.  This may only be trigerable when you have non-local
> routes (ie,
> those generated by OSPF).
> 
> There are a couple of other changes related to removing interfaces.  I still
> don't have
> this working, and these patches are not necessarily correct.  In particular,
> the rib changes
> probably should search all of the RIB tables for interfaces if it is not found
> in the unicast ipv4 table.
> Or, keep a separate list of Vifs outside of the routing table structures.
> 
> Please let me know what you think.

The patch seems to be around the introduction of a global RibManager
variable "glb_rib_manager" which is a big NO.
We try very hard to avoid any global variables, otherwise you lose
many (object-oriented) properties.
For example, you automatically lose the ability to generate 2+ RIB
instances within the same binary. 

However, the idea of saving the vif name itself instead of a pointer
to the Vif instance might be worth considering in case the route
class itself needs only the vif name.
I am not completely convinced it is the right solution in case, say,
some of the code toward the end of the RIB processing needs to look
into the status of the Vif before sending a route to the FEA, but
is a proposal that should be kept on the table.

Let me try to step back and look into the problem from high level.
If an interface/vif is deleted from the system, all protocols and
RIB will receive the "delete" action from the FEA.
I believe all protocols will try to withdraw the routes they sent to
RIB if those routes use the deleted interface.
The RIB itself will withdraw its own "connected" routes and will
remove the Vif instance itself.
Therefore, after the Vif instance deletion the "route delete"
requests from an unicast routing protocol might refer to the Vif
that doesn't exist. Also, the RIB stored routes from those protocols
might still be referring to the deleted Vif instance.

To solve the problem it needs to be decided who is responsible for
deleting the unicast routes that refer to the deleted vif. Currently
it is the protocol that added the routes, but this creates a race
condition.
If the RIB becomes responsible for deleting those routes, then we
need to change the route add/delete semantics between the RIB and
the routing protocols, because currently the protocols assume they
have complete control over the adding/deletion of their own routes.

One possible solution of breaking the race condition is to keep the
Vif instance around (e.g., marked as DELETED or in some different
container) until no RIB routes refer to it. This eventually will be
until all protocols finish deleting the routes they had installed
and were using this particular vif.

The trickiest part will be how to keep track of when the Vif is not
needed by any route, but techniques like reference counters or the
ref_ptr mechanism can be used to simplify it.
I believe this is a relatively clean solution that also doesn't
break existing properties/assumptions/semantics.

Regards,
Pavlin


> Thanks,
> Ben
> 
> -- 
> Ben Greear <greearb at candelatech.com> Candela Technologies Inc
> http://www.candelatech.com
> 
> 
> ? fea/data_plane/io/io_ip_socket.cc.ben
> ? rib/xorp_rib.bz2
> ? xrl/interfaces/fea_ifmgr_mirror_xif.cc.flc
> ? xrl/targets/fea_base.cc.flc
> ? xrl/targets/fea_base.hh.flc
> ? xrl/targets/fea_ifmgr_mirror_base.cc.flc
> Index: fea/iftree.cc
> ===================================================================
> RCS file: /cvs/xorp/fea/iftree.cc,v
> retrieving revision 1.51
> diff -u -r1.51 iftree.cc
> --- fea/iftree.cc	27 Sep 2007 00:33:33 -0000	1.51
> +++ fea/iftree.cc	27 Sep 2007 16:18:19 -0000
> @@ -1079,7 +1079,7 @@
>      IfTreeAddr4* ap = find_addr(addr);
>  
>      if (ap == NULL)
> -	return (XORP_ERROR);
> +	return XORP_OK; // Already deleted it seems... (XORP_ERROR);
>      ap->mark(DELETED);
>      return (XORP_OK);
>  }
> Index: fea/data_plane/control_socket/netlink_socket_utilities.cc
> ===================================================================
> RCS file: /cvs/xorp/fea/data_plane/control_socket/netlink_socket_utilities.cc,v
> retrieving revision 1.7
> diff -u -r1.7 netlink_socket_utilities.cc
> --- fea/data_plane/control_socket/netlink_socket_utilities.cc	27 Sep 2007 00:33:35 -0000	1.7
> +++ fea/data_plane/control_socket/netlink_socket_utilities.cc	27 Sep 2007 16:18:19 -0000
> @@ -411,7 +411,7 @@
>  		return (XORP_OK);	// No error
>  	    errno = -err->error;
>  	    last_errno = errno;
> -	    error_msg = c_format("AF_NETLINK NLMSG_ERROR message: %s",
> +	    error_msg = c_format("AF_NETLINK NLMSG_ERROR request message: %s",
>  				 strerror(errno));
>  	    return (XORP_ERROR);
>  	}
> Index: ospf/peer_manager.cc
> ===================================================================
> RCS file: /cvs/xorp/ospf/peer_manager.cc,v
> retrieving revision 1.145
> diff -u -r1.145 peer_manager.cc
> --- ospf/peer_manager.cc	24 Aug 2007 02:07:07 -0000	1.145
> +++ ospf/peer_manager.cc	27 Sep 2007 16:18:22 -0000
> @@ -369,9 +369,12 @@
>      debug_msg("Interface %s Vif %s\n", interface.c_str(), vif.c_str());
>      string concat = interface + "/" + vif;
>  
> -    if (0 != _pmap.count(concat))
> -	xorp_throw(BadPeer,
> -		   c_format("Mapping for %s already exists", concat.c_str()));
> +    if (0 != _pmap.count(concat)) {
> +	// Don't think we really need to error here, just return what we already have. --Ben
> +	//xorp_throw(BadPeer,
> +	//		   c_format("Mapping for %s already exists", concat.c_str()));
> +	return _pmap[concat];
> +    }
>  			    
>      OspfTypes::PeerID peerid = _next_peerid++;
>      _pmap[concat] = peerid;
> Index: ospf/xrl_io.cc
> ===================================================================
> RCS file: /cvs/xorp/ospf/xrl_io.cc,v
> retrieving revision 1.47
> diff -u -r1.47 xrl_io.cc
> --- ospf/xrl_io.cc	16 Aug 2007 01:05:51 -0000	1.47
> +++ ospf/xrl_io.cc	27 Sep 2007 16:18:23 -0000
> @@ -634,8 +634,17 @@
>      case BAD_ARGS:
>      case COMMAND_FAILED:
>      case INTERNAL_ERROR:
> -	XLOG_FATAL("Cannot join a multicast group on interface %s vif %s: %s",
> -		   interface.c_str(), vif.c_str(), xrl_error.str().c_str());
> +	if (strstr(xrl_error.str().c_str(), "Address already in use")) {
> +	    // Just a warning
> +	    XLOG_ERROR("Cannot join a multicast group on interface %s vif %s err_code: %i: %s",
> +		       interface.c_str(), vif.c_str(), (int)(xrl_error.error_code()),
> +		       xrl_error.str().c_str());
> +	}
> +	else {
> +	    XLOG_FATAL("Cannot join a multicast group on interface %s vif %s err_code: %i: %s",
> +		       interface.c_str(), vif.c_str(), (int)(xrl_error.error_code()),
> +		       xrl_error.str().c_str());
> +	}
>  	break;
>      }
>  }
> Index: rib/main_rib.cc
> ===================================================================
> RCS file: /cvs/xorp/rib/main_rib.cc,v
> retrieving revision 1.29
> diff -u -r1.29 main_rib.cc
> --- rib/main_rib.cc	16 Feb 2007 22:47:06 -0000	1.29
> +++ rib/main_rib.cc	27 Sep 2007 16:18:24 -0000
> @@ -26,6 +26,15 @@
>  #include <sysexits.h>
>  #endif
>  
> +RibManager* glb_rib_manager = NULL;
> +
> +Vif* __find_vif(const string& nm) {
> +    if (glb_rib_manager) {
> +	return glb_rib_manager->urib4().find_vif(nm);
> +    }
> +    return NULL;
> +}
> +
>  int
>  main (int /* argc */, char* argv[])
>  {
> @@ -50,13 +59,13 @@
>  	//
>  	// The RIB manager
>  	//
> -	RibManager rib_manager(eventloop, xrl_std_router_rib, "fea");
> -	rib_manager.enable();
> +	glb_rib_manager = new RibManager(eventloop, xrl_std_router_rib, "fea");
> +	glb_rib_manager->enable();
>  
>  	wait_until_xrl_router_is_ready(eventloop, xrl_std_router_rib);
>  
>  	// Add the FEA as a RIB client
> -	rib_manager.add_redist_xrl_output4("fea",	/* target_name */
> +	glb_rib_manager->add_redist_xrl_output4("fea",	/* target_name */
>  					   "all",	/* from_protocol */
>  					   true,	/* unicast */
>  					   false,	/* multicast */
> @@ -64,7 +73,7 @@
>  					   "all",	/* cookie */
>  					   true /* is_xrl_transaction_output */
>  	    );
> -	rib_manager.add_redist_xrl_output6("fea",	/* target_name */
> +	glb_rib_manager->add_redist_xrl_output6("fea",	/* target_name */
>  					   "all",	/* from_protocol */
>  					   true,	/* unicast */
>  					   false,	/* multicast */
> @@ -73,13 +82,13 @@
>  					   true /* is_xrl_transaction_output */
>  	    );
>  
> -	rib_manager.start();
> +	glb_rib_manager->start();
>  
>  	//
>  	// Main loop
>  	//
>  	string reason;
> -	while (rib_manager.status(reason) != PROC_DONE) {
> +	while (glb_rib_manager->status(reason) != PROC_DONE) {
>  	    eventloop.run();
>  	}
>      } catch (...) {
> Index: rib/redist_xrl.cc
> ===================================================================
> RCS file: /cvs/xorp/rib/redist_xrl.cc,v
> retrieving revision 1.31
> diff -u -r1.31 redist_xrl.cc
> --- rib/redist_xrl.cc	23 May 2007 12:12:46 -0000	1.31
> +++ rib/redist_xrl.cc	27 Sep 2007 16:18:26 -0000
> @@ -151,12 +151,19 @@
>      : RedistXrlTask<A>(parent),
>        _net(ipr.net()),
>        _nexthop(ipr.nexthop_addr()),
> -      _ifname(ipr.vif()->ifname()),
> -      _vifname(ipr.vif()->name()),
>        _metric(ipr.metric()),
>        _admin_distance(ipr.admin_distance()),
>        _protocol_origin(ipr.protocol().name())
>  {
> +    Vif* vif = ipr.findVif();
> +    if (vif) {
> +	_ifname = vif->ifname();
> +	_vifname = vif->name();
> +    }
> +    else {
> +	_vifname = ipr.vifName();
> +	// TODO:  Default _ifname to same??
> +    }
>  }
>  
>  template <>
> @@ -229,12 +236,18 @@
>      : RedistXrlTask<A>(parent),
>        _net(ipr.net()),
>        _nexthop(ipr.nexthop_addr()),
> -      _ifname(ipr.vif()->ifname()),
> -      _vifname(ipr.vif()->name()),
>        _metric(ipr.metric()),
>        _admin_distance(ipr.admin_distance()),
>        _protocol_origin(ipr.protocol().name())
>  {
> +    Vif* vif = ipr.findVif();
> +    if (vif) {
> +	_ifname = vif->ifname();
> +	_vifname = vif->name();
> +    }
> +    else {
> +	_vifname = ipr.vifName();
> +    }
>  }
>  
>  template <>
> Index: rib/rib.cc
> ===================================================================
> RCS file: /cvs/xorp/rib/rib.cc,v
> retrieving revision 1.67
> diff -u -r1.67 rib.cc
> --- rib/rib.cc	27 Sep 2007 00:33:38 -0000	1.67
> +++ rib/rib.cc	27 Sep 2007 16:18:26 -0000
> @@ -521,6 +529,18 @@
>      return XORP_OK;
>  }
>  
> +/** Return instance of specified VIF if we can find it, NULL otherwise.
> + */
> +template <typename A>
> +Vif*
> +RIB<A>::find_vif(const string& vifname) {
> +    map<string, Vif>::iterator vi = _vifs.find(vifname);
> +    if (vi == _vifs.end()) {
> +	return NULL;
> +    }
> +    return &(vi->second);
> +}
> +
>  template <typename A>
>  int
>  RIB<A>::delete_vif(const string& vifname)
> @@ -762,7 +782,7 @@
>  	const IPRouteEntry<A>* re = _final_table->lookup_route(nexthop_addr);
>  	if (re != NULL) {
>  	    // We found a route for the nexthop
> -	    vif = re->vif();
> +	    vif = re->findVif();
>  	    if ((vif != NULL)
>  		&& (vif->is_underlying_vif_up())
>  		&& (vif->is_same_subnet(IPvXNet(re->net()))
> @@ -895,7 +915,7 @@
>      // 1. Check for an expected route miss.
>      // 2. Check table entry validity and existence.
>      re = _final_table->lookup_route(lookup_addr);
> -    if (re == NULL || re->vif() == NULL) {
> +    if (re == NULL || re->vifName().empty() || re->findVif() == NULL) {
>  	if (matchtype == RibVerifyType(MISS)) {
>  	    debug_msg("****ROUTE MISS SUCCESSFULLY VERIFIED****\n");
>  	    return XORP_OK;
> @@ -951,14 +971,14 @@
>  		  nexthop_addr.str().c_str(),
>  		  route_nexthop->addr().str().c_str());
>      }
> -    if (ifname != re->vif()->name()) {
> +    if (ifname != re->vifName()) {
>  	XLOG_ERROR("Interface \"%s\" does not match expected \"%s\".",
> -		   re->vif()->str().c_str(), ifname.c_str());
> +		   re->vifName().c_str(), ifname.c_str());
>  	return XORP_ERROR;
>      } else {
>  	debug_msg("Ifname: Exp: %s == Got: %s\n",
>  		  ifname.c_str(),
> -		  re->vif()->name().c_str());
> +		  re->vifName().c_str());
>      }
>      if (metric != re->metric()) {
>  	XLOG_ERROR("Metric \"%u\" does not match expected \"%u\".",
> @@ -982,7 +1002,7 @@
>      const IPRouteEntry<A>* re = _final_table->lookup_route(lookupaddr);
>      // Case 1: Route miss. Return the null IP address.
>      // Vif cannot be NULL for a valid route.
> -    if (re == NULL || re->vif() == NULL)
> +    if (re == NULL || re->vifName().empty() || re->findVif() == NULL)
>  	return A::ZERO();
>  
>  #ifdef notyet
> Index: rib/rib.hh
> ===================================================================
> RCS file: /cvs/xorp/rib/rib.hh,v
> retrieving revision 1.40
> diff -u -r1.40 rib.hh
> --- rib/rib.hh	27 Sep 2007 00:33:39 -0000	1.40
> +++ rib/rib.hh	27 Sep 2007 16:18:27 -0000
> @@ -175,6 +175,10 @@
>       */
>      virtual int new_vif(const string& vifname, const Vif& vif);
>  
> +    /** Return instance of specified VIF if we can find it, NULL otherwise.
> +     */
> +    virtual Vif* find_vif(const string& vifname);
> +
>      /**
>       * Inform the RIB that a VIF no longer exists.
>       *
> Index: rib/rib_manager.hh
> ===================================================================
> RCS file: /cvs/xorp/rib/rib_manager.hh,v
> retrieving revision 1.36
> diff -u -r1.36 rib_manager.hh
> --- rib/rib_manager.hh	16 Feb 2007 22:47:08 -0000	1.36
> +++ rib/rib_manager.hh	27 Sep 2007 16:18:27 -0000
> @@ -448,6 +448,8 @@
>       */
>      XrlRibTarget& xrl_rib_target() { return _xrl_rib_target; }
>  
> +    VifManager& getVifManager() { return _vif_manager; }
> +
>  private:
>      ProcessStatus       _status_code;
>      string              _status_reason;
> Index: rib/route.hh
> ===================================================================
> RCS file: /cvs/xorp/rib/route.hh,v
> retrieving revision 1.23
> diff -u -r1.23 route.hh
> --- rib/route.hh	23 May 2007 12:12:47 -0000	1.23
> +++ rib/route.hh	27 Sep 2007 16:18:27 -0000
> @@ -31,6 +31,7 @@
>  
>  #include "protocol.hh"
>  
> +Vif* __find_vif(const string& nm);
>  
>  /**
>   * @short Base class for RIB routing table entries.
> @@ -52,7 +53,7 @@
>       */
>      RouteEntry(Vif* vif, NextHop* nexthop, const Protocol& protocol,
>  	       uint16_t metric)
> -	: _vif(vif), _nexthop(nexthop), _protocol(protocol),
> +	    : _vifname(vif ? vif->name().c_str() : ""), _nexthop(nexthop), _protocol(protocol),
>  	  _admin_distance(UNKNOWN_ADMIN_DISTANCE), _metric(metric)
>      {
>      }
> @@ -69,7 +70,11 @@
>       * @return the Virtual Interface on which packets matching this
>       * routing table entry should be forwarded.
>       */
> -    Vif* vif() const { return _vif; }
> +    Vif* findVif() const {
> +	return __find_vif(_vifname);
> +    }
> +
> +    const string& vifName() const { return _vifname; }
>  
>      /**
>       * Get the NextHop router.
> @@ -132,7 +137,8 @@
>      uint16_t metric() const { return _metric; }
>  
>  protected:
> -    Vif*		_vif;
> +    //Vif*		_vif;
> +    string _vifname;
>      NextHop*		_nexthop;		// The next-hop router
>      // The routing protocol that instantiated this route
>      const Protocol&	_protocol;
> @@ -238,8 +244,8 @@
>       */
>      string str() const {
>  	string dst = (_net.is_valid()) ? _net.str() : string("NULL");
> -	string vif = (_vif) ? string(_vif->name()) : string("NULL");
> -	return string("Dst: ") + dst + string(" Vif: ") + vif +
> +	//string vif = (_vif) ? string(_vif->name()) : string("NULL");
> +	return string("Dst: ") + dst + string(" Vif: ") + _vifname +
>  	    string(" NextHop: ") + _nexthop->str() +
>  	    string(" Metric: ") + c_format("%d", _metric) +
>  	    string(" Protocol: ") + _protocol.name() +
> Index: rib/rt_tab_extint.cc
> ===================================================================
> RCS file: /cvs/xorp/rib/rt_tab_extint.cc,v
> retrieving revision 1.32
> diff -u -r1.32 rt_tab_extint.cc
> --- rib/rt_tab_extint.cc	27 Sep 2007 00:33:39 -0000	1.32
> +++ rib/rt_tab_extint.cc	27 Sep 2007 16:18:28 -0000
> @@ -99,7 +99,7 @@
>  	    const IPRouteEntry<A>* nexthop_route;
>  	    nexthop_route = lookup_route_in_igp_parent(nexthop_addr);
>  	    if (nexthop_route != NULL) {
> -		Vif* vif = nexthop_route->vif();
> +		Vif* vif = nexthop_route->findVif();
>  		if ((vif != NULL)
>  		    && (vif->is_same_subnet(IPvXNet(nexthop_route->net()))
>  			|| vif->is_same_p2p(IPvX(nexthop_addr)))) {
> @@ -160,7 +160,7 @@
>  		    this->next_table()->delete_route(found, this);
>  	    }
>  
> -	    Vif* vif = nexthop_route->vif();
> +	    Vif* vif = nexthop_route->findVif();
>  	    if ((vif != NULL)
>  		&& (vif->is_same_subnet(IPvXNet(nexthop_route->net()))
>  		    || vif->is_same_p2p(IPvX(nexthop_addr)))) {
> @@ -200,7 +200,7 @@
>  {
>      ResolvedIPRouteEntry<A>* resolved_route;
>      resolved_route = new ResolvedIPRouteEntry<A>(route.net(),
> -						 nexthop_route->vif(),
> +						 nexthop_route->findVif(),
>  						 nexthop_route->nexthop(),
>  						 route.protocol(),
>  						 route.metric(),
> @@ -350,6 +350,7 @@
>      return XORP_OK;
>  }
>  
> +
>  template<class A>
>  const ResolvedIPRouteEntry<A>*
>  ExtIntTable<A>::lookup_in_resolved_table(const IPNet<A>& net)
> _______________________________________________
> Xorp-users mailing list
> Xorp-users at xorp.org
> http://mailman.ICSI.Berkeley.EDU/mailman/listinfo/xorp-users



More information about the Xorp-users mailing list