[Xorp-users] Patch for making it harder to crash rib when removing interfaces.

Ben Greear greearb at candelatech.com
Thu Sep 27 14:35:10 PDT 2007


Pavlin Radoslavov wrote:

> The patch seems to be around the introduction of a global RibManager
> variable "glb_rib_manager" which is a big NO.
> We try very hard to avoid any global variables, otherwise you lose
> many (object-oriented) properties.
> For example, you automatically lose the ability to generate 2+ RIB
> instances within the same binary. 

Well, if we go with a lookup scheme, the route objects need to be
able to look up a vif by name.  That implies a global list of vifs
somewhere (could be in a singleton, which allows global variables
wrapped up in something that looks OO, but it's logically the same).

With my hack, you can have only one RibManager at a time, but that
did not seem to be a problem.  You can have all the RIBs you
want...

> However, the idea of saving the vif name itself instead of a pointer
> to the Vif instance might be worth considering in case the route
> class itself needs only the vif name.
> I am not completely convinced it is the right solution in case, say,
> some of the code toward the end of the RIB processing needs to look
> into the status of the Vif before sending a route to the FEA, but
> is a proposal that should be kept on the table.

It can do a lookup at this time and find the one & only instance of
the Vif object if it currently exists.  If not, it can take evasive
action.

> 
> Let me try to step back and look into the problem from high level.
> If an interface/vif is deleted from the system, all protocols and
> RIB will receive the "delete" action from the FEA.
> I believe all protocols will try to withdraw the routes they sent to
> RIB if those routes use the deleted interface.
> The RIB itself will withdraw its own "connected" routes and will
> remove the Vif instance itself.
> Therefore, after the Vif instance deletion the "route delete"
> requests from an unicast routing protocol might refer to the Vif
> that doesn't exist. Also, the RIB stored routes from those protocols
> might still be referring to the deleted Vif instance.

This appears to be my crash...a route from OSPF had a stale
ref to a Vif.  It was trying to delete the route when it crashed
accessing the vif.

> 
> To solve the problem it needs to be decided who is responsible for
> deleting the unicast routes that refer to the deleted vif. Currently
> it is the protocol that added the routes, but this creates a race
> condition.
> If the RIB becomes responsible for deleting those routes, then we
> need to change the route add/delete semantics between the RIB and
> the routing protocols, because currently the protocols assume they
> have complete control over the adding/deletion of their own routes.
> 
> One possible solution of breaking the race condition is to keep the
> Vif instance around (e.g., marked as DELETED or in some different
> container) until no RIB routes refer to it. This eventually will be
> until all protocols finish deleting the routes they had installed
> and were using this particular vif.

This sounds promising, and saves the lookup of the Vif.  We would have
to check all accesses of the vif pointer to make sure the vif has not
been logically deleted before taking actions on it.

There may be a related problem where we don't un-register multicast
addresses on deleted vifs as well.  This doesn't appear to cause crashes,
just keeps things from working.

If you get a patch of this nature, I'd be happy to test it.

Thanks,
Ben

-- 
Ben Greear <greearb at candelatech.com>
Candela Technologies Inc  http://www.candelatech.com



More information about the Xorp-users mailing list