[Xorp-hackers] BSR Restart

Samuel Lucas Vaz de Mello samuellucas at datacom.ind.br
Wed Oct 29 06:50:22 PDT 2008


>> Another issue: If the node is Elected as BSR and I add another
>> Cand-RP, it triggers bsr_stop() and bsr_start(), which causes it to
>> lose the Elected state and return to Pending. Although this is not
>> wrong, it would be better to just keep the state (the stop()
>> causes a Cand-RP-ADV with zero holdtime, which also changes the
>> state in remote peers). I was wondering about writing an update
>> method that applies the actions in stop() and start() only to the
>> changed zones. What do you think?
> 
> Coincidentally, a couple of days ago I was thinking about the same
> thing when I noticed the bsr_stop()/bsr_start().
> 
> If I remember correctly, I used bsr_stop()/bsr_start() because there
> was lots of complexity when I tried to do the incremental update.
> Though, I don't remember whether this applied to the first version
> of the BSR implementation or to the (almost complete) rewrite I had
> to do after I discovered major issues with the first version.

Is the code in CVS the rewritten code?


> Another reason for the bsr_stop()/bsr_start() was because I wanted
> atomic update (e.g., transaction like), otherwise during
> reconfiguration there will be lots of flux with potentially
> dangerous results (remember that consistent Cand-RP set across
> all PIM-SM routers is critical).
> At that time there was no easy way to apply some transaction-based
> mechanism to achieve the atomic update, so the
> bsr_stop()/bsr_start() was the simple work-around (e.g., note that
> it is in the etc/templates/pimsm*.tp template files).
> 
> If you can find a simple way to do the update atomically without
> bsr_stop()/bsr_start(), then give it a try and see how well it
> works.
> Otherwise, please submit a Bugzilla entry about the issue.

I played around a bit with this. 

I've tried to keep as close as possible to the stop()/start() approach. 
My idea was:

- Add a boolean parameter to bsr_start()/bsr_stop() that is true during restarts.
- Create a bsr_restart() method that calls bsr_stop(true) and then bsr_start(true).
- In stop(true), instead of deleting the whole _active_bsr_zone_list, delete only the non-elected zones. For the elected zones, delete all group prefixes (which contain the RPs).
- In start(true), reuse the existing elected zones and add all configured group prefixes (and RPs).
- In start(true), check whether any elected zone was deleted from the config and remove it.
- After that, put the elected zones back in the PENDING state and expire the BSR timer. The timer puts them back in the ELECTED state, computes the RP-set and sends BSR messages with the new RP-set.
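
To make the steps above easier to review, here is a minimal, self-contained sketch of the idea in C++. It is not the code from the patch and does not use the real XORP classes: ToyBsr, Zone, ZoneState, config, active, bsm_sent and on_timer_expired() are hypothetical names invented for illustration, and the zone timer is reduced to a direct function call.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified model. Not the real XORP BSR classes.
enum class ZoneState { Pending, Elected };

struct Zone {
    ZoneState state = ZoneState::Pending;
    std::vector<std::string> group_prefixes;  // would also carry the RPs
};

class ToyBsr {
public:
    // Configured zones: zone id -> configured group prefixes.
    std::map<std::string, std::vector<std::string>> config;
    // Stand-in for _active_bsr_zone_list.
    std::map<std::string, Zone> active;
    // Counts simulated BSR messages sent on timer expiry.
    int bsm_sent = 0;

    void stop(bool restart) {
        if (!restart) {          // plain stop: drop every zone
            active.clear();
            return;
        }
        for (auto it = active.begin(); it != active.end(); ) {
            if (it->second.state == ZoneState::Elected) {
                // Keep the elected zone, but drop its group prefixes.
                it->second.group_prefixes.clear();
                ++it;
            } else {
                it = active.erase(it);
            }
        }
    }

    void start(bool restart) {
        if (restart) {
            // Remove kept zones that are no longer in the configuration.
            for (auto it = active.begin(); it != active.end(); ) {
                if (config.count(it->first) == 0)
                    it = active.erase(it);
                else
                    ++it;
            }
        }
        for (const auto& entry : config) {
            auto found = active.find(entry.first);
            bool kept = restart && found != active.end();
            Zone& zone = active[entry.first];  // reuse kept zone or create one
            zone.group_prefixes = entry.second;
            if (kept) {
                // Back to Pending, then expire the timer immediately.
                zone.state = ZoneState::Pending;
                on_timer_expired(zone);
            }
        }
    }

    void restart() {
        stop(true);
        start(true);
    }

    // Simulated BSR timer expiry: become Elected and send a BSR message.
    void on_timer_expired(Zone& zone) {
        assert(zone.state == ZoneState::Pending);
        zone.state = ZoneState::Elected;
        ++bsm_sent;
    }
};
```

In this toy model, a zone that was Elected before restart() is still Elected afterwards with the newly configured prefixes, a newly configured zone starts in Pending, and an elected zone that was removed from the configuration disappears from the active list.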

In my tests it seems to work fine.
The patch is attached.

Can you have a look at it and check whether I've forgotten something?


Regards,

 - Samuel


-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-Support-BSR-restart-without-losing-BSR-Elected-state.patch
Type: text/x-diff
Size: 17744 bytes
Desc: not available
Url : http://mailman.ICSI.Berkeley.EDU/pipermail/xorp-hackers/attachments/20081029/57932e3f/attachment.bin 

