[Xorp-hackers] process auto restart

Tue, 10 Aug 2004 10:32:09 -0700

On Mon, 09 Aug 2004 20:33:11 +0100, Mark Handley <m.handley@cs.ucl.ac.uk> wrote:
> 
> >In other words, the rtrmgr has all the responsibility for restarting
> >and configuring the process, and the process itself does not need to
> >do anything special.
> >
> >Note that currently the rtrmgr only detects (using 1(a)) that a
> >process has exited abnormally, but it does not attempt to restart it.
> >The restarting should be added in one of the future releases.
> 
> This is correct.  There's one other question that would need to be
> resolved before this can be implemented.  What should the rtrmgr do
> about processes that depend on the failed process?  The rtrmgr knows
> about process startup dependencies from the dependency information in
> the process template files.  But should it use this information?
> 
> My feeling is no.  Our error handling rules state that a process
> should terminate itself if a process it critically depends on fails.
> So in principle, the rtrmgr shouldn't do anything based on this
> dependency information - one process failure should cause all the
> critically dependent processes to terminate, and the rtrmgr should
> wait until all the processes that think they are dependent have
> terminated, and then restart everything that exited.  But how does the
> rtrmgr know when it's safe to start the restart procedure?
> 
> This isn't a showstopper, but needs to be decided globally, or we'll
> have inconsistent behaviour.
> 
>  - Mark
> 

It might be a good idea to define some processes as server components
(e.g. RIB) and others as clients (e.g. BGP).  If a server component
dies, one easy thing to do is to simply restart the whole stack.  If a
client dies, it can be restarted immediately.  Overly complicated
dependencies might be too hard to deal with.
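
As a rough illustration of the server/client split, here is a minimal
sketch in Python.  It is not rtrmgr code: the Role enum, the ROLES table,
and processes_to_restart() are hypothetical names made up for this example,
and the "ospf" entry is only an assumed extra client.  It assumes every
process is tagged as exactly one of the two roles.

```python
from enum import Enum


class Role(Enum):
    SERVER = "server"
    CLIENT = "client"


# Hypothetical role table.  RIB as a server and BGP as a client follow the
# examples in this mail; "ospf" is an assumed extra client for illustration.
ROLES = {"rib": Role.SERVER, "bgp": Role.CLIENT, "ospf": Role.CLIENT}


def processes_to_restart(failed, roles):
    """Return the process names to restart after `failed` exits abnormally.

    A failed client is restarted on its own.  A failed server triggers a
    restart of the whole stack, with servers listed before clients so that
    clients come up after the components they talk to.
    """
    if roles[failed] is Role.SERVER:
        # False sorts before True, so servers come first, then by name.
        return sorted(roles, key=lambda name: (roles[name] is not Role.SERVER, name))
    return [failed]
```

With this table, a BGP failure would restart only BGP, while a RIB failure
would restart RIB first and then every client.  The point of the split is
that the rtrmgr would need one flag per process instead of a dependency
graph.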

Just my 2 cents.

--Ray