[Xorp-hackers] potential race condition in run_command.cc?
Michael Larson
mike at vyatta.com
Thu Jul 20 09:32:35 PDT 2006
Hello all,
We've been occasionally seeing a hang of the rtrmgr on startup. In tracing the cause it looks as if there is a race condition in how the run_command.cc executes the popen2() method.
Specifically,
libxorp/run_command.cc (snippet from RunCommandBase::execute()):
_pid = popen2(_command, _argument_list, _stdout_stream, _stderr_stream,
redirect_stderr_to_stdout());
// XLOG_TRACE(true, "RunCommandBase::execute() Executing program: 6");
if (_stdout_stream == NULL) {
XLOG_ERROR("Failed to execute command \"%s\"", final_command.c_str());
cleanup();
_exec_id.restore_saved_exec_id(error_msg);
return (XORP_ERROR);
}
// Insert the new process to the map of processes
XLOG_ASSERT(pid2command.find(_pid) == pid2command.end());
pid2command[_pid] = this;
The std::map pid2command inserts the pid after executing popen2(). Inside of popen2() the process is forked and the _command is executed in the child process.
When the process has completed it is expected to signal the parent process through the method child_handler() (in run_command.cc):
child_handler(int signo)
{
XLOG_ASSERT(signo == SIGCHLD);
//
// XXX: Wait for any child process.
// If we are executing any child process outside of the RunProcess
// mechanism, then the waitpid() here may create a wait() blocking
// problem for those processes. If this ever becomes an issue, then
// we should try non-blocking waitpid() for each pid in the
// pid2command map.
//
do {
pid_t pid = 0;
int wait_status = 0;
map<pid_t, RunCommandBase *>::iterator iter;
pid = waitpid(-1, &wait_status, WUNTRACED | WNOHANG);
debug_msg("pid=%d, wait status=%d\n", XORP_INT_CAST(pid), wait_status);
if (pid <= 0)
return; // XXX: no more child processes
XLOG_ASSERT(pid > 0);
popen2_mark_as_closed(pid, wait_status);
iter = pid2command.find(pid);
if (iter == pid2command.end()) {
However, if the pid is not found in the map pid2command the child_handler() continues to loop and wait for the child process to complete (and signal its completion via waitpid).
So, the problem is that on commands that immediately return/complete the child_handler() before the pid2command map is updated, the child_handler() will be called before the pid2command has the new pid and the command will never complete.
Our version of xorp has more program statements in the template files therefore it seems as if it is a bit easier for us to stumble on this condition. Finally, I have a fix for this condition, but it is more of a patch than an elegant fix. I think the fix should be something along the lines where the pid2command needs to be updated before the exec() call is made in popen.
Mike
Vyatta
More information about the Xorp-hackers
mailing list