Why should we check WIFEXITED after wait in order to kill child processes in Linux system call?

Question

I came across some C code that checks the return value of wait and, if it is not an error, goes on to check WIFEXITED and WEXITSTATUS as well. Why isn't this redundant? As far as I understand, wait returns -1 if an error occurred, while WIFEXITED returns a non-zero value if the child terminated normally. So if there was no error at the line if ( wait(&status) < 0 ), why would anything go wrong during the WIFEXITED check?

This is the code:

#include <stdio.h>
#include <signal.h>
#include <sys/wait.h>
#include <stdlib.h>
#include <unistd.h>

#define CHILDREN_NUM 5

int main () {

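    int i, status, p;
    pid_t pid;

    /* Fork CHILDREN_NUM children; each child reports and exits at once. */
    for (i = 0; i < CHILDREN_NUM; i++) {
        pid = fork();
        if (pid < 0) {
            perror("fork");
            exit(1);
        }
        if (pid == 0) {
            printf(" Child %d : successfully created!\n", i);
            exit(0);  /* Son normally exits here! */
        }
    }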

    p = CHILDREN_NUM;
    /* The father waits for agents to return successfully */
    while ( p >= 1 )
    {
        if ( wait(&status) < 0 ) {
            perror("Error");
            exit(1);
        }

        if ( ! (WIFEXITED(status) && (WEXITSTATUS(status) == 0)) )  /* kill all running agents */
        {
            fprintf( stderr,"Child failed. Killing all running children.\n");
            /* some code to kill children here */
            exit(1);
        }
        p--;
    }

    return(0);
}

Answer 1:

wait returning >= 0 tells you a child process has terminated (and that calling wait didn't fail), but it does not tell you whether that process terminated successfully or not (or if it was signalled).

But, here, looking at your code, it's fairly obvious the program does care about whether the child process that terminated did so successfully or not:

fprintf( stderr,"Child failed. Killing all running children.\n");

So, the program needs to do further tests on the status value (an int) that was filled in by wait:

WIFEXITED(status): did the process exit normally? (as opposed to being signalled).
WEXITSTATUS(status) == 0: did the process exit with exit code 0 (aka "success"). For more information, see: Meaning of exit status 1 returned by linux command.
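The difference between "wait succeeded" and "the child succeeded" can be sketched as follows. This is only a minimal illustration, not the code from the question: run_child and child_succeeded are hypothetical helper names, and waitpid is used instead of wait so that the helper reaps exactly the child it forked.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits with `code`, then wait
 * for it. Returns 0 if the wait call itself worked (the raw status is
 * stored in *status), or -1 if fork or waitpid failed. */
int run_child(int code, int *status)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;              /* fork failed */
    if (pid == 0)
        _exit(code);            /* child: terminate immediately */
    if (waitpid(pid, status, 0) < 0)
        return -1;              /* the wait call itself failed */
    return 0;                   /* a child was reaped, whatever its fate */
}

/* Hypothetical helper: non-zero only if the child exited normally
 * AND its exit code was 0. */
int child_succeeded(int status)
{
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Calling run_child(0, &st) and run_child(3, &st) both return 0, because waiting worked in both cases; only child_succeeded(st) tells the two children apart.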

Answer 2:

wait(&status) waits on the termination of a child process. The termination may be due to a voluntary exit or due to the receipt of an unhandled signal whose default disposition is to terminate the process.

WIFEXITED(status) and WIFSIGNALED(status) allow you to distinguish the two* cases. You can then use WEXITSTATUS(status) to retrieve the exit status (only if WIFEXITED(status) is true) or WTERMSIG(status) to retrieve the terminating signal (only if WIFSIGNALED(status) is true).
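As a rough sketch of how the two cases can be told apart, assuming a child that either exits with a given code or kills itself with a given signal (describe_status and wait_for_child are hypothetical names introduced only for this example):

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: write a one-line description of `status` into buf. */
void describe_status(int status, char *buf, size_t len)
{
    if (WIFEXITED(status))
        snprintf(buf, len, "exited with code %d", WEXITSTATUS(status));
    else if (WIFSIGNALED(status))
        snprintf(buf, len, "killed by signal %d", WTERMSIG(status));
    else
        snprintf(buf, len, "other state change");
}

/* Hypothetical helper: fork a child that raises `sig` on itself when
 * sig != 0 and otherwise exits with `code`, then wait for it.
 * Returns 0 if the child was reaped, -1 if fork or waitpid failed. */
int wait_for_child(int code, int sig, int *status)
{
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (sig != 0)
            raise(sig);         /* normally does not return for a fatal signal */
        _exit(code);            /* reached if no signal was requested or it was not fatal */
    }
    return waitpid(pid, status, 0) < 0 ? -1 : 0;
}
```

With wait_for_child(7, 0, &st), describe_status reports a normal exit with code 7; with wait_for_child(0, SIGKILL, &st), WIFEXITED(st) is false and WTERMSIG(st) is SIGKILL, so reading WEXITSTATUS there would be meaningless.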

*With waitpid and special flags (WUNTRACED, WCONTINUED), you can also wait for a child process to stop or resume, which you can detect with WIFSTOPPED and WIFCONTINUED respectively (WCONTINUED and WIFCONTINUED are available on Linux since 2.6.10). See waitpid(2) for more information.
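A small sketch of the stop case, under the assumption that the child stops itself with SIGSTOP and the parent resumes it with SIGCONT (observe_stop_then_exit is a hypothetical name, not part of any API):

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the child was first seen stopped
 * and later exited normally with code 0, 0 if it behaved differently,
 * and -1 if fork or waitpid failed. */
int observe_stop_then_exit(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        raise(SIGSTOP);         /* child stops itself ... */
        _exit(0);               /* ... and exits once it is resumed */
    }

    /* WUNTRACED makes waitpid return for a stopped child as well. */
    if (waitpid(pid, &status, WUNTRACED) < 0)
        return -1;
    if (!WIFSTOPPED(status))
        return 0;               /* child terminated instead; already reaped */

    kill(pid, SIGCONT);         /* let the child continue */

    /* Now wait for the actual termination. */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The first waitpid returns successfully even though the child has not terminated at all, which is one more reason the status has to be inspected before drawing conclusions from it.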

Source: https://stackoverflow.com/questions/47441871/why-should-we-check-wifexited-after-wait-in-order-to-kill-child-processes-in-lin

Tags

Linux

unix

operating-system