Question
I'm working on a C shell and am having trouble getting an arbitrary number of pipes to work. When I run the shell, it hangs on any piping. For some reason, when I do ls -la | sort, it hangs on the sort until I enter stuff and hit Ctrl+D. I know it has something to do with a pipe not closing, but the print statements show that pipes 3,4,5 all get closed in both the parent and child. I've been at this for a few hours and don't know why this doesn't work. Any help would be much appreciated.
Original Code:
char *current_command;
current_command = strtok_r(cmdline_copy, "|", &cmdline_copy);
char *commands[100][MAX_ARGS]; //Max 100 piped commands with each having MAX_ARGS arguments
int i = 0;
while (current_command != NULL) { //Go through each command and add it to the array
    char *copy = malloc(strlen(current_command)*sizeof(char)); //Copy of current command
    strcpy(copy, current_command);
    char *args_t[MAX_ARGS];
    int nargs_t = get_args(copy, args_t);
    memcpy(commands[i], args_t, sizeof(args_t)*nargs_t); //Copy the command and its arguments to the 2d array
    i++;
    current_command = strtok_r(NULL, "|\n", &cmdline_copy); //Use reentrant version of strtok to prevent fighting with get_args function
}
int fd[2*(i-1)]; //Set up the pipes i.e fd[0,1] is first pipe, fd[1,2] second pipe, etc.
for (int j = 0; j < i*2; j+=2) {
    pipe(fd+j);
}
//Here is where we do the commands
for (int j = 0; j < i; j++) {
    pid = fork(); //Fork
    if (pid == 0) { //Child process
        if (j == 0) { //First process
            printf("Child Closed %d\n", fd[0]);
            close(fd[0]);
            dup2(fd[1], fileno(stdout));
        }
        else if (j == i -1) { //Last process
            dup2(fd[j], fileno(stdin));
            printf("Child closed %d\n", fd[j]);
            printf("Child closed %d\n", fd[j+1]);
            close(fd[j+1]);
            close(fd[j]);
        }
        else { //Middle processes
            dup2(fd[j], fileno(stdin));
            dup2(fd[j+1], fileno(stdout));
            printf("Child closed %d\n", fd[j]);
            close(fd[j]);
        }
        execvp(commands[j][0], commands[j]);
    }
    else if (pid > 0) { //Parent
        printf("Parent closed %d\n", fd[j]);
        close(fd[j]);
        printf("Parent closed %d\n", fd[j+1]);
        close(fd[j+1]);
        waitpid(pid, NULL, 0); //Wait for the process
    }
    else {
        perror("Error with fork");
        exit(1);
    }
}
Final Code:
char *current_command;
current_command = strtok_r(cmdline_copy, "|", &cmdline_copy);
char *commands[100][MAX_ARGS]; //Max 100 piped commands with each having MAX_ARGS arguments
int command_count = 0;
while (current_command != NULL) { //Go through each command and add it to the array
    char *copy = malloc(strlen(current_command) + 1); //Copy of current command (+1 for the '\0') because get_args uses strtok
    strcpy(copy, current_command);
    char *args_t[MAX_ARGS];
    get_args(copy, args_t);
    memcpy(commands[command_count], args_t, sizeof(args_t)); //Copy the command and its arguments (one whole row) to the 2d array
    command_count++;
    current_command = strtok_r(NULL, "|\n", &cmdline_copy); //Use reentrant version of strtok to prevent fighting with get_args function
}
int fd[command_count*2]; //Two descriptors per pipe
pid_t pids[command_count];
for (int j = 0; j < command_count*2; j+=2) { //Open up a pair of pipes for every command
    pipe(fd+j);
}
for (int j = 0; j < command_count; j++) {
    pids[j] = fork();
    if (pids[j] == 0) { //Child process
        if (j == 0) { //Duplicate only stdout pipe for first pipe
            dup2(fd[1], fileno(stdout));
        }
        else if (j == (command_count-1)) { //Duplicate only stdin for last pipe
            dup2(fd[2*(command_count-1)-2], fileno(stdin));
        }
        else { //Duplicate both stdin and stdout
            dup2(fd[2*(j-1)], fileno(stdin));
            dup2(fd[2*j+1], fileno(stdout));
        }
        for (int k = 0; k < command_count*2; k++) { //Close all fds
            close(fd[k]);
        }
        execvp(commands[j][0], commands[j]); //Exec the command
        perror(commands[j][0]); //Only reached if execvp fails
        _exit(127);
    }
    else if (pids[j] < 0) {
        perror("Error forking");
    }
}
for (int k = 0; k < command_count*2; k++) { //Parent closes all fds
    close(fd[k]);
}
waitpid(pids[command_count-1], NULL, 0); //Wait for only the last process;
Answer 1:
You aren't closing enough file descriptors in the children (or, in this case, in the parent).
Rule of thumb: If you dup2() one end of a pipe to standard input or standard output, close both of the original file descriptors returned by pipe() as soon as possible. In particular, you should close them before using any of the exec*() family of functions.
The rule also applies if you duplicate the descriptors with either dup() or fcntl() with F_DUPFD.
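As a minimal sketch of that rule with a single pipe: the helper name read_child_output and the choice of running echo hello are illustrative assumptions, not part of your shell. The child closes both original descriptors after dup2(), and the parent closes the write end before reading. If the parent kept the write end open, its read() would never report end of file, which is the same kind of hang you see with sort.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run "echo hello" and collect its output via a pipe.
 * Returns the number of bytes stored in buf (NUL-terminated), or -1. */
ssize_t read_child_output(char *buf, size_t size)
{
    int fd[2];
    if (size == 0 || pipe(fd) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(fd[0]);
        close(fd[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: connect the write end to stdout, then close BOTH originals. */
        dup2(fd[1], STDOUT_FILENO);
        close(fd[0]);
        close(fd[1]);
        execlp("echo", "echo", "hello", (char *)NULL);
        perror("execlp");   /* only reached if exec fails */
        _exit(127);
    }

    /* Parent: close the write end first, otherwise read() never returns 0. */
    close(fd[1]);
    size_t total = 0;
    ssize_t n;
    while (total < size - 1 &&
           (n = read(fd[0], buf + total, size - 1 - total)) > 0)
        total += (size_t)n;
    buf[total] = '\0';
    close(fd[0]);
    waitpid(pid, NULL, 0);
    return (ssize_t)total;
}
```

Commenting out the parent's close(fd[1]) is an easy way to reproduce the hang in isolation.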
In your code, you create all the pipes before you fork any children; therefore, each child needs to close all the pipe file descriptors after duplicating the one or two that it is going to use for input or output.
The parent process must also close all the pipe descriptors.
Also, the parent should not wait for children to complete until after launching all the children. In general, children will block with full pipe buffers if you make them run sequentially. You also defeat the benefits of parallelism. Note, however, that the parent must keep the pipes open until it has launched all the children — it must not close them after it launches each child.
For your code, the outline operation should be:
- Create N pipes
- For each of N (or N+1) children:
  - Fork.
  - Child duplicates standard input and output pipes
  - Child closes all of the pipe file descriptors
  - Child executes process (and reports error and exits if it fails)
  - Parent records child PID.
  - Parent goes on to next iteration; no waiting, no closing.
- Parent now closes N pipes.
- Parent now waits for the appropriate children to die.
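The outline above can be sketched roughly as follows. This is an illustration under assumptions rather than a drop-in replacement for your shell: run_pipeline, close_all and MAX_CMDS are hypothetical names, each command is assumed to be a NULL-terminated argv array, and the function creates n-1 pipes for n commands. It waits for every child and returns the exit status of the last one.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CMDS 16   /* assumed upper bound, for this sketch only */

static void close_all(const int *fd, int count)
{
    for (int k = 0; k < count; k++)
        close(fd[k]);
}

/* Runs cmds[0] | cmds[1] | ... | cmds[n-1].
 * Returns the exit status of the last command, or -1 on error. */
int run_pipeline(char **cmds[], int n)
{
    int fd[2 * MAX_CMDS];
    pid_t pids[MAX_CMDS];
    int npipes = n - 1;

    if (n < 1 || n > MAX_CMDS)
        return -1;

    /* Create all the pipes up front: pipe k is fd[2k] (read), fd[2k+1] (write). */
    for (int k = 0; k < npipes; k++) {
        if (pipe(fd + 2 * k) < 0) {
            perror("pipe");
            close_all(fd, 2 * k);
            return -1;
        }
    }

    for (int j = 0; j < n; j++) {
        pids[j] = fork();
        if (pids[j] < 0) {
            perror("fork");
            close_all(fd, 2 * npipes);
            for (int i = 0; i < j; i++)
                waitpid(pids[i], NULL, 0);
            return -1;
        }
        if (pids[j] == 0) {
            if (j > 0)                      /* not first: read previous pipe */
                dup2(fd[2 * (j - 1)], STDIN_FILENO);
            if (j < n - 1)                  /* not last: write own pipe */
                dup2(fd[2 * j + 1], STDOUT_FILENO);
            close_all(fd, 2 * npipes);      /* child closes every pipe fd */
            execvp(cmds[j][0], cmds[j]);
            perror(cmds[j][0]);             /* only reached if exec fails */
            _exit(127);
        }
        /* Parent: record the PID and move on; no waiting, no closing yet. */
    }

    close_all(fd, 2 * npipes);              /* parent closes every pipe fd */

    int result = -1;
    for (int j = 0; j < n; j++) {
        int status;
        if (waitpid(pids[j], &status, 0) == pids[j] &&
            j == n - 1 && WIFEXITED(status))
            result = WEXITSTATUS(status);
    }
    return result;
}
```

For example, with char *a[] = {"ls", "-la", NULL} and char *b[] = {"sort", NULL}, calling run_pipeline((char **[]){a, b}, 2) corresponds to ls -la | sort.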
There are other ways of organizing this, of greater or lesser complexity. The alternatives typically avoid opening all the pipes up front, which reduces the number of pipes to be closed.
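One such alternative, sketched here under assumptions, creates each pipe only when the next command needs it, so at most three pipe descriptors are open in the parent at any time. The name run_pipeline_lazy is hypothetical, and the sketch assumes the caller has no other children, because it reaps with wait().

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs cmds[0] | ... | cmds[n-1], creating one pipe per iteration.
 * Returns the exit status of the last command, or -1 on error. */
int run_pipeline_lazy(char **cmds[], int n)
{
    int prev_read = -1;      /* read end of the previous pipe, if any */
    pid_t last_pid = -1;
    int failed = 0;

    if (n < 1)
        return -1;

    for (int j = 0; j < n && !failed; j++) {
        int fd[2] = {-1, -1};
        int is_last = (j == n - 1);

        if (!is_last && pipe(fd) < 0) {
            perror("pipe");
            failed = 1;
            break;
        }

        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            if (!is_last) {
                close(fd[0]);
                close(fd[1]);
            }
            failed = 1;
            break;
        }
        if (pid == 0) {
            if (prev_read != -1) {
                dup2(prev_read, STDIN_FILENO);
                close(prev_read);
            }
            if (!is_last) {
                dup2(fd[1], STDOUT_FILENO);
                close(fd[0]);
                close(fd[1]);
            }
            execvp(cmds[j][0], cmds[j]);
            perror(cmds[j][0]);      /* only reached if exec fails */
            _exit(127);
        }

        /* Parent: drop the ends it no longer needs, keep the new read end. */
        if (prev_read != -1)
            close(prev_read);
        if (!is_last)
            close(fd[1]);
        prev_read = is_last ? -1 : fd[0];
        last_pid = pid;
    }

    if (prev_read != -1)
        close(prev_read);

    /* Reap every child; assumes this process has no unrelated children. */
    int status, result = -1;
    pid_t pid;
    while ((pid = wait(&status)) > 0) {
        if (!failed && pid == last_pid && WIFEXITED(status))
            result = WEXITSTATUS(status);
    }
    return result;
}
```

The trade-off is that the parent closes descriptors inside the loop, so the bookkeeping is slightly more delicate than the "create everything, close everything" approach.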
'Appropriate children' means there are various ways of deciding when a pipeline (sequence of commands connected by pipes) is 'done'.
- One option is to wait for the last command in the sequence to exit. This has advantages — and is the traditional way to do it. Another advantage is that the parent process can launch the last child; the child can launch its predecessor in the pipeline, back to the first process in the pipeline. In this scenario, the parent never creates a pipe, so it doesn't have to close any pipes. It also only has one child to wait for; the other processes in the pipeline are descendents of the one child.
- Another option is to wait for all the processes to die (1). This is more or less what Bash does. This allows Bash to know the exit status of each element of the pipeline; the alternative does not permit that, which is relevant to set -o pipefail and the PIPESTATUS array.
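To illustrate the second option, here is a small sketch that waits for every recorded PID and keeps each exit code, loosely in the spirit of PIPESTATUS and pipefail. The names wait_pipeline and spawn_exit are hypothetical; spawn_exit merely stands in for a real pipeline stage so the example is self-contained, and mapping a signal death to 128 plus the signal number is a shell convention assumed here.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for a pipeline stage: a child that exits with the given code. */
pid_t spawn_exit(int code)
{
    pid_t pid = fork();
    if (pid == 0)
        _exit(code);
    return pid;
}

/* Waits for all n children and stores one exit code per child in codes[].
 * Returns the code of the rightmost child that failed, or 0 if none did. */
int wait_pipeline(const pid_t *pids, int n, int *codes)
{
    int result = 0;
    for (int j = 0; j < n; j++) {
        int status;
        if (waitpid(pids[j], &status, 0) < 0)
            codes[j] = -1;                       /* could not wait */
        else if (WIFEXITED(status))
            codes[j] = WEXITSTATUS(status);
        else if (WIFSIGNALED(status))
            codes[j] = 128 + WTERMSIG(status);   /* assumed convention */
        else
            codes[j] = -1;
        if (codes[j] != 0)
            result = codes[j];
    }
    return result;
}
```

Waiting only for the last command would instead report just codes[n-1] and leave the other children to be reaped later.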
Can you help me understand why the dup2 statement for the middle pipes is dup2(fd[(2*j)+1], fileno(stdout)) and dup2(fd[2*(j-1)], fileno(stdin))? I got it off Google and it works, but I'm unsure why.
- fileno(stdout) is 1.
- fileno(stdin) is 0.
- The read end of a pipe is element 0 of the pair filled in by pipe() (analogous to standard input, which is file descriptor 0).
- The write end of a pipe is element 1 of the pair (analogous to standard output, which is file descriptor 1).
- You have an array int fd[2*N]; for some value of N > 1, and you get a pair of file descriptors for each pipe.
- For an integer k, fd[k*2+0] is the read descriptor of a pipe, and fd[k*2+1] is the write descriptor.
- When j is neither 0 nor (N-1), you want it to read from the previous pipe and to write to its pipe:
  - fd[(2*j)+1] is the write descriptor of pipe j, which gets connected to stdout.
  - fd[2*(j-1)] is the read descriptor of pipe j-1, which gets connected to stdin.
- So, the two dup2() calls connect the correct pipe file descriptors to standard input and standard output of process j in the pipeline.
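The index arithmetic can be checked in isolation with a few lines of C. The helper names below (pipe_read_index, pipe_write_index, stage_fds) are made up for illustration, and -1 is used here to mean "leave that stream alone".

```c
/* Index into fd[] of the read end of pipe k. */
int pipe_read_index(int k)
{
    return 2 * k;
}

/* Index into fd[] of the write end of pipe k. */
int pipe_write_index(int k)
{
    return 2 * k + 1;
}

/* For stage j of an n-command pipeline, report which fd[] indexes should be
 * duplicated onto stdin (*in) and stdout (*out); -1 means no redirection. */
void stage_fds(int j, int n, int *in, int *out)
{
    *in  = (j > 0)     ? pipe_read_index(j - 1) : -1;  /* previous pipe */
    *out = (j < n - 1) ? pipe_write_index(j)    : -1;  /* own pipe */
}
```

For three commands, the middle stage (j = 1) reads from fd[0] and writes to fd[3], and the last stage reads from fd[2], which matches fd[2*(command_count-1)-2] in the final code.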
(1) There can be obscure scenarios where this leaves the parent hung indefinitely. I emphasize obscure; it requires something like a process that hangs around as a daemon without forking.
Source: https://stackoverflow.com/questions/49933706/c-shell-hanging-when-dealing-with-piping