Do 'cat foo.txt | my_cmd' and 'my_cmd < foo.txt' accomplish the same thing?

Question

This question helped me understand the difference between redirection and piping, but the examples focus on redirecting STDOUT (echo foo > bar.txt) and piping STDIN (ls | grep foo).

It would seem to me that any command that could be written my_command < file.txt could also be written cat file.txt | my_command. In what situations is STDIN redirection necessary?

Apart from the fact that using cat spawns an extra process and is less efficient than redirecting STDIN, are there situations in which you have to use the STDIN redirection? Put another way, is there ever a reason to pipe the output of cat to another command?

Answer 1:

What's the difference between my_command < file.txt and cat file.txt | my_command?

my_command < file.txt

The redirection symbol can also be written as 0<, because it redirects file descriptor 0 (stdin) to file.txt instead of its current target, which is probably the terminal. If my_command is a shell built-in then no child process is created; otherwise there is one.
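As a minimal sketch of the two points above, the snippet below checks that < and 0< behave the same, and that a built-in such as read, when given a redirection, runs in the current shell so the variable it sets is still visible afterwards. The temporary file and the variable names (tmp, plain, explicit, first) are made up for illustration.

```shell
# Illustrative temporary file with three lines.
tmp=$(mktemp)
printf 'one\ntwo\nthree\n' > "$tmp"

# "<" and "0<" both redirect file descriptor 0 (stdin).
plain=$(wc -l < "$tmp")
explicit=$(wc -l 0< "$tmp")

# read is a built-in: with a redirection it runs in the current shell,
# so the variable it assigns remains set after the command finishes.
read -r first < "$tmp"

rm -f "$tmp"
echo "plain=$plain explicit=$explicit first=$first"
```

Both line counts should be identical, and first holds the first line of the file.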

cat file.txt | my_command

This connects file descriptor 1 (stdout) of the command on the left to the write end of an anonymous pipe, and file descriptor 0 (stdin) of the command on the right to the read end of that same pipe.

We see at once that there is a child process, since cat is not a shell built-in. However, in bash, even if my_command is a shell built-in it is still run in a child process. Therefore we have two child processes.

So the pipe, in theory, is less efficient. Whether that difference is significant depends on many factors, including the definition of "significant". A pipe is preferable when the alternative is a temporary file:

command1 > file.txt
command2 < file.txt

Here it is likely that

command1 | command2

is more efficient, bearing in mind that in practice the temporary-file version will probably need a third child process, rm file.txt, to clean up.
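The comparison above can be sketched as follows. Here produce is a hypothetical stand-in for command1 and sort plays the role of command2; neither is from the original answer. The sketch only shows that both forms yield the same data and that the temporary-file form needs an extra cleanup step. It does not measure efficiency.

```shell
# Hypothetical stand-in for command1.
produce() {
    printf '%s\n' banana apple cherry
}

# Temporary-file version: write, read back, then clean up with rm.
tmp=$(mktemp)
produce > "$tmp"
via_file=$(sort < "$tmp")
rm -f "$tmp"

# Pipe version: no intermediate file and nothing to remove.
via_pipe=$(produce | sort)

echo "$via_file"
echo "$via_pipe"
```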

However, there are limitations to pipes. They are not seekable (random access, see man 2 lseek) and they cannot be memory mapped (see man 2 mmap). Some applications map files to virtual memory, but it would be unusual to do that to stdin or stdout. Memory mapping in particular is not possible on a pipe (whether anonymous or named) because a range of virtual addresses has to be reserved and for that a size is required.
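One way to see which kind of object a command receives on stdin is to test /dev/stdin. This is a sketch that assumes a Linux-style /dev/stdin that resolves to the process's file descriptor 0. describe_stdin is a hypothetical helper, not a standard command.

```shell
# Hypothetical helper: report what file descriptor 0 is connected to.
# Assumes /dev/stdin resolves to this process's stdin (as on Linux).
describe_stdin() {
    if [ -p /dev/stdin ]; then
        echo "pipe"
    elif [ -f /dev/stdin ]; then
        echo "regular file"
    else
        echo "other"
    fi
}

tmp=$(mktemp)
printf 'hello\n' > "$tmp"

from_file=$(describe_stdin < "$tmp")       # redirection: stdin is the file itself
from_pipe=$(cat "$tmp" | describe_stdin)   # pipeline: stdin is an anonymous pipe

rm -f "$tmp"
echo "redirected: $from_file"
echo "piped:      $from_pipe"
```

A program that wants to seek or memory-map its input can do so only in the redirected case, where stdin is the regular file.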

Edit:

As mentioned by @JohnKugelman, a common error, and the source of many SO questions, is the way a pipeline's child process interacts with variables set inside a loop:

Take a file file.txt with 99 lines:

i=0
cat file.txt | while read -r
do
   (( i = i+1 ))   # runs in the pipeline's subshell
done

echo "$i"

What gets displayed? The answer is 0. Why? Because the increment i = i + 1 runs in a subshell which, in bash, is a child process, so it does not change i in the parent (note: this does not apply to the Korn shell, ksh).

i=0
while read -r
do
   (( i = i+1 ))   # runs in the current shell
done < file.txt

echo "$i"

This displays the correct count, 99, because no child process is involved.
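A self-contained version of the two loops, for anyone who wants to try it without preparing file.txt by hand. It builds a 99-line temporary file with seq, and the variable names piped and redirected are made up for illustration. It assumes bash with default settings; shells such as ksh run the last pipeline stage in the current shell and would behave differently.

```shell
# Build a 99-line input file.
tmp=$(mktemp)
seq 99 > "$tmp"

# Pipeline: the while loop runs in a child process, so its
# increments are lost when the loop ends.
piped=0
cat "$tmp" | while read -r line
do
    piped=$((piped + 1))
done

# Redirection: the loop runs in the current shell, so the count survives.
redirected=0
while read -r line
do
    redirected=$((redirected + 1))
done < "$tmp"

rm -f "$tmp"
echo "piped=$piped redirected=$redirected"
```

In bash this prints piped=0 redirected=99.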

Answer 2:

You can of course replace any use of input redirection with a pipe that reads from cat, but it is inefficient to do so, as you are spawning a new process to do something the shell can already do by itself. However, not every instance of cat ... | my_command can be replaced with my_command < .... When cat is doing its intended job of concatenating two (or more) files, it is perfectly reasonable to pipe its output to another command.

cat file1.txt file2.txt | my_command
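A small sketch of that case, with wc -l standing in for my_command and two throwaway files created just for the demonstration (all names here are illustrative):

```shell
# Two illustrative input files: two lines and one line.
tmp1=$(mktemp)
tmp2=$(mktemp)
printf 'a\nb\n' > "$tmp1"
printf 'c\n' > "$tmp2"

# cat concatenates both files into a single stream for the next command.
combined=$(cat "$tmp1" "$tmp2" | wc -l)

# A single input redirection supplies only one file.
single=$(wc -l < "$tmp1")

rm -f "$tmp1" "$tmp2"
echo "combined=$combined single=$single"
```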

Source: https://stackoverflow.com/questions/48446896/do-cat-foo-txt-my-cmd-and-my-cmd-foo-txt-accomplish-the-same-thing

Tags: Linux, bash, redirect, pipe, stdin