Question
Executive Summary
Is it standard behavior that shells skip over NUL bytes when doing command substitution?
For example, executing
printf '\0abc' | read value && echo $value
will yield abc. The NUL value is skipped, even though a hexdump of the printf output shows it is clearly being emitted.
My first thought was "word splitting". However, when using an actual command substitution
value=$(printf '\0abc')
the results are similar, and = does not perform word splitting.
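A small sketch to reproduce the observation, assuming a POSIX-style sh with wc, tr, and od available. The variable name produced is only for illustration. Because the outcome depends on the shell, the captured length may differ elsewhere; bash and dash are expected to drop the NUL and keep the three remaining characters.

```shell
# Count the bytes the producer actually writes: the NUL plus "abc".
produced=$(printf '\0abc' | wc -c | tr -d ' ')

# Capture the same output through command substitution.
# Some bash releases print a warning about the dropped NUL on stderr,
# so the whole assignment is wrapped and its stderr discarded.
{ value=$(printf '\0abc'); } 2>/dev/null

echo "bytes produced: $produced"
echo "chars captured: ${#value}"

# Dump what actually ended up in the variable.
printf '%s' "$value" | od -c
```

Comparing the two counts shows whether the shell in use dropped the NUL, truncated at it, or kept it.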
Long Story
While searching for the proper answer to this question, I realized that at least three of the shell implementations I am reasonably familiar with (ash, zsh, and bash) will ignore a NUL character when reading the value from a command substitution into a variable.
The exact point in the pipeline when this happens seems to be different, but the result is consistently that a NUL byte gets dropped as if it was never there in the first place.
I have checked with some of the implementations, and well, this seems to be normal behavior.
ash will skip over '\0' on input, but it is not clear from the code whether this is pure coincidence or intended behavior:
if (lastc != '\0') {
[...]
}
The bash source code contains an explicit, albeit #ifdef'd out, warning telling us that it skipped a NUL value on command substitution:
#if 0
internal_warning ("read_comsub: ignored null byte in input");
#endif
I'm not so sure about zsh's behaviour. It recognizes '\0' as a meta character (as defined by the internal imeta() function), prepends a special Meta surrogate character, and sets bit #5 on the input character, essentially unmetaing it (which also makes '\0' into a space ' '):
if (imeta(c)) {
    *ptr++ = Meta;
    c ^= 32;
    cnt++;
}
This seems to get stripped later, because there is no evidence that value in the above printf command contains a meta character. Take this with a large helping of salt, since I'm not too familiar with zsh's internals. Also note the side effect free statements.
Note that zsh also allows you to include NUL (meta-escaped) in IFS (making it possible to e.g. word-split find -print0 without xargs -0). Thus printf '\0abc' | read value and value=$(printf '\0abc') should yield different results depending on the value of IFS (read does field splitting).
Answer 1:
All extant POSIX shells use C strings (NUL-terminated), not Pascal strings (which carry their length as separate metadata and can therefore contain NULs). Thus, they can't possibly hold NULs in string contents. This was notably true of the Bourne Shell and ksh, both major influences on the POSIX sh standard.
The specification leaves the behavior here unspecified; without knowing the specific shell and release being targeted, I would not count on any particular behavior, whether that is terminating the returned stream at the first NUL or simply discarding NULs altogether. Quoting:
The shell shall expand the command substitution by executing command in a subshell environment (see Shell Execution Environment) and replacing the command substitution (the text of command plus the enclosing "$()" or backquotes) with the standard output of the command, removing sequences of one or more <newline> characters at the end of the substitution. Embedded <newline> characters before the end of the output shall not be removed; however, they may be treated as field delimiters and eliminated during field splitting, depending on the value of IFS and quoting that is in effect. If the output contains any null bytes, the behavior is unspecified.
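Since the result is unspecified once the output contains a null byte, one way to stay within specified behavior is to make the output NUL-free before it reaches the substitution. A minimal sketch, assuming od and tr are available and that a hex representation is acceptable for the caller (the variable name hex is only for illustration):

```shell
# Encode the producer's output as hex digits, so the command substitution
# only ever sees printable characters and a trailing newline.
hex=$(printf '\0abc' | od -An -v -tx1 | tr -d ' \n')

echo "$hex"   # 00616263: the leading 00 is the NUL byte, preserved
```

The NUL survives as the two digits 00, at the cost of having to decode the value again before using it as raw data.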
This isn't to say you can't read and produce streams containing NULs in widely-available shells! See the example below, which uses process substitution (written for bash, but it should work with ksh or zsh with minor changes, if any):
# read content from stdin into an array variable and a scalar variable "suffix"
array=( )
while IFS= read -r -d '' line; do
  array+=( "$line" )
done < <(process that generates NUL stream here)
suffix=$line # content after the last NUL, if any
# emit recorded content; skip the first printf when the array is empty,
# since printf '%s\0' with no arguments would still emit one NUL
(( ${#array[@]} )) && printf '%s\0' "${array[@]}"; printf '%s' "$suffix"
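To check that this pattern really preserves the stream, here is a hedged, bash-only sketch that wraps it in a function and feeds it a concrete producer. The function name roundtrip_nul and the sample input are hypothetical, chosen only for this illustration; the function reads from its standard input instead of a process substitution and includes a guard for the case where no NUL was seen.

```shell
# bash-specific: arrays and read -d '' are not POSIX sh features.
# roundtrip_nul is a hypothetical helper name used only for illustration.
roundtrip_nul() {
    local line suffix
    local -a array=()
    # Each read returns one NUL-terminated chunk; the final read fails at
    # EOF but still leaves any unterminated trailing content in "line".
    while IFS= read -r -d '' line; do
        array+=( "$line" )
    done
    suffix=$line
    # Guard: printf '%s\0' with no arguments would emit a spurious NUL.
    if (( ${#array[@]} > 0 )); then
        printf '%s\0' "${array[@]}"
    fi
    printf '%s' "$suffix"
}

# Two NUL-terminated chunks followed by unterminated trailing content.
printf 'one\0two\0tail' | roundtrip_nul | od -An -c
```

The od output should show both NUL bytes in their original positions, because the data only ever travels through pipes and individual NUL-free chunks, never through a single shell string.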
Source: https://stackoverflow.com/questions/32722007/is-skipping-ignoring-nul-bytes-on-process-substitution-standardized