Is skipping/ignoring NUL bytes on process substitution standardized?

Submitted by 余生长醉 on 2020-01-25 01:48:18

Question


Executive Summary

Is it standard behavior that shells skip over NUL bytes when doing command substitution?

For example, executing

printf '\0abc' | read value && echo $value

will yield abc. The NUL value is skipped, even though the hexdump of the printf output shows it's clearly being output.

My first thought was "word splitting". However, when using an actual command substitution

value=$(printf '\0abc')

the results are similar and = does not perform word splitting.
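To put numbers on this rather than eyeballing a hexdump, here is a minimal sketch that counts the bytes before and after the substitution. The variable names raw_count and kept_count are made up for this illustration, and the counts are what I would expect from the shells discussed here; another shell may behave differently.

```shell
# Count the bytes printf actually writes; the NUL is included.
raw_count=$(( $(printf '\0abc' | wc -c) ))

# Capture the same output in a variable, then count what survived.
value=$(printf '\0abc')
kept_count=$(( $(printf '%s' "$value" | wc -c) ))

echo "written: $raw_count, kept: $kept_count"
```

The arithmetic expansion around wc -c is only there to strip the padding some wc implementations add. Recent bash releases may also print a warning about the ignored null byte on stderr.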

Long Story

While searching for the proper answer for this question, I realized that at least three of the shell implementations I am reasonably familiar with (ash, zsh, and bash) will ignore a NUL character when reading the value from a command substitution into a variable.

The exact point in the pipeline when this happens seems to be different, but the result is consistently that a NUL byte gets dropped as if it was never there in the first place.

I have checked with some of the implementations, and well, this seems to be normal behavior.

ash will skip over '\0' on input, but it is not clear from the code if this is pure coincidence or intended behavior:

if (lastc != '\0') {
    [...]
}

The bash source code contains an explicit warning, albeit one disabled with #if 0, telling us that it skipped a NUL byte while reading command substitution output:

#if 0
      internal_warning ("read_comsub: ignored null byte in input");
#endif

I'm not so sure about zsh's behaviour. It recognizes '\0' as a meta character (as defined by the internal imeta() function), prepends a special Meta surrogate character, and toggles bit #5 of the input character (essentially unmetaing it), which also turns '\0' into a space ' ':

if (imeta(c)) {
    *ptr++ = Meta;
    c ^= 32;
    cnt++;
}

This seems to get stripped later because there is no evidence that value in the above printf command contains a meta character. Take this with a large helping of salt, since I'm not too familiar with zsh's internals. Also note the side effect free statements.
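The bit arithmetic in the zsh snippet can be checked in isolation. This is only a sketch of what c ^= 32 does to the numeric value of '\0'; it says nothing about how zsh handles the Meta byte afterwards, and the variable names are made up for the illustration.

```shell
nul=0                          # numeric value of '\0'
toggled=$(( nul ^ 32 ))        # what "c ^= 32" does to it
restored=$(( toggled ^ 32 ))   # applying the same XOR again undoes it

echo "toggled: $toggled, restored: $restored"
```

The toggled value 32 is the ASCII code of a space, and because XOR with the same mask is its own inverse, whoever strips the Meta byte can recover the original 0.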

Note that zsh also allows you to include NUL (meta-escaped) in IFS (making it possible to e.g. word-split find -print0 without xargs -0). Thus printf '\0abc' | read value and value=$(printf '\0abc') should yield different results depending on the value of IFS (read does field splitting).


Answer 1:


All extant POSIX shells use C strings (NUL-terminated), not Pascal strings (which carry their length as separate metadata and can therefore contain NULs). Thus, their strings can't possibly contain NULs. This was notably true of the Bourne Shell and ksh, both major influences on the POSIX sh standard.

The specification leaves the behavior unspecified here; without knowing the specific shell and release being targeted, I would not count on either of the two likely behaviors: terminating the returned stream at the first NUL, or simply discarding NULs altogether. Quoting:

The shell shall expand the command substitution by executing command in a subshell environment (see Shell Execution Environment) and replacing the command substitution (the text of command plus the enclosing "$()" or backquotes) with the standard output of the command, removing sequences of one or more <newline> characters at the end of the substitution. Embedded <newline> characters before the end of the output shall not be removed; however, they may be treated as field delimiters and eliminated during field splitting, depending on the value of IFS and quoting that is in effect. If the output contains any null bytes, the behavior is unspecified.
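The part of that passage that is specified, the handling of newlines, is easy to check. A minimal sketch (the sample text and the variable name out are just for illustration):

```shell
# The command prints one embedded newline and three trailing ones.
out=$(printf 'one\ntwo\n\n\n')

# Only the trailing newlines are gone; the embedded one is still there.
printf '[%s]\n' "$out"
```

The closing bracket lands directly after "two", which shows that the whole run of trailing newlines was removed while the one between "one" and "two" was kept.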


This isn't to say you can't read and produce streams containing NULs in widely available shells! See the example below, which uses process substitution (written for bash, but it should work with ksh or zsh with minor changes, if any):

# read content from stdin into array variable and a scalar variable "suffix"
array=( )
while IFS= read -r -d '' line; do
  array+=( "$line" )
done < <(process that generates NUL stream here)
suffix=$line # content after last NUL, if any

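# emit recorded content; guard the first printf, since with an empty
# array it would still print the format once and emit a stray NUL
(( ${#array[@]} )) && printf '%s\0' "${array[@]}"
printf '%s' "$suffix"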
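For a concrete run of the loop above, here is a sketch with a stand-in producer. The function name producer and its sample data are hypothetical, chosen only to have two NUL-terminated records followed by an unterminated tail; like the loop itself, it needs bash.

```shell
# Hypothetical producer: two NUL-terminated records plus an unterminated tail.
producer() { printf 'alpha\0beta gamma\0tail'; }

array=( )
while IFS= read -r -d '' line; do
  array+=( "$line" )
done < <(producer)
suffix=$line   # the last read failed at EOF but still filled in "tail"

# Re-emit the stream and compare it byte for byte with the original.
if cmp -s <(producer) <(printf '%s\0' "${array[@]}"; printf '%s' "$suffix"); then
  roundtrip=identical
else
  roundtrip=different
fi
echo "records: ${#array[@]}, suffix: $suffix, roundtrip: $roundtrip"
```

The NULs never live inside a shell variable here: they only act as record delimiters on the way in and are added back by printf on the way out, which is why the round trip can be lossless.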


Source: https://stackoverflow.com/questions/32722007/is-skipping-ignoring-nul-bytes-on-process-substitution-standardized
