In various bash scripts I have come across the following: $'\0'
An example with some context:
while read -r -d $'\0' line; do
ec
It is technically true that the expansion $'\0' will always become the empty string '' (a.k.a. the null string) to the shell (not in zsh). Or, worded the other way around, a $'\0' will never expand to an ASCII NUL (a byte with zero value), again except in zsh. Note that the two names are confusingly similar: NUL and null.
However, there is an additional (quite confusing) twist when we talk about read -d ''.
What read sees is the value '' (the null string) as the delimiter.
What read does is split the input from stdin on the character $'\0' (yes, an actual 0x00).
In a bash script, what would $'\0' evaluate to and why?
That means that we need to explain what $'\0' is expanded to.
What $'\0' is expanded to is very easy: it expands to the null string '' (in most shells, not in zsh).
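A quick way to check that claim in bash is to store the expansion in a variable and look at its length. This is only a minimal sketch, assuming bash; the variable names x and y are made up for the illustration, and zsh would behave differently.

```shell
#!/bin/bash
x=$'\0'          # expected to end up as the null string in bash
y=$'ab\0cd'      # everything from the \0 onward is expected to be dropped

printf 'length of x: %d\n' "${#x}"
printf 'length of y: %d\n' "${#y}"
printf 'bytes of y:'; printf '%s' "$y" | od -An -vtx1

if [[ $x == '' ]]; then
    echo "x equals the null string"
fi
```

In bash the first length should be 0 and the second 2, which matches the idea that the value is cut at the zero byte.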
But the example of use is:
read -r -d $'\0'
That transforms the question into: what delimiter character does $'\0' expand to?
This holds a very confusing twist. To address that correctly, we need to take a full circle tour of when and how a NUL (a byte with zero value or '0x00') is used in shells.
We need some NUL to work with. It is possible to generate NUL bytes from shell:
$ echo -e 'ab\0cd' | od -An -vtx1
61 62 00 63 64 0a ### That works in bash.
$ printf 'ab\0cd' | od -An -vtx1
61 62 00 63 64 ### That works in all shells tested.
A variable in shell will not store a NUL.
$ printf -v a 'ab\0cd'; printf '%s' "$a" | od -An -vtx1
61 62
The example is meant to be executed in bash, as it relies on the -v option of the printf builtin, which not every shell provides.
The example clearly shows that a string that contains a NUL will be cut at the NUL.
Simple variables will cut the string at the zero byte.
That is reasonable to expect if the string is a C string, which must end with a NUL (\0): as soon as a NUL is found, the string must end.
A NUL will work differently when used in a command substitution.
This code should assign a value to the variable $a and then print it:
$ a=$(printf 'ab\0cd'); printf '%s' "$a" | od -An -vtx1
And it does, but with different results in different shells:
### several shells just ignore (remove)
### a NUL in the value of the expanded command.
/bin/dash : 61 62 63 64
/bin/sh : 61 62 63 64
/bin/b43sh : 61 62 63 64
/bin/bash : 61 62 63 64
/bin/lksh : 61 62 63 64
/bin/mksh : 61 62 63 64
### ksh trims the value.
/bin/ksh : 61 62
/bin/ksh93 : 61 62
### zsh sets the var to actually contain the NUL value.
/bin/zsh : 61 62 00 63 64
/bin/zsh4 : 61 62 00 63 64
It is worth a special mention that bash (version 4.4) warns about it:
/bin/b44sh : warning: command substitution: ignored null byte in input
61 62 63 64
In command substitution the zero byte is silently ignored by the shell.
It is very important to understand that that does not happen in zsh.
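If the NUL bytes are not wanted anyway, one option (not part of the tests above, just a sketch) is to delete them explicitly with tr before the shell captures the output. That way the result should not depend on how a particular shell treats a NUL in command substitution, and bash 4.4 has nothing to warn about.

```shell
#!/bin/bash
# Delete NUL bytes (octal 000) explicitly before the shell captures the output.
a=$(printf 'ab\0cd' | tr -d '\000')

# Show the bytes that ended up in the variable.
printf '%s' "$a" | od -An -vtx1
```

The variable should then hold the four bytes 61 62 63 64, the same value most shells produce on their own.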
Now that we have all the pieces about NUL, we may look at what read does.
What read does on a NUL delimiter.
That brings us back to the command read -d $'\0':
while read -r -d $'\0' line; do
The $'\0' should have been expanded to a byte of value 0x00, but the shell cuts it and it actually becomes ''.
That means that both $'\0' and '' are received by read as the same value.
Having said that, it may seem reasonable to write the equivalent construct:
while read -r -d '' line; do
And it is technically correct.
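To see that both spellings behave the same, the two delimiters can be passed to the same loop and the results compared. This is a small sketch assuming bash; read_records, with_empty and with_nul are names invented for the example.

```shell
#!/bin/bash
# Read NUL-delimited records using the delimiter given as the first argument.
read_records() {
    local delim=$1 line out=''
    while read -r -d "$delim" line; do
        out+="[$line]"
    done < <(printf 'ab\0cd\0')
    printf '%s\n' "$out"
}

with_empty=$(read_records '')
with_nul=$(read_records $'\0')

printf '%s gives %s\n' "read -d ''" "$with_empty"
printf '%s gives %s\n' "read -d \$'\\0'" "$with_nul"
```

Both lines should report the same two records, ab and cd.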
There are two sides to this point: one is the character after the -d option of read; the other, which is addressed here, is: what character will read use if given a delimiter as -d $'\0'?
The first side has been answered in detail above.
The second side is a very confusing twist, as the command read will actually read up to the next byte of value 0x00 (which is what $'\0' represents).
To actually show that that is the case:
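#!/bin/bash
# create a test file with some zero bytes.
printf 'ab\0cd\0ef\ngh\n' > tfile
# read the records back from the test file, one per NUL delimiter.
while true ; do
    read -r -d '' line; a=$?
    echo "exit $a"
    if [[ $a == 1 ]]; then
        # non-zero exit: end of input reached before another NUL.
        printf 'last %s\n' "$line"
        break
    else
        printf 'normal %s\n' "$line"
    fi
done < tfile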
When executed, the output will be:
$ ./script.sh
exit 0
normal ab
exit 0
normal cd
exit 1
last ef
gh
The first two exit 0 are successful reads done up to the next "zero byte", and both contain the correct values of ab and cd. The next read is the last one (as there are no more zero bytes) and contains the value $'ef\ngh' (yes, it also contains a new line).
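Since that last read fails (exit 1) while still filling the variable, a plain while read loop would drop the final record. A common way around that, sketched here under the assumption of bash, is to also test whether the variable is non-empty. The array name records is made up for the example, and IFS= is added so that read does not strip whitespace from the record.

```shell
#!/bin/bash
records=()
# Keep looping while read succeeds OR it failed at end of input
# but still delivered a partial (unterminated) record.
while IFS= read -r -d '' line || [[ -n $line ]]; do
    records+=("$line")
done < <(printf 'ab\0cd\0ef\ngh\n')

printf '%d records\n' "${#records[@]}"
printf '%s' "${records[2]}" | od -An -vtx1
```

With IFS= the trailing newline of the last record is kept as well, so the third record should be ef, a newline, gh, and a final newline.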
All this goes to show (and prove) that read -d '' actually reads up to the next "zero byte", which is also known by the ASCII name NUL and should have been the result of a $'\0' expansion.
In short: we can safely state that read -d '' reads up to the next 0x00 (NUL).
We must state that a read -d $'\0' will expand to a delimiter of 0x00.
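A typical place where this delimiter matters is reading file names, which may contain spaces or even newlines but never a NUL. The following is only a sketch, assuming bash and a find that supports -print0; the scratch directory and the file names are invented for the illustration.

```shell
#!/bin/bash
# Scratch directory with awkward file names (a space, a newline).
dir=$(mktemp -d)
touch "$dir/plain" "$dir/with space" "$dir/with"$'\n'"newline"

count=0
# IFS= keeps leading/trailing whitespace; -r keeps backslashes literal.
while IFS= read -r -d $'\0' file; do
    count=$((count + 1))
    printf 'found: %q\n' "${file##*/}"
done < <(find "$dir" -type f -print0)

printf 'total: %d\n' "$count"
rm -r "$dir"
```

The loop is fed through process substitution rather than a pipe so that count is still set after the loop ends. Each of the three names should arrive as exactly one record, newline included.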
Using $'\0' is a better way to convey this correct meaning to the reader.
As a code style thing: I write $'\0' to make my intentions clear.
One, and only one, character is used as a delimiter: the byte value 0x00 (even if in bash it happens to be cut).
Note: Any of these commands will print the hex values of the stream.
$ printf 'ab\0cd' | od -An -vtx1
$ printf 'ab\0cd' | xxd -p
$ printf 'ab\0cd' | hexdump -v -e '/1 "%02X "'
61 62 00 63 64