In a bash script, what would $'\0' evaluate to and why?

刺人心 2021-02-15 16:53

In various bash scripts I have come across the following: $'\0'

An example with some context:

while read -r -d $'\0' line; do
    ec         


        
4 answers
  •  情深已故
    2021-02-15 17:28

    It is technically true that the expansion $'\0' always becomes the empty string '' (a.k.a. the null string) in the shell (except in zsh). Put the other way around, $'\0' never expands to an ASCII NUL (a byte with value zero), again except in zsh. Note that the two names are confusingly similar: NUL (the byte) and null (the empty string).

    However, there is an additional (quite confusing) twist when we talk about read -d ''.

    What read sees as the delimiter is the value '' (the null string).

    What read does is split the input from stdin on the character $'\0' (yes, an actual 0x00 byte).


    Expanded answer.

    The question in the title is:

    In a bash script, what would $'\0' evaluate to and why?

    That means that we need to explain what $'\0' is expanded to.

    What $'\0' is expanded to is very easy: it expands to the null string '' (in most shells, not in zsh).
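
    A quick way to check this claim is the minimal bash sketch below. The variable names s and t are only illustrative, and the behaviour shown is the one described here for bash, not for zsh.

    ```shell
    #!/usr/bin/env bash
    # Assign the expansion to a variable and inspect it.
    s=$'\0'
    printf 'length: %d\n' "${#s}"      # bash reports a length of 0

    if [[ $s == '' ]]; then
        echo "identical to the empty string"
    fi

    # Everything from the \0 onwards is cut inside $'...' as well.
    t=$'ab\0cd'
    printf '%s' "$t" | od -An -vtx1    # dump whatever survived in t
    ```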

    But the example of use is:

    read -r -d $'\0'
    

    That transforms the question into: what delimiter character does $'\0' expand to?

    This holds a very confusing twist. To address that correctly, we need to take a full circle tour of when and how a NUL (a byte with zero value or '0x00') is used in shells.

    Stream.

    We need some NUL bytes to work with. It is possible to generate NUL bytes from the shell:

    $ echo -e 'ab\0cd' | od -An -vtx1
    61 62 00 63 64 0a                           ### That works in bash.
    
    $ printf 'ab\0cd' | od -An -vtx1
    61 62 00 63 64                              ### That works in all shells tested.
    

    Variable.

    A variable in shell will not store a NUL.

    $ printf -v a 'ab\0cd'; printf '%s' "$a" | od -An -vtx1
    61 62
    

    The example is meant to be executed in bash, as only bash's printf has the -v option. It clearly shows that a string containing a NUL is cut at the NUL: simple variables truncate the string at the zero byte. That is reasonable to expect if the value is stored as a C string, which must end with a NUL (\0), so the string ends as soon as a NUL is found.
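
    Since a single variable cannot hold the NUL, one common workaround is to keep each NUL-terminated piece in its own array element. The following is a minimal bash sketch of that idea; the names records and rec are hypothetical, and the input is generated with printf only for the demonstration.

    ```shell
    #!/usr/bin/env bash
    # Collect NUL-terminated records into an array, one element per record.
    records=()
    while IFS= read -r -d '' rec; do
        records+=("$rec")
    done < <(printf 'ab\0cd\0')

    # Show how many records were stored and what they contain.
    printf 'count: %d\n' "${#records[@]}"
    printf 'record: %s\n' "${records[@]}"
    ```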

    Command substitution.

    A NUL will work differently when used in a command substitution. This code should assign a value to the variable $a and then print it:

    $ a=$(printf 'ab\0cd'); printf '%s' "$a" | od -An -vtx1
    

    And it does, but with different results in different shells:

    ### several shells just ignore (remove)
    ### a NUL in the value of the expanded command.
    /bin/dash       :  61 62 63 64
    /bin/sh         :  61 62 63 64
    /bin/b43sh      :  61 62 63 64
    /bin/bash       :  61 62 63 64
    /bin/lksh       :  61 62 63 64
    /bin/mksh       :  61 62 63 64
    
    ### ksh truncates the value at the NUL.
    /bin/ksh        :  61 62
    /bin/ksh93      :  61 62
    
    ### zsh sets the var to actually contain the NUL value.
    /bin/zsh        :  61 62 00 63 64
    /bin/zsh4       :  61 62 00 63 64
    

    It is worth mentioning that bash (version 4.4) warns about this:

    /bin/b44sh      :  warning: command substitution: ignored null byte in input
    61 62 63 64
    

    In command substitution, the zero byte is dropped by most of the shells above, usually silently.
    It is very important to understand that this does not happen in zsh, which keeps it.
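
    If the position of the NUL matters, one possible approach is to translate it to another byte before the command substitution captures the output. The sketch below assumes that a newline is an acceptable stand-in for the data at hand, which is not true in general (file names, for example, may contain newlines).

    ```shell
    #!/usr/bin/env bash
    # Replace each NUL with a newline so the separator survives the
    # command substitution instead of being dropped by the shell.
    a=$(printf 'ab\0cd' | tr '\0' '\n')

    # Dump the captured value to see which bytes were kept.
    printf '%s' "$a" | od -An -vtx1
    ```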

    Now that we have all the pieces about NUL, we can look at what read does.

    What read does with a NUL delimiter.

    That brings us back to the command read -d $'\0':

    while read -r -d $'\0' line; do
    

    The $'\0' should have been expanded to a byte of value 0x00, but the shell cuts it and it actually becomes ''. That means that read receives the same value for both $'\0' and ''.

    Having said that, it may seem reasonable to write the equivalent construct:

    while read -r -d '' line; do
    

    And it is technically correct.
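
    The equivalence of the two spellings can be checked with a small bash sketch. The helper split_with is a hypothetical name introduced only for this comparison; it passes its first argument straight to read -d and joins the records it reads with a | character.

    ```shell
    #!/usr/bin/env bash
    # split_with DELIM: read stdin with "read -d DELIM" and print each
    # record followed by '|', so the two spellings can be compared.
    split_with() {
        local delim=$1 item out=()
        while IFS= read -r -d "$delim" item; do
            out+=("$item")
        done
        printf '%s|' "${out[@]}"
    }

    a=$(printf 'ab\0cd\0' | split_with $'\0')
    b=$(printf 'ab\0cd\0' | split_with '')

    printf 'with $'"'"'\\0'"'"': %s\n' "$a"
    printf 'with '"''"'    : %s\n' "$b"
    ```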

    What a delimiter of '' actually does.

    There are two sides to this point: one is the value that follows the -d option of read; the other, addressed here, is which character read will actually use when given the delimiter as -d $'\0'.

    The first side has been answered in detail above.

    The second side is a very confusing twist: the command read will actually read up to the next byte of value 0x00 (which is what $'\0' represents).

    To show that this is the case:

    #!/bin/bash
    
    # create a test file with some zero bytes.
    printf 'ab\0cd\0ef\ngh\n' > tfile
    
    while true ; do
        read -r -d '' line; a=$?
        echo "exit $a"
        if [[ $a == 1 ]]; then
            printf 'last %s\n' "$line"
            break
        else
            printf 'normal %s\n' "$line"
        fi
    done < tfile

    When executed, the output will be:

    $ ./script.sh
    exit 0
    normal ab
    exit 0
    normal cd
    exit 1
    last ef
    gh
    

    The first two exit 0 lines are successful reads done up to the next "zero byte", and both contain the correct values of ab and cd. The next read is the last one (as there are no more zero bytes) and contains the value $'ef\ngh' (yes, it also contains a new line).
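
    Because the last read returns a non-zero status while still filling the variable, a plain while read loop would skip that final, unterminated piece. A commonly used idiom, sketched here under the assumption that the trailing data should be kept, adds a test on the variable itself. The names items and line are illustrative.

    ```shell
    #!/usr/bin/env bash
    # Keep looping while read succeeds OR it failed but still left
    # data in "line" (the final record with no trailing NUL).
    items=()
    while IFS= read -r -d '' line || [[ -n $line ]]; do
        items+=("$line")
    done < <(printf 'ab\0cd\0ef\ngh\n')

    printf 'count: %d\n' "${#items[@]}"
    printf 'item: %q\n' "${items[@]}"
    ```

    With IFS set to empty, read does not strip whitespace, so the last item keeps its newlines exactly as they appeared in the input.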

    All this goes to show (and prove) that read -d '' actually reads up to the next "zero byte", which is also known by the ASCII name NUL and should have been the result of a $'\0' expansion.

    In short: we can safely state that read -d '' reads up to the next 0x00 (NUL).
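
    This is why the construct from the question usually appears next to a producer of NUL-terminated records. The sketch below assumes a find that supports -print0 (GNU and BSD find do) and uses a throwaway directory created with mktemp; the file names are made up to show that spaces and newlines survive.

    ```shell
    #!/usr/bin/env bash
    # Create a scratch directory with awkward file names.
    tmp=$(mktemp -d)
    touch "$tmp/plain" "$tmp/with space" "$tmp/with"$'\n'"newline"

    # find emits each path followed by a NUL; read -d '' splits on it.
    count=0
    while IFS= read -r -d '' path; do
        count=$((count + 1))
        printf 'found: %q\n' "$path"
    done < <(find "$tmp" -type f -print0)

    echo "files: $count"
    rm -rf "$tmp"
    ```

    The loop is fed through process substitution rather than a pipe so that count is still set after the loop ends.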

    Conclusion:

    We can state that read -d $'\0' effectively uses a delimiter of 0x00, even though the argument itself reaches read as ''. Using $'\0' is a better way to convey this meaning to the reader. As a matter of code style, I write $'\0' to make my intentions clear.

    One, and only one, character is used as the delimiter: the byte with value 0x00 (even if in bash the argument happens to be cut to '').


    Note: any of these commands will print the hex values of the stream.

    $ printf 'ab\0cd' | od -An -vtx1
    $ printf 'ab\0cd' | xxd -p
    $ printf 'ab\0cd' | hexdump -v -e '/1 "%02X "'
    61 62 00 63 64
    
