Different pipeline behavior between sh and ksh

Submitted by 混江龙づ霸主 on 2019-12-08 05:42:21

Question


I have isolated the problem to the below code snippet:

  1. Notice below that an empty string is assigned to the variable (LATEST_FILE_NAME='') when the script is run with ksh, whereas the script assigns the correct value to $LATEST_FILE_NAME when run with sh. This in turn affects the value of $FILE_LIST_COUNT.
  2. The script is written for KornShell (ksh), so I am not sure what might be causing the issue.
  3. When I comment out the tee command in the line below, the ksh script works fine and correctly assigns the value to $LATEST_FILE_NAME:

    (cd $SOURCE_FILE_PATH; ls *.txt 2>/dev/null) | sort -r > ${SOURCE_FILE_PATH}/${FILE_LIST} | tee -a $LOG_FILE_PATH

Kindly consider:

1. Source Code: script.sh

#!/usr/bin/ksh
set -vx # Enable debugging

SCRIPTLOGSDIR=/some/path/Scripts/TEST/shell_issue
SOURCE_FILE_PATH=/some/path/Scripts/TEST/shell_issue
# Log file
Timestamp=`date +%Y%m%d%H%M`
LOG_FILENAME="TEST_LOGS_${Timestamp}.log"
LOG_FILE_PATH="${SCRIPTLOGSDIR}/${LOG_FILENAME}"
## Temporary files
FILE_LIST=FILE_LIST.temp    #Will store all  extract filenames
FILE_LIST_COUNT=0           # Stores total number of  files

getFileListDetails(){
    rm -f $SOURCE_FILE_PATH/$FILE_LIST 2>&1 | tee -a $LOG_FILE_PATH

    # Get list of all files, Sort in reverse order, and store names of the  files line-wise. If no files are found, error is muted.
    (cd $SOURCE_FILE_PATH; ls *.txt 2>/dev/null) | sort -r > ${SOURCE_FILE_PATH}/${FILE_LIST} | tee -a $LOG_FILE_PATH

    if [[ ! -f $SOURCE_FILE_PATH/$FILE_LIST ]]; then
        echo "FATAL ERROR - Could not create a temp file for  file list.";exit 1;
    fi

    LATEST_FILE_NAME="$(cd $SOURCE_FILE_PATH; head -1 $FILE_LIST)";
    FILE_LIST_COUNT="$(cat $SOURCE_FILE_PATH/$FILE_LIST | wc -l)";

}

getFileListDetails;
exit 0;

2. Output when run with sh (sh script.sh):

+ getFileListDetails
+ rm -f /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
+ tee -a /some/path/Scripts/TEST/shell_issue/TEST_LOGS_201304300506.log
+ cd /some/path/Scripts/TEST/shell_issue
+ sort -r
+ tee -a /some/path/Scripts/TEST/shell_issue/TEST_LOGS_201304300506.log
+ ls 1.txt 2.txt 3.txt
+ [[ ! -f /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp ]]
cd $SOURCE_FILE_PATH; head -1 $FILE_LIST
++ cd /some/path/Scripts/TEST/shell_issue
++ head -1 FILE_LIST.temp
+ LATEST_FILE_NAME=3.txt
cat $SOURCE_FILE_PATH/$FILE_LIST | wc -l
++ cat /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
++ wc -l
+ FILE_LIST_COUNT=3
exit 0;
+ exit 0

3. Output when run with ksh (ksh script.sh):

+ getFileListDetails
+ tee -a /some/path/Scripts/TEST/shell_issue/TEST_LOGS_201304300507.log
+ rm -f /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
+ 2>& 1
+ tee -a /some/path/Scripts/TEST/shell_issue/TEST_LOGS_201304300507.log
+ sort -r
+ 1> /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
+ cd /some/path/Scripts/TEST/shell_issue
+ ls 1.txt 2.txt 3.txt
+ 2> /dev/null
+ [[ ! -f /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp ]]
+ cd /some/path/Scripts/TEST/shell_issue
+ head -1 FILE_LIST.temp
+ LATEST_FILE_NAME=''
+ wc -l
+ cat /some/path/Scripts/TEST/shell_issue/FILE_LIST.temp
+ FILE_LIST_COUNT=0
exit 0;+ exit 0

Answer 1:


OK, here goes...this is a tricky and subtle one. The answer lies in how pipelines are implemented. POSIX states that

If the pipeline is not in the background (see Asynchronous Lists), the shell shall wait for the last command specified in the pipeline to complete, and may also wait for all commands to complete.

Notice the keyword may. Many shells implement this in a way that all commands need to complete, e.g. see the bash manpage:

The shell waits for all commands in the pipeline to terminate before returning a value.

Notice the wording in the ksh manpage:

Each command, except possibly the last, is run as a separate process; the shell waits for the last command to terminate.

In your example, the last command is tee. Because the command before it redirects its stdout to ${SOURCE_FILE_PATH}/${FILE_LIST}, nothing is ever written to the pipe, so tee sees end-of-file and exits almost immediately. ksh waits only for tee, so the script carries on while sort may still be running. Put simply, tee finishes before sort has written the file, which means the file is probably still empty by the time you read from it. You can demonstrate this (this is not a fix!) by adding a sleep after the pipeline:

$ ksh -c 'ls /tmp/* | sort -r > /tmp/foo.txt | tee /tmp/bar.txt; echo "[$(head -n 1 /tmp/foo.txt)]"'
[]

$ ksh -c 'ls /tmp/* | sort -r > /tmp/foo.txt | tee /tmp/bar.txt; sleep 0.1; echo "[$(head -n 1 /tmp/foo.txt)]"'
[/tmp/sess_vo93c7h7jp2a49tvmo7lbn6r63]

$ bash -c 'ls /tmp/* | sort -r > /tmp/foo.txt | tee /tmp/bar.txt; echo "[$(head -n 1 /tmp/foo.txt)]"'
[/tmp/sess_vo93c7h7jp2a49tvmo7lbn6r63]
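
One possible fix, sketched here under the assumption that the tee was meant to log the sorted file list: let tee itself write the list file, so that the last command in the pipeline is also the one that produces the file. A shell that waits only for the last command then still sees the complete file. The scratch directory, the three sample files and the log name test.log are made up for illustration; the variable names are the ones from the question.

```shell
#!/bin/sh
# Hypothetical scratch directory standing in for SOURCE_FILE_PATH.
SOURCE_FILE_PATH="$(mktemp -d)"
LOG_FILE_PATH="${SOURCE_FILE_PATH}/test.log"   # assumed log location
FILE_LIST="FILE_LIST.temp"
touch "${SOURCE_FILE_PATH}/1.txt" "${SOURCE_FILE_PATH}/2.txt" "${SOURCE_FILE_PATH}/3.txt"

# tee is the last command AND the writer of the list file: it writes the
# sorted names to FILE_LIST and appends a copy to the log via its stdout.
(cd "${SOURCE_FILE_PATH}" && ls *.txt 2>/dev/null) | sort -r \
    | tee "${SOURCE_FILE_PATH}/${FILE_LIST}" >> "${LOG_FILE_PATH}"

# By the time the pipeline returns, tee has finished writing the file.
LATEST_FILE_NAME="$(head -n 1 "${SOURCE_FILE_PATH}/${FILE_LIST}")"
FILE_LIST_COUNT="$(wc -l < "${SOURCE_FILE_PATH}/${FILE_LIST}")"
echo "latest=${LATEST_FILE_NAME} count=${FILE_LIST_COUNT}"
```

If the list should not end up in the log at all, dropping the tee and keeping only the redirection to the list file avoids the problem as well, since sort is then the last command.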

That being said, here are a few other things to consider:

  1. Always quote your variables, especially when dealing with files, to avoid problems with globbing, word splitting (if your path contains spaces) etc.:

    do_something "${this_is_my_file}"

  2. head -1 is the obsolete form of the option; use the portable head -n 1 instead

  3. If you only have one command on a line, the ending semicolon ; is superfluous...just skip it

  4. LATEST_FILE_NAME="$(cd $SOURCE_FILE_PATH; head -1 $FILE_LIST)"

    No need to cd into the directory first, just specify the whole path as argument to head:

    LATEST_FILE_NAME="$(head -n 1 "${SOURCE_FILE_PATH}/${FILE_LIST}")"

  5. FILE_LIST_COUNT="$(cat $SOURCE_FILE_PATH/$FILE_LIST | wc -l)"

    This is called Useless Use Of Cat because the cat is not needed - wc can deal with files. You probably used it because the output of wc -l myfile includes the filename, but you can use e.g. FILE_LIST_COUNT="$(wc -l < "${SOURCE_FILE_PATH}/${FILE_LIST}")" instead.
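
To make the quoting advice in point 1 concrete, here is a small sketch of what word splitting does to an unquoted variable. The file name "my file.txt" and the counter variables are invented for the demonstration; nothing is read from or written to disk.

```shell
#!/bin/sh
# Hypothetical file name containing a space.
this_is_my_file="my file.txt"

# Unquoted: the shell splits the value on whitespace into two arguments.
set -- ${this_is_my_file}
unquoted_words=$#

# Quoted: the value is passed on as a single argument.
set -- "${this_is_my_file}"
quoted_words=$#

echo "unquoted=${unquoted_words} quoted=${quoted_words}"
```

A command such as head or rm given the unquoted form would look for two files, "my" and "file.txt", neither of which is the one you meant.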

Furthermore, you will want to read Why you shouldn't parse the output of ls(1) and How can I get the newest (or oldest) file from a directory?.
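
As a rough illustration of doing without ls, the sketch below gets the same two values from a glob alone. It assumes that "latest" means the name that sorts last, as the sort -r in the question implies, and it uses a made-up scratch directory with three sample files. It is a sketch under those assumptions, not a drop-in replacement for the script.

```shell
#!/bin/sh
# Hypothetical scratch directory standing in for SOURCE_FILE_PATH.
SOURCE_FILE_PATH="$(mktemp -d)"
touch "${SOURCE_FILE_PATH}/1.txt" "${SOURCE_FILE_PATH}/2.txt" "${SOURCE_FILE_PATH}/3.txt"

# The glob expands in sorted order; load the matches into the
# positional parameters instead of parsing the output of ls.
set -- "${SOURCE_FILE_PATH}"/*.txt

if [ -e "$1" ]; then
    FILE_LIST_COUNT=$#
    # Walk to the last match, which is the name that sorts highest.
    for f in "$@"; do
        LATEST_FILE_NAME="${f##*/}"   # strip the directory part
    done
else
    # No match: the pattern is left unexpanded, so treat the list as empty.
    FILE_LIST_COUNT=0
    LATEST_FILE_NAME=""
fi
echo "latest=${LATEST_FILE_NAME} count=${FILE_LIST_COUNT}"
```

No temporary list file and no pipeline are involved here, so the question of which command the shell waits for does not arise.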



Source: https://stackoverflow.com/questions/16069339/different-pipeline-behavior-between-sh-and-ksh
