Why doesn't piping to the same file work on some platforms?

谎友^ 2020-12-20 22:22

In Cygwin, the following code works fine:

$ cat junk
bat
bat
bat

$ cat junk | sort -k1,1 | tr 'b' 'z' > junk

$ cat junk
zat
zat
zat

5 Answers
  • 2020-12-20 22:54

    In general this can be expected to break. The processes in a pipeline are all started in parallel, so the > junk at the end of the line will usually truncate your input file before the process at the head of the pipeline has finished (or even started) reading from it.
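
    The truncation itself can be seen without any pipeline. The following is a minimal sketch (the scratch file demo is a hypothetical name, created with mktemp): the shell opens the redirect target for writing before it starts the command, so the command finds its input already empty.

```shell
# Sketch: the shell truncates the redirect target before the command runs.
demo=$(mktemp)                      # hypothetical scratch file
printf 'bat\nbat\nbat\n' > "$demo"

# The shell opens "$demo" for writing (truncating it) and only then starts
# sort, which therefore reads an empty file and writes nothing.
sort "$demo" > "$demo"

size=$(wc -c < "$demo" | tr -d ' ')
echo "bytes left: $size"
rm -f "$demo"
```

    With a single command the outcome is always an empty file. In a pipeline the reader and the truncation race each other, which is why the result can differ between platforms.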

    Even if bash under Cygwin lets you get away with this, you shouldn't rely on it. The general solution is to redirect to a temporary file and then rename it when the pipeline is complete.
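
    A sketch of that temp-file-then-rename pattern, assuming mktemp is available (the directory and file names are illustrative only):

```shell
# Sketch of the temp-file-then-rename pattern; all names are illustrative.
workdir=$(mktemp -d)
target="$workdir/junk"
printf 'bat\nbat\nbat\n' > "$target"

# Create the temp file next to the target so that mv is a rename within
# the same filesystem rather than a copy.
tmp=$(mktemp "$target.XXXXXX") || exit 1

# Replace the target only if the pipeline reports success (by default that
# is the exit status of its last command).
if sort -k1,1 "$target" | tr 'b' 'z' > "$tmp"; then
    mv "$tmp" "$target"
else
    rm -f "$tmp"
fi

cat "$target"
```

    Note that the renamed file carries the temp file's permissions and ownership, not those of the original junk, so restore them afterwards if they matter.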

  • 2020-12-20 22:58

    If you want to edit that file, you can just use an editor:

    ex junk << EOF
    %!(sort -k1,1 |tr 'b' 'z')
    x
    EOF
    
  • 2020-12-20 23:00

    Overwriting the same file in a pipeline is not advised, because if you make a mistake you can't get the data back (unless you have a backup or the file is under version control).

    This happens because the input and output in a pipeline are automatically buffered (which gives the impression that it works), while the commands are actually running in parallel. Different platforms may buffer the output differently (depending on their settings), so on some you end up with an empty file (because the output file is created at the start), and on others with a half-finished file.

    The solution is to use a method that overwrites the file only once EOF has been reached and all of the input has been buffered and processed.

    This can be achieved by:

    • Using a utility that soaks up all its input before opening the output file.

      This can be done with sponge (the opposite of unbuffer from the expect package).

    • Avoiding I/O redirection syntax (which can create the empty file before the command starts).

      For example, using tee:

      cat junk | sort | tee junk
      

      This can only work after sort, because sort has to read all of its input before it writes anything. So if your pipeline doesn't end in sort, add one. Even then it is a race rather than a guarantee: tee truncates junk as soon as it starts, which may be before cat has read the file.

      Another tool that can be used is stdbuf, which changes the buffering of a command's standard streams and lets you specify the buffer size.

    • Using a text processor that can edit files in place (such as sed or ex).

      Example:

      $ ex -s +'%!sort -k1' -cxa myfile.txt
      $ sed -i '' 's/foo/bar/g' myfile.txt   # BSD/macOS sed; GNU sed takes -i with no '' argument
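
    When the step being applied is the sort itself, sort's own -o option is another in-place route, since POSIX allows the -o output file to be one of the input files. A small sketch with a hypothetical scratch file:

```shell
f=$(mktemp)                         # hypothetical scratch file
printf 'cat\nbat\nant\n' > "$f"

# -o names the output file; sort reads its input completely before writing,
# so the output file may be the same as the input file.
sort -k1,1 -o "$f" "$f"

sorted=$(cat "$f")
printf '%s\n' "$sorted"
rm -f "$f"
```

    This differs from sort file > file, where the shell truncates the file before sort ever runs.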
      
  • 2020-12-20 23:01

    Using the following simple script, you can make it work the way you want:

    $ cat junk | sort -k1,1 |tr 'b' 'z' | overwrite_file.sh junk
    

    overwrite_file.sh

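    #!/usr/bin/env bash

    # Read all of stdin first, so the upstream pipeline is done with the file.
    # Note: $(...) strips trailing newlines and cannot hold NUL bytes.
    OUT=$(cat)

    FILENAME="$1"

    # Write the buffered data to the file and copy it to stdout.
    printf '%s\n' "$OUT" | tee "$FILENAME"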
    

    Note that if you don't want the updated file to be sent to stdout, you can use this approach instead:

    overwrite_file_no_output.sh

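    #!/usr/bin/env bash

    # Read all of stdin first, so the upstream pipeline is done with the file.
    # Note: $(...) strips trailing newlines and cannot hold NUL bytes.
    OUT=$(cat)

    FILENAME="$1"

    # Write the buffered data to the file only.
    printf '%s\n' "$OUT" > "$FILENAME"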
    
  • 2020-12-20 23:03

    Four main points here:

    1. "Useless use of cat." Don't do that.
    2. You're not actually sorting anything with sort. Don't do that.
    3. Your pipeline doesn't say what you think it does. Don't do that.
    4. You're trying to over-write a file in-place while reading from it. Don't do that.

    One of the reasons you are getting inconsistent behavior is that you are piping to a process that has redirection, rather than redirecting the output of the pipeline as a whole. The difference is subtle, but important.

    What you want is to create a compound command with Command Grouping, so that you can redirect the input and output of the whole pipeline. In your case, this should work properly:

    { sort -k1,1 | tr 'b' 'z'; } < junk > sorted_junk
    

    Please note that without anything to sort, you might as well skip the sort command too. Then your command can be run without the need for command grouping:

    tr 'b' 'z' < junk > sorted_junk
    

    Keep redirections and pipelines as simple as possible. It makes debugging your scripts much easier.

    However, if you still want to abuse the pipeline for some reason, you could use the sponge utility from the moreutils package. The man page says:

    sponge reads standard input and writes it out to the specified file. Unlike a shell redirect, sponge soaks up all its input before opening the output file. This allows constructing pipelines that read from and write to the same file.

    So, your original command line can be re-written like this:

    cat junk | sort -k1,1 | tr 'b' 'z' | sponge junk
    

    and since junk will not be overwritten until sponge receives EOF from the pipeline, you will get the results you were expecting.
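
    If moreutils isn't installed, the same soak-then-write behavior can be approximated in plain shell. This is only a sketch: soak is a made-up helper name, not part of moreutils, and the fallback branch assumes mktemp is available.

```shell
# soak FILE: hypothetical helper, not part of moreutils.
# Uses sponge when it is installed; otherwise buffers stdin in a temp file
# and copies it over FILE only after the upstream pipeline has reached EOF.
soak() {
    if command -v sponge >/dev/null 2>&1; then
        sponge "$1"
    else
        soak_tmp=$(mktemp) || return 1
        cat > "$soak_tmp" && cat "$soak_tmp" > "$1"
        soak_status=$?
        rm -f "$soak_tmp"
        return "$soak_status"
    fi
}

junk=$(mktemp)                      # stand-in for the junk file in the question
printf 'bat\nbat\nbat\n' > "$junk"
sort -k1,1 "$junk" | tr 'b' 'z' | soak "$junk"
result=$(cat "$junk")
printf '%s\n' "$result"
rm -f "$junk"
```

    The fallback writes into the existing file rather than renaming over it, so the file keeps its permissions, at the cost of not being an atomic replacement.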
