OS X / Linux: pipe into two processes?

醉酒成梦 2020-11-27 15:08

I know about

program1 | program2

and

program1 | tee outputfile | program2

but is there a way to feed program1's output into two processes, say program2 and program3?

6 answers
  • 2020-11-27 15:52

    Other answers introduce the concept. Here is an actual demonstration:

    $ echo "Leeroy Jenkins" | tee >(md5sum > out1) >(sha1sum > out2) > out3
    
    $ cat out1
    11e001d91e4badcff8fe22aea05a7458  -
    
    $ echo "Leeroy Jenkins" | md5sum
    11e001d91e4badcff8fe22aea05a7458  -
    
    $ cat out2
    5ed25619ce04b421fab94f57438d6502c66851c1  -
    
    $ echo "Leeroy Jenkins" | sha1sum
    5ed25619ce04b421fab94f57438d6502c66851c1  -
    
    $ cat out3
    Leeroy Jenkins
    

    Of course, you can redirect to /dev/null instead of out3.

  • 2020-11-27 15:55

    You can do this with tee and process substitution.

    program1 | tee >(program2) >(program3)
    

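    The output of program1 will be piped to whatever is inside each >( ), in this case program2 and program3. tee also writes its own copy to standard output, so add > /dev/null if you don't want it on the terminal.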
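A minimal runnable sketch of the same pattern, assuming bash. The function name fan_out is hypothetical, and awk and wc merely stand in for program2 and program3. Because the >(...) processes run asynchronously, their output is sent on down the pipeline to sort, which only finishes once both consumers have closed their end of the pipe:

```shell
#!/usr/bin/env bash
# fan_out is a hypothetical name: one producer (seq), two consumers (awk, wc).
fan_out() {
  seq 1 5 |
    tee >(awk '{ s += $1 } END { print "sum:", s }') \
        >(wc -l | sed 's/^ */lines: /') \
        > /dev/null |  # discard tee's own copy of the data
    sort               # reads until both consumers have exited, then sorts
}

fan_out
```

This relies on the consumers writing to the pipe; if they write to files instead, the shell does not necessarily wait for them before running the next command.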

  • 2020-11-27 15:56

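    Use the ( ; ) syntax... try ps aux | (head -n 1; tail -n 1). Note that the two commands do not each get a copy: they share one input stream, so whatever head reads (possibly more than one line, since it reads in blocks) is no longer available to tail.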
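A small sketch showing that commands grouped this way share one input stream rather than each receiving a copy (works in bash and POSIX sh; the function name shared_stream is hypothetical). It uses read instead of head because read consumes exactly one line from a pipe, which keeps the result deterministic:

```shell
#!/usr/bin/env bash
# shared_stream is a hypothetical name for illustration.
shared_stream() {
  printf 'a\nb\nc\n' |
    ( read -r first; echo "first: $first"; cat )  # cat only sees what read left
}

shared_stream
```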

  • 2020-11-27 16:02

    You can always save the output of program1 to a file and then feed that file to program2 and program3.

    program1 > temp; program2 < temp; program3 < temp;
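A slightly more careful version of the temporary-file approach, assuming a system with mktemp (GNU and BSD both provide it). seq, wc and tail stand in for program1, program2 and program3, and the function name via_temp_file is hypothetical. The EXIT trap removes the file even if a consumer fails:

```shell
#!/usr/bin/env bash
# The body is a ( ) subshell, so the EXIT trap fires when the function returns.
via_temp_file() (
  tmp=$(mktemp) || exit 1
  trap 'rm -f "$tmp"' EXIT
  seq 1 5 > "$tmp"                          # stands in for program1
  wc -l < "$tmp" | sed 's/^ */lines: /'     # stands in for program2
  tail -n 1 < "$tmp" | sed 's/^/last: /'    # stands in for program3
)

via_temp_file
```

Unlike the tee variants, the consumers here run one after another, and the whole output of program1 must fit on disk.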
    
  • 2020-11-27 16:04

    Intro about parallelisation

    This may seem trivial, but it is not only possible: doing so will also create concurrent (simultaneously running) processes.

    You may have to take care of some side effects, like order of execution, execution time, etc.

    There are some samples at the end of this post.

    Compatible answer first

    As this question is tagged shell and unix, I will first give a POSIX-compatible answer (for bashisms, read further).

    Yes, there is a way, using unnamed pipes.

    In this example, I will generate a range of 100'000 numbers, shuffle them and compress the result using 4 different compression tools to compare the compression ratio...

    For this, I will first run some preparation:

    GZIP_CMD=`which gzip`
    BZIP2_CMD=`which bzip2`
    LZMA_CMD=`which lzma`
    XZ_CMD=`which xz`
    MD5SUM_CMD=`which md5sum`
    SED_CMD=`which sed`
    

    Note: specifying the full path to commands prevents some shell interpreters (like busybox) from running a built-in compressor. Doing it this way also ensures the same syntax will run independently of the OS installation (paths can differ between macOS, Ubuntu, Red Hat, HP-UX and so on).

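    The syntax NN>&1 (where NN is a file descriptor number of 3 or more; POSIX only guarantees descriptors up to 9 in redirections) makes descriptor NN a copy of standard output. Because the standard output of each parenthesised subshell is the unnamed pipe to the next command, ( ... ) 4>&1 | cmd makes /dev/fd/4 inside the subshell refer to the pipe feeding cmd. (File descriptors 0 to 2 are already open: 0 is STDIN, 1 is STDOUT and 2 is STDERR.)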
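A reduced sketch of the same descriptor trick with only two consumers, assuming a system that provides /dev/fd (Linux and macOS do). The function name two_consumers and the awk/wc consumers are illustrative. fd 4 is duplicated from the pipe feeding wc, while fd 3 is duplicated from the outermost standard output so each consumer's result can bypass the inner pipes:

```shell
#!/usr/bin/env bash
two_consumers() {
  (
    (
      seq 1 5 |
        tee /dev/fd/4 |                                # copy 1 -> fd 4 (pipe to wc)
        awk '{ s += $1 } END { print "sum:", s }' >&3  # copy 2 -> awk -> fd 3
    ) 4>&1 |
      wc -l | sed 's/^ */lines: /' >&3
  ) 3>&1  # no pipe after this group, so fd 3 is the real stdout
}

two_consumers
```

Both consumers run at the same time, so the two result lines may appear in either order.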

    Try this (tested under dash, busybox and bash):

    (((( seq 1 100000 | shuf | tee /dev/fd/4 /dev/fd/5 /dev/fd/6 /dev/fd/7 | $GZIP_CMD >/tmp/tst.gz ) 4>&1 | $BZIP2_CMD >/tmp/tst.bz2 ) 5>&1 | $LZMA_CMD >/tmp/tst.lzma ) 6>&1 | $XZ_CMD >/tmp/tst.xz ) 7>&1 | $MD5SUM_CMD
    

    or more readable:

    GZIP_CMD=`which gzip`
    BZIP2_CMD=`which bzip2`
    LZMA_CMD=`which lzma`
    XZ_CMD=`which xz`
    MD5SUM_CMD=`which md5sum`
    
    (
      (
        (
          (
            seq 1 100000 |
              shuf |
              tee /dev/fd/4 /dev/fd/5 /dev/fd/6 /dev/fd/7 |
              $GZIP_CMD >/tmp/tst.gz
          ) 4>&1 |
            $BZIP2_CMD >/tmp/tst.bz2
        ) 5>&1 |
          $LZMA_CMD >/tmp/tst.lzma
      ) 6>&1 |
        $XZ_CMD >/tmp/tst.xz
    ) 7>&1 |
      $MD5SUM_CMD
    2e67f6ad33745dc5134767f0954cbdd6  -
    

    As shuf shuffles the lines randomly, if you try this you will get a different result,

    ls -ltrS /tmp/tst.*
    -rw-r--r-- 1 user user 230516 oct  1 22:14 /tmp/tst.bz2
    -rw-r--r-- 1 user user 254811 oct  1 22:14 /tmp/tst.lzma
    -rw-r--r-- 1 user user 254892 oct  1 22:14 /tmp/tst.xz
    -rw-r--r-- 1 user user 275003 oct  1 22:14 /tmp/tst.gz
    

    but you should be able to compare the md5 checksums:

    SED_CMD=`which sed`
    
    for chk in gz:$GZIP_CMD bz2:$BZIP2_CMD lzma:$LZMA_CMD xz:$XZ_CMD;do
        ${chk#*:} -d < /tmp/tst.${chk%:*} |
            $MD5SUM_CMD |
            $SED_CMD s/-$/tst.${chk%:*}/
      done
    2e67f6ad33745dc5134767f0954cbdd6  tst.gz
    2e67f6ad33745dc5134767f0954cbdd6  tst.bz2
    2e67f6ad33745dc5134767f0954cbdd6  tst.lzma
    2e67f6ad33745dc5134767f0954cbdd6  tst.xz
    

    Using bash features

    Using some bashisms, this can look nicer; for example, use tee /dev/fd/{4,5,6,7} instead of tee /dev/fd/4 /dev/fd/5 /...

    (((( seq 1 100000 | shuf | tee /dev/fd/{4,5,6,7} | gzip >/tmp/tst.gz ) 4>&1 |
       bzip2 >/tmp/tst.bz2 ) 5>&1 | lzma >/tmp/tst.lzma ) 6>&1 |
       xz >/tmp/tst.xz ) 7>&1 | md5sum
    29078875555e113b31bd1ae876937d4b  -
    

    will work the same.

    Final check

    This won't create any files, but lets you compare the compressed size of a range of sorted integers across 4 different compression tools (for fun, I used 4 different ways of formatting the output):

    (
      (
        (
          (
            (
              seq 1 100000 |
                tee /dev/fd/{4,5,6,7} |
                  gzip |
                  wc -c |
                  sed s/^/gzip:\ \ / >&3
            ) 4>&1 |
              bzip2 |
              wc -c |
              xargs printf "bzip2: %s\n" >&3
          ) 5>&1 |
            lzma |
            wc -c |
            perl -pe 's/^/lzma:   /' >&3
        ) 6>&1 |
          xz |
          wc -c |
          awk '{printf "xz: %9s\n",$1}' >&3
      ) 7>&1 |
        wc -c
    ) 3>&1
    gzip:  215157
    bzip2: 124009
    lzma:   17948
    xz:     17992
    588895
    

    This demonstrates how to redirect output inside nested subshells and merge it back on the console (through file descriptor 3) for the final output.
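The core of that merging trick, reduced to two lines of output (works in bash and POSIX sh; the function name bypass_pipe is hypothetical). The 3>&1 sits on the outer group, whose standard output is still the real one, so fd 3 lets a command skip the inner pipe entirely:

```shell
#!/usr/bin/env bash
bypass_pipe() {
  {
    {
      echo "bypass" >&3        # fd 3 = outer stdout, so tr never sees this
      echo "through pipe"      # stdout = the pipe into tr
    } | tr 'a-z' 'A-Z'
  } 3>&1                       # outer group: fd 3 copies the real stdout
}

bypass_pipe
```

Placing 3>&1 on a group that is itself followed by | would instead duplicate that pipe, which is exactly how the 4>&1 ... 7>&1 redirections above feed each compressor.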

    Syntax >(...) and <(...)

    Bash also supports a syntax feature called process substitution:

    seq 1 100000 | wc -l
    100000
    
    seq 1 100000 > >( wc -l )
    100000
    
    wc -l < <( seq 1 100000 )
    100000
    

    Just as | is an unnamed pipe connected to /dev/fd/0, the <(...) syntax creates a temporary unnamed pipe on another file descriptor and passes its path, /dev/fd/XX, to the command.

    md5sum <(zcat /tmp/tst.gz) <(bzcat /tmp/tst.bz2) <(
             lzcat /tmp/tst.lzma) <(xzcat /tmp/tst.xz)
    29078875555e113b31bd1ae876937d4b  /dev/fd/63
    29078875555e113b31bd1ae876937d4b  /dev/fd/62
    29078875555e113b31bd1ae876937d4b  /dev/fd/61
    29078875555e113b31bd1ae876937d4b  /dev/fd/60
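A small sketch of what the command actually receives, assuming bash (the function name show_procsubst is hypothetical): each <(...) is replaced by a path, typically /dev/fd/NN on systems that support it, which the command opens like an ordinary file. Here paste reads two substituted inputs side by side:

```shell
#!/usr/bin/env bash
show_procsubst() {
  # Prints the substituted path; the exact number (e.g. /dev/fd/63) may vary.
  echo "argument seen:" <(true)
  # Two producers feeding one consumer, each through its own descriptor.
  paste <(printf '1\n2\n3\n') <(printf 'a\nb\nc\n')
}

show_procsubst
```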
    

    More sophisticated demo

    This requires the file utility to be installed. It determines the command to run from the file extension or, failing that, from the MIME type.

    for file in /tmp/tst.*;do
        cmd=$(which ${file##*.}) || {
            cmd=$(file -b --mime-type $file)
            cmd=$(which ${cmd##*[/-]})  # strip up to the last '/' or '-'
        }
        read -a md5 < <($cmd -d <$file|md5sum)
        echo $md5 \ $file
      done
    29078875555e113b31bd1ae876937d4b  /tmp/tst.bz2
    29078875555e113b31bd1ae876937d4b  /tmp/tst.gz
    29078875555e113b31bd1ae876937d4b  /tmp/tst.lzma
    29078875555e113b31bd1ae876937d4b  /tmp/tst.xz
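The fallback branch derives a command name from the MIME type reported by file. The type strings below are typical examples only: exact output differs between file versions, and some newer releases report application/gzip where older ones said application/x-gzip. This sketch (the helper name mime_to_cmd is hypothetical) compares the ${cmd#*-} form with ${cmd##*[/-]}, which also handles types without an x- prefix:

```shell
#!/usr/bin/env bash
# mime_to_cmd is a hypothetical helper: keep what follows the last '/' or '-'.
mime_to_cmd() {
  printf '%s\n' "${1##*[/-]}"
}

for mime in application/x-gzip application/gzip application/x-bzip2; do
  # ${mime#*-} strips up to the FIRST '-' and is a no-op when there is none.
  printf '%s: %s vs %s\n' "$mime" "${mime#*-}" "$(mime_to_cmd "$mime")"
done
```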
    

    This lets you do the same thing as before with the following syntax:

    seq 1 100000 |
        shuf |
            tee >(
                echo gzip. $( gzip | wc -c )
              )  >(
                echo gzip, $( wc -c < <(gzip))
              ) >(
                gzip  | wc -c | sed s/^/gzip:\ \ /
              ) >(
                bzip2 | wc -c | xargs printf "bzip2: %s\n"
              ) >(
                lzma  | wc -c | perl -pe 's/^/lzma:  /'
              ) >(
                xz    | wc -c | awk '{printf "xz: %9s\n",$1}'
              ) > >(
                echo raw: $(wc -c)
              ) |
            xargs printf "%-8s %9d\n"
    
    raw:        588895
    xz:         254556
    lzma:       254472
    bzip2:      231111
    gzip:       274867
    gzip,       274867
    gzip.       274867
    

    Note: I used three different ways to compute the gzip compressed size.

    Note: because these operations run simultaneously, the output order depends on the time taken by each command.

    Going further about parallelisation

    If you run a multi-core or multi-processor computer, try comparing this:

    i=1
    time for file in /tmp/tst.*;do
        cmd=$(which ${file##*.}) || {
            cmd=$(file -b --mime-type $file)
            cmd=$(which ${cmd##*[/-]})  # strip up to the last '/' or '-'
        }
        read -a md5 < <($cmd -d <$file|md5sum)
        echo $((i++)) $md5 \ $file
      done |
    cat -n
    

    which may render:

         1      1 29078875555e113b31bd1ae876937d4b  /tmp/tst.bz2
         2      2 29078875555e113b31bd1ae876937d4b  /tmp/tst.gz
         3      3 29078875555e113b31bd1ae876937d4b  /tmp/tst.lzma
         4      4 29078875555e113b31bd1ae876937d4b  /tmp/tst.xz
    
    real    0m0.101s
    

    with this:

    time  (
        i=1 pids=()
        for file in /tmp/tst.*;do
            cmd=$(which ${file##*.}) || {
                cmd=$(file -b --mime-type $file)
                cmd=$(which ${cmd##*[/-]})  # strip up to the last '/' or '-'
            }
            (
                 read -a md5 < <($cmd -d <$file|md5sum)
                 echo $i $md5 \ $file
            ) & pids+=($!)
          ((i++))
          done
        wait ${pids[@]}
    ) |
    cat -n
    

    could give:

         1      2 29078875555e113b31bd1ae876937d4b  /tmp/tst.gz
         2      1 29078875555e113b31bd1ae876937d4b  /tmp/tst.bz2
         3      4 29078875555e113b31bd1ae876937d4b  /tmp/tst.xz
         4      3 29078875555e113b31bd1ae876937d4b  /tmp/tst.lzma
    
    real    0m0.070s
    

    where the ordering depends on the time taken by each forked job.
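If you want the speed of parallel jobs but a stable output order, one common approach (a sketch, not the only one) is to let each background job write to its own file and print the files in order after wait. This assumes mktemp -d, which GNU and BSD both provide; the function name parallel_in_order and the sleep-based jobs are illustrative:

```shell
#!/usr/bin/env bash
# The body is a ( ) subshell so the EXIT trap and wait stay local to it.
parallel_in_order() (
  dir=$(mktemp -d) || exit 1
  trap 'rm -rf "$dir"' EXIT
  i=0
  for n in 1 0 1; do
    i=$((i + 1))
    # Each job runs in the background and writes only to its own file.
    ( sleep "$n"; echo "job $i slept $n" ) > "$dir/$i" &
  done
  wait                       # block until every background job has finished
  j=1
  while [ "$j" -le "$i" ]; do
    cat "$dir/$j"            # print results in submission order
    j=$((j + 1))
  done
)

parallel_in_order
```

The jobs still finish in any order (job 2 first here), but the output always lists them as 1, 2, 3.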

  • 2020-11-27 16:08

    The bash manual mentions that it implements the >(...) syntax using either named pipes (FIFOs) or the /dev/fd method of naming open files, so if you don't want to depend on bash, you could do that manually in your script with a named pipe:

    mkfifo FIFO
    program3 < FIFO &
    program1 | tee FIFO | program2
    wait
    rm FIFO
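A self-contained version of the FIFO approach that you can run as-is, assuming mkfifo and mktemp -d are available (both are on Linux and macOS). seq, awk and wc stand in for program1, program2 and program3; the function name fifo_fan_out is hypothetical, and the FIFO lives in a private temporary directory instead of the current one:

```shell
#!/usr/bin/env bash
fifo_fan_out() (
  dir=$(mktemp -d) || exit 1
  trap 'rm -rf "$dir"' EXIT
  mkfifo "$dir/fifo" || exit 1
  # program3 stand-in: reads the FIFO in the background, result goes to a file.
  wc -l < "$dir/fifo" | sed 's/^ */lines: /' > "$dir/count" &
  # program1 stand-in tees one copy into the FIFO, one into program2 (awk).
  seq 1 5 | tee "$dir/fifo" | awk '{ s += $1 } END { print "sum:", s }'
  wait                       # let the background reader finish
  cat "$dir/count"
)

fifo_fan_out
```

The reader must be started before tee, otherwise opening the FIFO for writing would block forever.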
    