Consider the following simplified example:

my_prog | awk '...' > output.csv &
my_pid="$!" # Gives the PID for awk instead of for my_prog
sleep 10
kill "$my_pid"

    Based on your comment, I still can't see why you'd prefer killing my_prog to letting it complete in an orderly fashion. Ten seconds is a fairly arbitrary limit on a multiprocessing system, where my_prog could generate 10k lines or none at all depending on system load.

    If you want to limit the output of my_prog to something more predictable, try

    my_prog | head -n 1000 | awk '...'

    without detaching it from the shell (i.e. without the trailing &). Once head has read 1000 lines it exits and closes the pipe, so in the worst case my_prog is killed by SIGPIPE on its next write. In the best case, you change my_prog so that it produces only the amount of output you want.
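    A quick way to watch this happen, with `yes` standing in for a my_prog that never stops on its own (an assumption for illustration): awk receives exactly 1000 lines, and bash's `PIPESTATUS` array shows that the producer did not exit normally.

```shell
#!/usr/bin/env bash
# `yes` stands in for a my_prog that writes forever (illustrative assumption).
lines=$(yes | head -n 1000 | awk 'END { print NR }')
echo "awk saw $lines lines"

# Run the pipeline again outside $( ) so that PIPESTATUS is visible here.
yes | head -n 1000 > /dev/null
yes_status=${PIPESTATUS[0]}              # must be read immediately after the pipeline
echo "exit status of yes: $yes_status"
```

    With SIGPIPE at its default disposition the status is typically 141 (128 + 13, the signal number of SIGPIPE); if the environment ignores SIGPIPE, `yes` instead fails with a write error and a nonzero status.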

    Added in response to a comment:

    Insofar as you have control over my_prog, give it an optional -s duration argument. Then, somewhere in your main loop, you can put the predicate:

    if (duration_exceeded()) {
        exit(0);
    }

    where exit will in turn properly flush the open stdio FILE streams. If you are desperate and there is no place to put the predicate, this could be implemented using alarm(3), which I am intentionally not showing because it is bad.
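    If my_prog happens to be a shell script (an assumption; the snippet above is C), the same pattern can be sketched in bash. `my_prog` and `duration_exceeded` below are hypothetical stand-ins that only mirror the names in the answer, and bash's `SECONDS` variable supplies the elapsed time in whole seconds.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the answer's predicate: true once `duration`
# seconds have passed since `start` (a duration of 0 means "never").
# SECONDS only counts whole seconds, so the cut-off is approximate.
duration_exceeded() {
  local start=$1 duration=$2
  (( duration > 0 && SECONDS - start >= duration ))
}

# Hypothetical stand-in for my_prog with an optional -s duration argument.
my_prog() {
  local OPTIND=1 opt duration=0 start=$SECONDS n=0
  while getopts 's:' opt; do
    case $opt in
      s) duration=$OPTARG ;;
      *) return 2 ;;
    esac
  done

  while :; do                          # the "main loop"
    printf 'line %d\n' "$((n++))"
    sleep 0.1
    if duration_exceeded "$start" "$duration"; then
      return 0                         # a standalone script would use `exit 0`
    fi
  done
}

# Finishes on its own after roughly one second; nothing has to be killed.
my_prog -s 1 | awk 'END { print NR " lines received" }'
```

    Because the producer stops by itself, the pipeline can run in the foreground and no PID is needed at all.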

    The core of your trouble is that my_prog runs forever. Everything else here is a hack to get around that limitation.

    Improving on @Marvin's and @Nils Goroll's answers, here is a one-liner that extracts the PIDs of all commands in the pipeline into a shell array variable:

    # run some command
    ls -l | rev | sort > /dev/null &
    # collect pids
    pids=($(jobs -l % | grep -E -o '^(\[[0-9]+\]\+|    ) [ 0-9]{5} ' | sed -e 's/^[^ ]* \+//' -e 's! $!!'))
    # use them for something
    echo pid of ls -l: ${pids[0]}
    echo pid of rev: ${pids[1]}
    echo pid of sort: ${pids[2]}
    echo pid of first command e.g. ls -l: $pids
    echo pid of last command e.g. sort: ${pids[-1]}
    # wait for last command in pipe to finish
    wait ${pids[-1]}

    In my solution, ${pids[-1]} contains the value normally available in $!. Note the use of jobs -l %, which outputs just the "current" job, by default the last one started. Negative array indices such as ${pids[-1]} require bash 4.3 or later.

    Sample output:

    pid of ls -l: 2725
    pid of rev: 2726
    pid of sort: 2727
    pid of first command e.g. ls -l: 2725
    pid of last command e.g. sort: 2727

    UPDATE 2017-11-13: Improved the pids=... command so that it works better with complex (multi-line) commands.
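    The `[ 0-9]{5}` part of the pattern appears to assume that `jobs -l` prints every PID in a five-character column, which would break for six- or seven-digit PIDs (common on Linux systems with a large `pid_max`). A minimal sketch that avoids the fixed width by letting awk split the fields instead; it assumes bash's usual `jobs -l` layout, and `current_job_pids` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the PIDs of every process in the current job (%%),
# in pipeline order. Assumes bash's `jobs -l` layout, where the first line is
# "[N]+  PID State  command" and each further pipeline member is "PID ... | command".
current_job_pids() {
  jobs -l %% | awk '
    /^\[/           { print $2; next }   # job header line: 2nd field is a PID
    $1 ~ /^[0-9]+$/ { print $1 }         # continuation line: 1st field is a PID
  '
}

sleep 5 | sleep 5 | cat > /dev/null &
last_pid=$!                              # $! is the PID of the last command (cat)
mapfile -t pids <<< "$(current_job_pids)"

echo "pids: ${pids[*]} (\$! = $last_pid)"

kill "${pids[@]}"                        # clean up the demo pipeline
wait
```

    The last element of `pids` should match `$!`, just as `${pids[-1]}` does in the one-liner.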

    I just had the same issue. My solution:

    process_1 | process_2 &
    PID_OF_PROCESS_1=$(jobs -p)

    Just make sure this pipeline is the only background job, because jobs -p prints one PID per job. Otherwise, you need to parse the full output of jobs -l.
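    If other background jobs might already exist, a job spec can narrow jobs -p down to a single job. A minimal sketch, with sleep and cat standing in for process_1 and process_2 (an assumption for illustration), using `%%` to name the current job, i.e. the one started most recently. Bash's manual describes `jobs -p` as printing the PID of the job's process-group leader; for a background pipeline, in practice that is the first command.

```shell
#!/usr/bin/env bash
# sleep/cat stand in for process_1/process_2 (illustrative assumption).
sleep 5 &                          # an unrelated background job started earlier
other_pid=$!

sleep 5 | cat > /dev/null &        # process_1 | process_2 &
last_pid=$!                        # $! is the PID of the LAST command (cat)
first_pid=$(jobs -p %%)            # restrict jobs -p to the current (newest) job

echo "process_1: $first_pid  process_2: $last_pid"

kill "$other_pid" "$first_pid"     # stopping sleep lets cat see EOF and exit
wait
```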

    Add a shell wrapper around your command and capture the PID. For my example, I use iostat.

    echo $$ > /tmp/
    exec iostat 1

    Exec replaces the shell with the new process, preserving the PID.

    | grep avg

    While that runs:

    $ cat 
    $ ps -ef | grep iostat
    userid  22754  4058  0 12:33 pts/12   00:00:00 iostat 1

    So you can:

    sleep 10
    kill `cat`

    Is that more elegant?
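    The wrapper idea can also be tried without a separate script file: a `bash -c` wrapper writes its own `$$` and then `exec`s the real command, so the recorded PID becomes the command's PID. In this sketch `sleep 30` stands in for `iostat 1`, the pid file is created with `mktemp`, and a polling loop replaces the fixed `sleep 10`; all three are assumptions for illustration.

```shell
#!/usr/bin/env bash
# sleep stands in for iostat; the pid file is a temporary file (illustrative assumptions).
pidfile=$(mktemp)

# The wrapper writes its own PID, then exec replaces it with the real command,
# so the recorded PID is the PID of the command itself.
bash -c 'echo "$$" > "$1"; exec sleep 30' wrapper "$pidfile" | cat > /dev/null &

# Wait until the wrapper has written the file, instead of guessing a delay.
until [[ -s $pidfile ]]; do sleep 0.05; done
producer_pid=$(<"$pidfile")

kill "$producer_pid"   # stops the producer; cat then sees EOF and exits
wait
rm -f "$pidfile"
echo "killed producer $producer_pid"
```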

    Inspired by @Demosthenex's answer, using a subshell:

    $ ( echo $BASHPID > pid1; exec vmstat 1 5 ) | tail -n 1 &
    [1] 17371
    $ cat pid1
    $ pgrep -fl vmstat
    17370 vmstat 1 5
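
    The subshell version relies on `$BASHPID` rather than `$$`, because inside `( ... )` bash still expands `$$` to the parent shell's PID. A minimal sketch that makes the difference visible, assuming bash 4.0 or later (where `$BASHPID` was introduced), with `sleep 30` standing in for `vmstat 1 5` and a `mktemp` file replacing `pid1` (illustrative choices):

```shell
#!/usr/bin/env bash
# Inside ( ... ), $$ still expands to the PID of the parent script,
# whereas $BASHPID is the subshell's own PID, which exec hands to the command.
pidfile=$(mktemp)

( echo "$$ $BASHPID" > "$pidfile"; exec sleep 30 ) | tail -n 1 &

until [[ -s $pidfile ]]; do sleep 0.05; done   # wait for the subshell to write
read -r subshell_dollar producer_pid < "$pidfile"

echo "script \$\$=$$  \$\$ in subshell=$subshell_dollar  producer PID=$producer_pid"

kill "$producer_pid"   # stops sleep; tail then sees EOF and exits
wait
rm -f "$pidfile"
```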

    I was desperately looking for a good solution to get all the PIDs from a pipe job, and one promising approach failed miserably (see previous revisions of this answer).

    So, unfortunately, the best I could come up with is parsing the jobs -l output using GNU awk:

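    function last_job_pids {
        if [[ -z "${1}" ]] ; then
            # No job spec given: scan every job. Each "[N]" header line resets the
            # array, so only the PIDs of the last job survive.
            # Note: awk's for-in loop does not guarantee any output order.
            jobs -l | awk '
                /^\[/ { delete pids; pids[$2]=$2; seen=1; next; }
                // { if (seen) { pids[$1]=$1; } }
                END { for (p in pids) print p; }'
        fi
    }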