Consider the following simplified example:
my_prog | awk '...' > output.csv &
my_pid="$!" # Gives the PID of awk instead of my_prog
sleep 10
kill $my_pid
Here is a solution without wrappers or temporary files. It only works for a background pipeline whose output is redirected away from the containing script's stdout, as in your case. Suppose you want to do:
cmd1 | cmd2 | cmd3 >pipe_out &
# do something with PID of cmd2
If only bash could provide ${PIPEPID[n]}! The replacement "hack" that I found is the following:
PID=$( { cmd1 | { cmd2 0<&4 & echo $! >&3 ; } 4<&0 | cmd3 >pipe_out & } 3>&1 | head -1 )
If needed, you can also close fd 3 (for cmd*) and fd 4 (for cmd2) with 3>&- and 4<&-, respectively. If you do that, for cmd2 make sure you close fd 4 only after you redirect fd 0 from it.
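To make the redirections concrete, here is a minimal runnable sketch of the same hack with the extra fds closed. The stand-ins are my own assumptions, not part of the original: a loop printing "tick" plays cmd1, cat plays both cmd2 and cmd3, and a mktemp file plays pipe_out. Fd 3 carries the PID out to head, and fd 4 keeps a copy of the pipe from cmd1, because a command started with & while job control is inactive gets its stdin redirected from /dev/null.

```shell
#!/usr/bin/env bash
# Stand-ins (hypothetical): cmd1 = tick loop, cmd2 = cat, cmd3 = cat.
pipe_out=$(mktemp)

PID=$( {
    # cmd1: never needs fd 3, so close it.
    { while echo tick; do sleep 1; done; } 3>&- |
    # cmd2: restore stdin from fd 4 first, then close fd 4 and fd 3.
    # The echo reports cmd2's PID on fd 3, which feeds head.
    { cat 0<&4 4<&- 3>&- & echo $! >&3 ; } 4<&0 |
    # cmd3: close fd 3 and send the pipeline output to the file.
    cat 3>&- >"$pipe_out" &
} 3>&1 | head -n 1 )

echo "PID of cmd2: $PID"
sleep 2
kill "$PID"   # cmd3 then sees EOF; cmd1 stops on its next failed write
```

The order `0<&4 4<&-` matters: redirections are processed left to right, so fd 4 is still open when fd 0 is duplicated from it.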
I was able to solve it by explicitly naming the pipe using mkfifo.
Step 1: mkfifo capture
Step 2: Run this script
my_prog > capture &
my_pid="$!" #Now, I have the PID for my_prog!
awk '...' capture > out.csv &
sleep 10
kill $my_pid #kill my_prog
wait #wait for awk to finish.
I don't like having to manage a FIFO by hand. Hopefully someone has an easier solution.
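One way to reduce the housekeeping, as a sketch rather than a real fix: create the FIFO in a private temporary directory and let a trap remove it when the script exits. Everything named here is an assumption for illustration: the my_prog function is a stand-in that prints "tick" once a second, cat stands in for awk '...', and tmpdir, fifo and lines are hypothetical variable names.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the real my_prog.
my_prog() { while echo tick; do sleep 1; done; }

tmpdir=$(mktemp -d)              # private directory for the FIFO
fifo="$tmpdir/capture"
mkfifo "$fifo"
trap 'rm -rf "$tmpdir"' EXIT     # clean up the FIFO when the script exits

my_prog > "$fifo" &
my_pid=$!                        # PID of the process running my_prog

cat "$fifo" > "$tmpdir/out.csv" &   # cat stands in for awk '...'
sleep 2
kill "$my_pid"                   # kill my_prog
wait                             # wait for the reader to see EOF and finish

lines=$(grep -c tick "$tmpdir/out.csv")
echo "captured $lines line(s)"
```

The FIFO still exists while the script runs, but nothing is left behind afterwards and two concurrent runs cannot collide on the name.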
My solution was to query jobs and parse its output using perl.
Start two pipelines in the background:
$ sleep 600 | sleep 600 |sleep 600 |sleep 600 |sleep 600 &
$ sleep 600 | sleep 600 |sleep 600 |sleep 600 |sleep 600 &
Query background jobs:
$ jobs
[1]- Running sleep 600 | sleep 600 | sleep 600 | sleep 600 | sleep 600 &
[2]+ Running sleep 600 | sleep 600 | sleep 600 | sleep 600 | sleep 600 &
$ jobs -l
[1]- 6108 Running sleep 600
6109 | sleep 600
6110 | sleep 600
6111 | sleep 600
6112 | sleep 600 &
[2]+ 6114 Running sleep 600
6115 | sleep 600
6116 | sleep 600
6117 | sleep 600
6118 | sleep 600 &
Parse the jobs list of the second job, %2. The parsing is probably error-prone, but in these cases it works. We aim to capture the first number followed by a space. It is stored in the variable pids as an array using the parentheses:
$ pids=($(jobs -l %2 | perl -pe '/(\d+) /; $_=$1 . "\n"'))
$ echo $pids
6114
$ echo ${pids[*]}
6114 6115 6116 6117 6118
$ echo ${pids[2]}
6116
$ echo ${pids[4]}
6118
And for the first pipeline:
$ pids=($(jobs -l %1 | perl -pe '/(\d+) /; $_=$1 . "\n"'))
$ echo ${pids[2]}
6110
$ echo ${pids[4]}
6112
We could wrap this into a little alias/function:
function pipeid() { jobs -l ${1:-%%} | perl -pe '/(\d+) /; $_=$1 . "\n"'; }
$ pids=($(pipeid)) # PIDs of last job
$ pids=($(pipeid %1)) # PIDs of first job
I have tested this in bash and zsh. Unfortunately, in bash I could not pipe the output of pipeid into another command. Probably because that pipeline is run in a subshell that is not able to query the job list?
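Since the parsing is the fragile part, here is a sketch of a slightly stricter parser in pure bash, without perl. It only accepts a number that sits at the start of a line, optionally after the [n]+ job marker. The function name parse_job_pids is hypothetical, and the sample text is copied from the jobs -l listing above, so this shows the parsing only, not the querying.

```shell
#!/usr/bin/env bash
# parse_job_pids (hypothetical helper): reads `jobs -l`-style text on stdin
# and prints one PID per line.
parse_job_pids() {
    local line
    # Optional "[n]", "[n]+" or "[n]-" marker, optional blanks, then the PID.
    local re='^(\[[0-9]+\][-+]?)?[[:space:]]*([0-9]+)[[:space:]]'
    while IFS= read -r line; do
        [[ $line =~ $re ]] && printf '%s\n' "${BASH_REMATCH[2]}"
    done
    return 0
}

# Sample input taken from the listing of job %2.
sample='[2]+ 6114 Running sleep 600
6115 | sleep 600
6116 | sleep 600
6117 | sleep 600
6118 | sleep 600 &'

pids=($(parse_job_pids <<<"$sample"))
echo "${pids[*]}"    # 6114 6115 6116 6117 6118
echo "${pids[2]}"    # 6116
```

If the subshell really is what breaks piping in bash, one thing to try (an assumption on my part, not verified against every bash version) is to let jobs write to a temporary file first, as in `jobs -l %2 > "$tmpfile"`, so that it runs in the current shell, and then feed the file to the parser with `parse_job_pids < "$tmpfile"`.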