Consider the following simplified example:
my_prog | awk '...' > output.csv &
my_pid="$!"  # Gives the PID of awk instead of my_prog
sleep 10
kill $my_pid
Based on your comment, I still can't see why you'd prefer killing my_prog to having it complete in an orderly fashion. Ten seconds is a pretty arbitrary measurement on a multiprocessing system: depending on system load, my_prog could generate 10k lines or 0 lines of output in that time.
If you want to limit the output of my_prog to something more determinate, try
my_prog | head -1000 | awk
without detaching from the shell. In the worst case, head will close its input and my_prog will get a SIGPIPE. In the best case, change my_prog so it gives you the amount of output you want.
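For instance, a quick way to see that behaviour is to use yes as a stand-in for my_prog (the awk body and the line count here are purely illustrative):
yes | head -1000 | awk '{ n++ } END { print n, "lines" }'
# head exits after 1000 lines, yes is killed by SIGPIPE,
# and awk still runs its END block and reports "1000 lines"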
added in response to comment:
In so far as you have control over my_prog, give it an optional -s duration argument. Then somewhere in your main loop you can put the predicate:
if (duration_exceeded()) {
    exit(0);
}
where exit will in turn properly flush the output FILEs. If desperate and there is no place to put the predicate, this could be implemented using alarm(3), which I am intentionally not showing because it is bad.
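If my_prog happened to be a shell script, the same predicate idea might look roughly like this (only a sketch: assume $duration was set from a hypothetical -s option, and produce_next_line stands in for the real main loop):
start=$SECONDS

duration_exceeded() {
    # true once $duration seconds have passed since the script started
    (( duration > 0 && SECONDS - start >= duration ))
}

while produce_next_line; do        # placeholder for the real main loop
    if duration_exceeded; then
        exit 0                     # a normal exit flushes buffered output
    fi
done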
The core of your trouble is that my_prog runs forever. Everything else here is a hack to get around that limitation.
Improving @Marvin's and @Nils Goroll's answers with a one-liner that extracts the PIDs for all commands in the pipe into a shell array variable:
# run some command
ls -l | rev | sort > /dev/null &
# collect pids
pids=(`jobs -l % | egrep -o '^(\[[0-9]+\]\+| ) [ 0-9]{5} ' | sed -e 's/^[^ ]* \+//' -e 's! $!!'`)
# use them for something
echo pid of ls -l: ${pids[0]}
echo pid of rev: ${pids[1]}
echo pid of sort: ${pids[2]}
echo pid of first command e.g. ls -l: $pids
echo pid of last command e.g. sort: ${pids[-1]}
# wait for last command in pipe to finish
wait ${pids[-1]}
In my solution ${pids[-1]} contains the value normally available in $!. Please note the use of jobs -l %, which outputs just the "current" job, which by default is the last one started.
Sample output:
pid of ls -l: 2725
pid of rev: 2726
pid of sort: 2727
pid of first command e.g. ls -l: 2725
pid of last command e.g. sort: 2727
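If, as in the original question, the goal is to stop the pipeline after a timeout, the array can be used directly (a sketch building on the example above):
sleep 10
kill ${pids[0]}      # signal only the first command (ls -l above)
# or, to signal every process in the pipeline:
kill ${pids[@]}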
UPDATE 2017-11-13: Improved the pids=... command so that it works better with complex (multi-line) commands.
Just had the same issue. My solution:
process_1 | process_2 &
PID_OF_PROCESS_2=$!
PID_OF_PROCESS_1=`jobs -p`
Just make sure process_1 is the first background process. Otherwise, you need to parse the full output of jobs -l.
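For the scenario in the question, those two variables can then be used after the timeout (a sketch; the process names are placeholders):
sleep 10
kill $PID_OF_PROCESS_1   # process_2 then sees EOF on its input and can flush and exit normally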
Add a shell wrapper around your command and capture the pid. For my example I use iostat.
#!/bin/sh
echo $$ > /tmp/my.pid
exec iostat 1
exec replaces the shell with the new process while preserving the PID.
test.sh | grep avg
While that runs:
$ cat /tmp/my.pid
22754
$ ps -ef | grep iostat
userid 22754 4058 0 12:33 pts/12 00:00:00 iostat 1
So you can:
sleep 10
kill `cat /tmp/my.pid`
Is that more elegant?
With inspiration from @Demosthenex's answer, using subshells:
$ ( echo $BASHPID > pid1; exec vmstat 1 5 ) | tail -1 &
[1] 17371
$ cat pid1
17370
$ pgrep -fl vmstat
17370 vmstat 1 5
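The PID captured in pid1 can then be signalled just like the $! of a simple command:
kill `cat pid1`    # terminates vmstat; tail then sees EOF and prints the last line it got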
I was desperately looking for a good solution to get all the PIDs from a pipe job, and one promising approach failed miserably (see previous revisions of this answer).
So, unfortunately, the best I could come up with is parsing the jobs -l output using GNU awk:
function last_job_pids {
    # nothing to report if there are no background jobs
    if [[ -z "$(jobs)" ]] ; then
        return
    fi

    jobs -l | awk '
        # a job header line starts with "[n]"; reset and record its first PID
        /^\[/ { delete pids; pids[$2]=$2; seen=1; next; }
        # subsequent lines list the remaining PIDs of that job
        { if (seen) { pids[$1]=$1; } }
        END { for (p in pids) print p; }'
}
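A quick way to try it (the pipeline itself is an arbitrary example):
sleep 60 | cat | wc -l &
last_job_pids     # prints one PID per process in the pipeline (order may vary)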