Writing from multiple processes launched via xargs to the same fifo pipe causes lines to miss


Question


I have a script where I parallelize job execution while monitoring the progress. I do this using xargs and a named FIFO pipe. My problem is that while xargs performs well, some of the lines written to the pipe are lost. Any idea what the problem is?

For example, the following script (basically my script with dummy data) produces the following output and then hangs at the end, waiting for those missing lines:

$ bash test2.sh 
Progress: 0 of 99
DEBUG: Processed data 0 in separate process
Progress: 1 of 99
DEBUG: Processed data 1 in separate process
Progress: 2 of 99
DEBUG: Processed data 2 in separate process
Progress: 3 of 99
DEBUG: Processed data 3 in separate process
Progress: 4 of 99
DEBUG: Processed data 4 in separate process
Progress: 5 of 99
DEBUG: Processed data 5 in separate process
DEBUG: Processed data 6 in separate process
DEBUG: Processed data 7 in separate process
DEBUG: Processed data 8 in separate process
Progress: 6 of 99
DEBUG: Processed data 9 in separate process
Progress: 7 of 99
##### Script is hanging here (Could happen for any line) #####
#!/bin/bash
clear

printStateInLoop() {
  local pipe="$1"
  local total="$2"
  local finished=0

  echo "Progress: $finished of $total"
  while true; do
    if [ $finished -ge $total ]; then
      break
    fi

    let finished++
    read line <"$pipe"
    # In the final script I would need to do more than just logging
    echo "Progress: $finished of $total"
  done
}

processData() {
  local number=$1
  local pipe=$2

  sleep 1 # Work needs time
  echo "$number" >"$pipe"
  echo "DEBUG: Processed data $number in separate process"
}
export -f processData

process() {
  TMP_DIR=$(mktemp -d)
  PROGRESS_PIPE="$TMP_DIR/progress-pipe"
  mkfifo "$PROGRESS_PIPE"

  DATA_VECTOR=($(seq 0 1 99)) # A bunch of data
  printf '%s\0' "${DATA_VECTOR[@]}" | xargs -0 --max-args=1 --max-procs=5 -I {} bash -c "processData \$@ \"$PROGRESS_PIPE\"" _ {} &

  printStateInLoop "$PROGRESS_PIPE" ${#DATA_VECTOR[@]}
}

process
rm -Rf "$TMP_DIR"

In another post I got the suggestion to switch to while read line; do … done < "$pipe" (function below) instead of while true; do … read line < "$pipe" … done, so that the pipe is not closed and reopened on every line read. This reduces the frequency of the problem, but it still happens: some lines are missing, and sometimes xargs reports bash: terminated by signal 13 (SIGPIPE).

printStateInLoop() {
  local pipe="$1"
  local total="$2"
  local finished=0

  echo "Progress: $finished of $total"
  while [ $finished -lt $total ]; do
    while read line; do
      let finished++
      # In the final script I would need to do more than just logging
      echo "Progress: $finished of $total"
    done <"$pipe"
  done
}
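
Signal 13 is SIGPIPE: it hits a worker whose write lands in the window where the reader has just closed the FIFO (after the inner read loop sees EOF) but has not yet reopened it, and whatever is still buffered in the pipe at that moment is discarded along with it. As an illustration of how that window can be closed on the reader side (this is not from the question or from the accepted answer below, just a sketch relying on bash's exec redirections), the reader can hold the FIFO open on a dedicated file descriptor for the whole loop:

printStateInLoop() {
  local pipe="$1"
  local total="$2"
  local finished=0
  local line

  echo "Progress: $finished of $total"
  # Open the FIFO once, read-write, on fd 3. A read-write open of a FIFO does
  # not block on Linux, and the reader itself then counts as a writer, so read
  # never sees a premature EOF while worker processes come and go.
  exec 3<>"$pipe"
  while [ "$finished" -lt "$total" ]; do
    IFS= read -r line <&3  # one progress line, without reopening the FIFO
    let finished++
    # In the final script I would need to do more than just logging
    echo "Progress: $finished of $total"
  done
  exec 3<&-  # close fd 3 only once every line has been counted
}

Each worker writes a single short line, far below PIPE_BUF, so the individual writes stay atomic even with five writers running in parallel. The accepted answer below takes the other route and serializes the writers instead.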

A lot of people on SO suggested using GNU parallel or pv for this. Sadly, those tools aren't available on the very limited target platform, so my script is based on xargs instead.


Answer 1:


The solution (as pointed out by @markp-fuso and @Dale) was to create a file lock.

Instead of:

echo "$number" >"$pipe"

I now use flock to create/wait for a lock first:

flock "$pipe.lock" echo "$number" >"$pipe"


Source: https://stackoverflow.com/questions/64743111/writing-from-multiple-processes-launched-via-xargs-to-the-same-fifo-pipe-causes
