Question
I just recently started programming in Bash and came across GNU Parallel, which is exactly what I need for my project. I have a basic loop script that is meant to loop through a list of IPs and fetch each one, once, with curl. The list of IPs is constantly updated with new ones by another script.
For the parallelism, I would like to use GNU Parallel.
My idea was to run 10 parallel instances; each would take one IP from the list, insert it into the curl command, and remove it from the list so the other instances won't pick it up.
#!/bin/bash
while true; do
  while read -r ip; do
    curl "$ip" >> result.txt
    # drop the line we just processed from the list
    sed -i '1d' ipslist
  done < ipslist
done
I'm not sure what the right way to run the Bash script is in this case; every solution I could find doesn't work properly and things get totally messy. I have a feeling this can all be done in a single line, but, for my own reasons, I'd prefer to run it as a Bash script. I would be grateful for any help!
Answer 1:
This works for me:
#!/bin/bash
while true; do
  parallel -j10 curl '{}' < ipslist >> result.txt
done
If that's not what you intended, please update your question to clarify.
Answer 2:
Thomas' solution looks correct for this particular situation. If, however, you need to do more than simply curl, then I recommend making a function:
#!/bin/bash
doit() {
  ip="$1"
  curl "$ip"
  echo do other stuff here
}
# export the function so the shells spawned by GNU Parallel can see it
export -f doit

while true; do
  parallel -j10 doit < ipslist >> result.txt
done
If you want ipslist to be a queue, so you can later add stuff to it and you only want each entry curled once:
tail -n+0 -f ipslist | parallel doit >> result.txt
Now you can later simply add stuff to ipslist and GNU Parallel will curl that, too.
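For example, appending a line is all it takes to enqueue another job (the address below is just a placeholder):

echo '203.0.113.10' >> ipslist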
(There is a small issue when using GNU Parallel as a queue system/batch manager: you have to submit JobSlots number of jobs before they will start, and after that you can submit one at a time, and a job will start immediately if a free slot is available. Output from running or completed jobs is held back and will only be printed when JobSlots more jobs have been started (unless you use --ungroup or --line-buffer, in which case the output from the jobs is printed immediately). E.g. if you have 10 job slots, the output from the first completed job will only be printed when job 11 has started, and the output of the second completed job will only be printed when job 12 has started.)
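If that held-back output gets in the way, a variant like this (a sketch using the --line-buffer option mentioned above) prints each job's output line by line as it is produced:

tail -n+0 -f ipslist | parallel --line-buffer doit >> result.txt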
Answer 3:
I would just use xargs. Not many people seem to know this, but there's much more to it than the standard usage of squeezing every line of the input onto a single command line. That is, this:
echo -e "A\nB\nC\nD\nE" | xargs do_something
would essentially mean the same as this:
do_something A B C D E
However, you can specify how many lines are processed in one chunk using the -L option:
echo -e "A\nB\nC\nD\nE" | xargs -L2 do_something
would translate to:
do_something A B
do_something C D
Additionally, you can specify how many of these chunks run in parallel with the -P option. So to process the lines one by one, with a parallelism of, say, 3, you would say:
echo -e "A\nB\nC\nD\nE" | xargs -L1 -P3 do_something
Et voilà, you have proper parallel execution with basic Unix tools.
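To see it in action before plugging in a real command, here is a self-contained demo; do_something is replaced by a trivial sh snippet, and the sleep only exists to make the overlap visible:

echo -e "A\nB\nC\nD\nE" | xargs -L1 -P3 sh -c 'echo "start $0"; sleep 1; echo "done $0"'

With -P3 you should see three "start" lines appear almost at once, then the corresponding "done" lines about a second later.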
The only catch is that you have to make sure you separate the outputs. I am not sure whether this has been thought of before, but a solution for the curl case is something like this:
cat url_list.txt | xargs -L1 -P10 curl -o parallel_#0.html
Where #0 will be replaced by cURL with the URL being fetched. See the manuals for further details:
- http://man7.org/linux/man-pages/man1/xargs.1.html
- https://curl.haxx.se/docs/manpage.html
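If your curl build doesn't do that substitution, a workaround (my own sketch, not from the manuals) is to let a small sh wrapper derive the output file name from the URL itself:

cat url_list.txt | xargs -L1 -P10 sh -c 'curl -s "$0" -o "$(printf %s "$0" | tr -c "A-Za-z0-9" "_").html"'

Here $0 is the URL handed over by xargs, and tr squashes every character that is not a letter or digit into an underscore, so each URL maps to its own file.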
Answer 4:
You can do this and it will work:
#!/bin/bash
while true; do
  while read -r ip; do
    # start each curl in the background
    curl "$ip" >> result.txt &
    # drop the line we just dispatched from the list
    sed -i '1d' ipslist
  done < ipslist
  # wait for all background curls to finish before rereading the list
  wait
done
Source: https://stackoverflow.com/questions/49360294/running-a-loop-bash-curl-script-with-gnu-parallel