gnu-parallel

GNU Parallel as job queue processor

时光毁灭记忆、已成空白 submitted on 2019-12-11 06:16:53
Question: I have a worker.php file as below:

<?php
$data = $argv[1];
// then some time-consuming processing of $data

and I run this as a poor man's job queue using GNU Parallel:

while read LINE; do echo $LINE; done < very_big_file_10GB.txt | parallel -u php worker.php

which kind of works, forking 4 PHP processes when I am on a 4-CPU machine. But it still feels pretty synchronous to me because read LINE is still reading one line at a time. Since it is a 10 GB file, I am wondering if somehow I can use parallel to…

GNU parallel colsep with missing columns

主宰稳场 submitted on 2019-12-11 05:32:31
Question: I have a program that takes a variable number of arguments, and I want to run it in parallel with one instance per line of an input file. The input file is comma separated, with some columns missing at the end of some rows. How can I instruct GNU parallel to skip the parameter substitution when a column is missing?

Input file:
A,B,C,D,E
A,B,C,D
A,B,C

Script:
parallel -a $1 --trim lr --colsep ',' echo {1} {2} {3} {4} {5}

Output:
A B C D E
A B C D {5}
A B C {4} {5}

Desired output:
A B…
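One way to get the desired behaviour, sketched here (not taken from an answer in the thread): pad every row to five fields before parallel sees it, so {4} and {5} expand to empty strings instead of staying literal.

```shell
# Pad each comma-separated row to n fields; assigning $i forces awk to
# rebuild the record with OFS, so "A,B,C" becomes "A,B,C,,".
pad5() { awk -F, -v n=5 'BEGIN{OFS=","} {for(i=NF+1;i<=n;i++)$i=""; print}'; }

printf 'A,B,C,D,E\nA,B,C,D\nA,B,C\n' |
  pad5 |
  parallel --trim lr --colsep ',' echo {1} {2} {3} {4} {5}
```

With padded rows, {4} and {5} are empty strings for short lines, so (trailing spaces aside) the output matches the desired "A B C D E / A B C D / A B C".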

GNU parallel double-dashed options not working

断了今生、忘了曾经 submitted on 2019-12-11 05:18:58
Question: I am trying to run the following very simple parallel script:

parallel --eta -j 1 -- "echo hi"

but I get an error:

parallel: invalid option -- '-'
parallel [OPTIONS] command -- arguments
        for each argument, run command with argument, in parallel
parallel [OPTIONS] -- commands
        run specified commands in parallel

This happens for every double-dashed option I try to use.

Answer 1: You are using Tollef's parallel from moreutils, not GNU Parallel.

Answer 2: If you are not using Tollef's parallel, then…
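A quick way to check which `parallel` is installed (a sketch; moreutils' parallel rejects GNU-style long options such as --version, which is exactly the symptom above):

```shell
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
  echo "GNU parallel is installed"
else
  echo "this looks like moreutils (Tollef's) parallel, or none at all"
fi
```

On Debian-based systems the two ship in different packages (`parallel` vs `moreutils`), and installing the `parallel` package typically makes the GNU version take over /usr/bin/parallel.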

Parallel execution of Unix command?

拜拜、爱过 submitted on 2019-12-11 04:47:49
Question: I wrote a shell program which automatically divides a file into 4 parts using csplit, then four shell programs which execute the same command in the background using nohup, plus a while loop that waits for these four processes to finish and finally runs cat output1.txt … output4.txt > finaloutput.txt. But then I came to know about the command parallel, and I tried it with a big file, but it does not seem to work as expected. The file is the output of the command below: for i in $(seq 1 1000000…

How to take substring from input file as an argument to a program to be executed in GNU-parallel?

為{幸葍}努か submitted on 2019-12-11 00:39:24
Question: I am trying to execute a program (say, biotool) using GNU Parallel. It takes 3 arguments, i, o and a:

the input file (i)
the output file name to be written to (o)
an argument which takes a substring from the input file name (a)

For example, say I have 10 text files like this:

1_a_test.txt
2_b_test.txt
3_c_test.txt
...
10_j_test.txt

I want to run my tool (say biotool) on all 10 text files. I tried this:

parallel biotool -i {} -o {.}.out -a {} ::: *.txt

I want to pass the character…

How to install GNU parallel (noarch.rpm) on CentOS 7

喜欢而已 submitted on 2019-12-10 20:32:49
Question: I want to install GNU Parallel on CentOS 7. There is not much info to be found. Can someone explain to me how to do this? This is some useful info I found.

Answer 1: The 10 seconds installation is:

$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || \
   fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 3374ec53bacb199b245af2dda86df6c9
12345678 3374ec53 bacb199b 245af2dd a86df6c9
$ md5sum install.sh | grep 029a9ac06e8b5bc6052eac57b2c3c9ca
029a9ac0 6e8b5bc6 052eac57 b2c3c9ca
$ …
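On CentOS 7 specifically, the noarch RPM is also available from the EPEL repository, which may be simpler than the shell installer; a sketch (run as root or via sudo):

```shell
sudo yum -y install epel-release   # enable the EPEL repository
sudo yum -y install parallel       # pulls in the parallel-*.noarch.rpm
parallel --version | head -n 1     # first line should read "GNU parallel <version>"
```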

Ubuntu terminal - using gnu parallel to read lines in all files in folder

戏子无情 submitted on 2019-12-10 17:51:32
Question: I am trying to count the lines in all the files in a very large folder under Ubuntu. The files are .gz files, and I use

zcat * | wc -l

to count all the lines in all the files, and it's slow! I want to use multi-core computing for this task. I found this about GNU Parallel and tried this bash command:

parallel zcat * | parallel --pipe wc -l

but not all the cores are working. I found that job startup might cause major overhead and tried batching with parallel -X zcat * |…

bash loop in parallel

别说谁变了你拦得住时间么 submitted on 2019-12-10 17:15:32
Question: I am trying to run this script in parallel, with at most 4 values of i in each set. runspr.py is itself parallel, and that's fine. What I am trying to do is run only 4 iterations of the i loop at any one time. In my present code, it runs everything at once:

#!/bin/bash
for i in *
do
  if [[ -d $i ]]; then
    echo "$i is dir"
    cd $i
    python3 ~/bin/runspr.py SCF &
    cd ..
  else
    echo "$i not dir"
  fi
done

I have followed https://www.biostars.org/p/63816/ and https://unix.stackexchange.com/questions/35416/four-tasks-in-parallel-how-do-i-do…
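Two sketches for capping the loop at four concurrent directories, keeping the question's runspr.py call as-is:

```shell
# GNU parallel: one jobslot per directory, at most 4 running at a time.
parallel -j4 'cd {} && python3 ~/bin/runspr.py SCF' ::: */

# Plain POSIX sh alternative: launch in batches of 4 and wait for each
# batch to finish before starting the next (coarser than parallel's
# slot refilling, but needs nothing installed).
n=0
for i in */ ; do
  ( cd "$i" && python3 ~/bin/runspr.py SCF ) &
  n=$((n + 1))
  if [ "$n" -ge 4 ]; then wait; n=0; fi
done
wait
```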

Parallel Cosine similarity of two large files with each other

混江龙づ霸主 submitted on 2019-12-10 11:12:51
Question: I have two files, A and B. A has 400,000 lines, each having 50 float values; B has 40,000 lines having 50 float values. For every line in B, I need to find the corresponding lines in A which have >90% similarity (cosine). With linear search and computation, the code takes an enormous amount of computing time (40-50 hours). Reaching out to the community for suggestions on how to speed up the process (links to blogs/resources such as AWS/Cloud to be used to achieve it). Have been stuck with this for quite a while!
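The cosine computation itself fits in a few lines of awk, and the loop over B's lines parallelizes naturally; a brute-force sketch (GNU parallel divides the 40-50 h figure by roughly the core count; approximate-nearest-neighbour indexes would cut it much further, but are beyond a shell one-liner):

```shell
# Cosine similarity of two whitespace-separated float vectors.
cosine() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, " "); split(b, y, " ")
    for (i = 1; i <= n; i++) { dot += x[i]*y[i]; na += x[i]*x[i]; nb += y[i]*y[i] }
    printf "%.4f\n", dot / (sqrt(na) * sqrt(nb))
  }'
}
cosine "1 0 0" "1 0 0"    # 1.0000
cosine "1 0 0" "0 1 0"    # 0.0000

# To parallelize: split B across cores, each chunk scanned against all
# of A by a hypothetical per-chunk scorer (not shown):
# parallel --pipepart -a B --block -1 ./scan_chunk_against_A.sh
```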

Is it possible to parallelize awk writing to multiple files through GNU parallel?

99封情书 submitted on 2019-12-10 10:21:17
Question: I am running an awk script which I want to parallelize with GNU Parallel. The script demultiplexes one input file into multiple output files depending on a value on each line. The code is the following:

#!/usr/bin/awk -f
BEGIN { FS = OFS = "\t" }
{
    # bc is the field that defines to which file
    # the line will be written
    bc = $1
    # append line to that file
    print >> (bc ".txt")
}

I want to parallelize it using GNU Parallel through the following:

parallel --line-buffer --block 1G --pipe 'awk script…