gnu-parallel

Perl command inside GNU parallel?

牧云@^-^@ submitted on 2019-12-06 04:36:15
I am trying to run this in parallel:

    parallel perl -pe '!/^step/ && s/(\S+)/sprintf("%.2e", $1)/ge' {} > {}.fix ::: *

That is, I want to execute the perl command on all files in the current directory, in parallel. This is not working, but I have no idea why. Comment: the perl command fixes the precision of floating-point numbers in tables; see "Replacing precision of floating point numbers in existing file".

Answer: The invoking shell handles the redirection > {}.fix before GNU parallel ever sees it, so all output ends up in a single file literally named {}.fix. In Bash you can make a function instead:

    doit() {
      perl -pe '!/^step/ && s/(\S+)/sprintf("%.2e", $1)/ge' "$1" > "$2"
    }
    export -f doit
    parallel doit {} {}.fix ::: *

Exporting functions in Zsh requires using a
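If you would rather not define a function, another option sometimes suggested is to quote the whole command so that the redirection is passed through to GNU parallel and performed once per job; this is a minimal sketch, and the nested quoting is the delicate part:

    # Quote the entire command so that parallel, not the invoking shell,
    # sees the > redirection and performs it once per input file.
    parallel 'perl -pe '\''!/^step/ && s/(\S+)/sprintf("%.2e", $1)/ge'\'' {} > {}.fix' ::: *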

Parallel Cosine similarity of two large files with each other

耗尽温柔 submitted on 2019-12-06 04:29:39
I have two files, A and B. A has 400,000 lines, each with 50 float values; B has 40,000 lines, also with 50 float values each. For every line in B, I need to find the lines in A that have >90% cosine similarity. With a linear search and pairwise computation the code takes an enormous amount of time (40-50 hours). I am reaching out to the community for suggestions on how to speed up the process (links to blogs/resources such as AWS/cloud services that could be used to achieve it). I have been stuck with this for quite a while! [There were mentions of rpud/rpudplus for this, but I can't seem to run them on cloud resources.] N.B.
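Since the work is independent across the lines of B, one option (independent of any particular cloud provider) is to let GNU parallel split B into chunks and compare each chunk against A on a separate core. This is a minimal sketch; cosine_chunk.py, A.txt and B.txt are placeholder names for a worker script and input files you would have to supply:

    # Split B into ~10 MB blocks on line boundaries and feed each block on
    # stdin to one worker; each worker loads A once and prints the pairs of
    # lines whose cosine similarity exceeds the threshold.
    parallel --pipepart -a B.txt --block 10M \
      'python3 cosine_chunk.py --matrix A.txt --threshold 0.9' > matches.txt

Vectorising the per-chunk computation inside the worker (e.g. as a matrix product) is what removes most of the 40-50 hours; GNU parallel only spreads the chunks over the available cores.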

GNU parallel with rsync

纵然是瞬间 submitted on 2019-12-05 06:16:53
I'm trying to run several instances of rsync in parallel over ssh with GNU parallel. The command I'm running is like this:

    find /tmp/tempfolder -type f -name 'chunck.*' | sort | parallel --gnu -j 4 -v ssh -i access.pem user@server echo {}\; rsync -Havessh -auz -0 --files-from={} ./ user@server:/destination/path

/tmp/tempfolder contains files with the prefix chunck, and those files contain the actual file lists. With this command I do get the 4 rsync calls, but they take a while to start, don't all start together, and don't run in parallel. What am I doing wrong? Are you sure the
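One way to keep the ssh and rsync steps together and make the per-chunk command explicit is to move it into an exported function; this is a minimal sketch reusing the paths and options from the question, and it does not by itself change how parallel schedules the four jobs:

    dochunk() {
      # $1 is one chunck.* file containing the list of files to transfer
      ssh -i access.pem user@server echo "$1"
      rsync -Havessh -auz -0 --files-from="$1" ./ user@server:/destination/path
    }
    export -f dochunk
    find /tmp/tempfolder -type f -name 'chunck.*' | sort | parallel --gnu -j 4 -v dochunk {}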

Run a specifiable number of commands in parallel - contrasting xargs -P, GNU parallel, and “moreutils” parallel

邮差的信 submitted on 2019-12-04 15:05:10
I'm trying to run multiple mongodump commands against 26 servers in a bash script. I can run 3 commands at the same time, like

    mongodump -h staging .... &
    mongodump -h production .... &
    mongodump -h web ... &

and when one finishes I want to start another mongodump. I can't run all 26 mongodump commands at the same time, because the server will run out of CPU; at most 3 mongodumps at a time.

Answer 1 (mklement0): You can use xargs's -P option to run a specifiable number of invocations in parallel. Note that the -P option is not mandated by POSIX, but both GNU xargs and BSD/macOS xargs support it.

    xargs -P 3 -n 1
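A minimal sketch of how that pattern could look for this case, assuming the 26 host names are listed one per line in a file called servers.txt (a placeholder name):

    # Run at most 3 mongodump processes at a time; xargs starts the next one
    # as soon as one of the running three finishes.
    xargs -P 3 -n 1 mongodump -h < servers.txt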

Parallelizing a while loop with arrays read from a file in bash

柔情痞子 submitted on 2019-12-03 17:05:06
Question: I have a while loop in Bash handled like this:

    while IFS=$'\t' read -r -a line; do
      myprogram ${line[0]} ${line[1]} ${line[0]}_vs_${line[1]}.result
    done < fileinput

For reference, it reads from a tab-delimited file with this structure:

    foo    bar
    baz    foobar

and so on. I would like to parallelize this loop using GNU parallel (since there are many entries and processing can be slow), but the examples are not clear on how I would assign each line to an array the way I do here. What would be a
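A minimal sketch of the usual approach: let GNU parallel split each input line on tabs with --colsep, so the columns become the replacement strings {1} and {2} instead of array elements:

    # {1} and {2} are the first and second tab-separated fields of each line.
    parallel --colsep '\t' myprogram {1} {2} {1}_vs_{2}.result :::: fileinput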

Uploading files to s3 using s3cmd in parallel

*爱你&永不变心* submitted on 2019-12-03 09:32:03
Question: I've got a whole heap of files on a server, and I want to upload them to S3. The files are stored with a .data extension, but really they're just a bunch of JPEGs, PNGs, ZIPs or PDFs. I've already written a short script that finds the MIME type and uploads them to S3, and that works, but it's slow. Is there any way to make the script below run using GNU parallel?

    #!/bin/bash
    for n in $(find -name "*.data")
    do
      data=".data"
      extension=`file $n | cut -d ' ' -f2 | awk '{print tolower($0)}'`
      mimetype=
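One possible shape for a parallel version is to move the per-file work into an exported function and let GNU parallel fan it out; this is a minimal sketch in which my-bucket, the job count and the use of file --mime-type are assumptions, not the original script's logic:

    upload() {
      # Derive the real MIME type of one .data file and upload it with that
      # Content-Type; my-bucket is a placeholder bucket name.
      mimetype=$(file --mime-type -b "$1")
      s3cmd put --mime-type="$mimetype" "$1" "s3://my-bucket/$(basename "$1")"
    }
    export -f upload
    find . -name "*.data" | parallel -j 8 upload {}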

Splitting command line args with GNU parallel

ぐ巨炮叔叔 submitted on 2019-12-02 21:44:52
Using GNU parallel (http://www.gnu.org/software/parallel/): I have a program that takes two arguments, e.g.

    $ ./prog file1 file2
    $ ./prog file2 file3
    ...
    $ ./prog file23456 file23457

I'm using a script that generates the file name pairs, but this poses a problem because each result from the script is a single string, not a pair, so the call ends up like:

    $ ./prog "file1 file2"

GNU parallel seems to have a slew of tricks up its sleeve; I wonder if there's one for splitting text around separators:

    $ generate_file_pairs | parallel ./prog ?
    # where ? is the text under consideration, like "file1 file2"

The easy work
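The trick usually suggested for this is --colsep, which splits each incoming line on a separator and exposes the pieces as {1}, {2}, and so on; a minimal sketch for space-separated pairs:

    # Each line such as "file1 file2" is split on the space, so ./prog
    # receives the two names as separate arguments.
    generate_file_pairs | parallel --colsep ' ' ./prog {1} {2}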

Parallel Iterating IP Addresses in Bash

别等时光非礼了梦想. submitted on 2019-12-01 17:19:25
I'm dealing with a large private /8 network and need to enumerate all web servers that are listening on port 443 and state a specific version in their HTTP header response. At first I was thinking of running nmap with connect scans and grepping through the output files, but this turned out to produce many false positives where nmap reported a port as "filtered" while it was actually "open" (connect scan used: nmap -sT -sV -Pn -n -oA foo 10.0.0.0/8 -p 443). So now I'm thinking of scripting something with bash and curl; pseudo code would be like:

    for each IP in 10.0.0.0/8 do:
      curl --head
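A minimal sketch of that idea with GNU parallel and curl; the concurrency level, the 3-second timeout and the "target-version" string are placeholders you would tune, and probing a whole /8 this way still means roughly 16 million connection attempts:

    probe() {
      # HEAD request over HTTPS, ignoring certificate errors, and report
      # the IP if the Server header mentions the version we are after.
      banner=$(curl -skI --max-time 3 "https://$1/" | grep -i '^Server:')
      case "$banner" in
        *target-version*) echo "$1 $banner" ;;
      esac
    }
    export -f probe
    parallel -j 200 probe 10.{1}.{2}.{3} ::: {0..255} ::: {0..255} ::: {1..254}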