Faster grep function for big (27GB) files

闹比i · 2021-02-01 10:05

I have a small file (5 MB) containing specific strings, and I need to grep those same strings (together with other information on the matching lines) out of a big file (27 GB). To speed up the analysis I split the 27 GB file into smaller files.
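
For reference, one way to produce chunks with the three-letter names (sample_aaa, sample_aab, …) that the answer below relies on is GNU coreutils split; the file names here are assumptions for illustration, not taken from the original post:

    # Split the 27 GB file into ~1 GB pieces without cutting lines in half,
    # using 3-letter suffixes: sample_aaa, sample_aab, ...
    split -a 3 -C 1G big.sam sample_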

4 Answers
  •  挽巷 · 2021-02-01 10:42

    Using GNU Parallel it would look like this:

    # Build the list of strings to search for from the first column of input.sam.
    awk '{print $1}' input.sam > idsFile.txt
    # Grep one chunk (sample_<xyz>) for those strings and keep columns 1, 10 and 11;
    # LC_ALL=C makes fgrep considerably faster on plain ASCII data.
    doit() {
       LC_ALL=C fgrep -f idsFile.txt sample_"$1" | awk '{print $1,$10,$11}'
    }
    export -f doit
    # Run doit on every chunk sample_aaa .. sample_zzz in parallel.
    parallel doit {1}{2}{3} ::: {a..z} ::: {a..z} ::: {a..z} > output.txt
    

    If the order of the lines is not important, this will be a bit faster:

    parallel --line-buffer doit {1}{2}{3} ::: {a..z} ::: {a..z} ::: {a..z} > output.txt
    
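    If splitting the 27 GB file up front is inconvenient, GNU Parallel can also chop a single big file into blocks on the fly with --pipepart. A minimal sketch, assuming the big file is called big.sam (a placeholder name, not from the original post) and reusing idsFile.txt from above:

    doit2() {
       # Each job receives one block of big.sam on stdin; filter it against
       # the same ID list and keep columns 1, 10 and 11.
       LC_ALL=C fgrep -f idsFile.txt | awk '{print $1,$10,$11}'
    }
    export -f doit2
    # --pipepart reads big.sam directly and hands roughly 100 MB blocks
    # (split on line boundaries) to the jobs.
    parallel --pipepart -a big.sam --block 100M doit2 > output.txt

    As with the pre-split version, --line-buffer can be added if the order of the output lines does not matter.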
