Faster grep function for big (27GB) files

闹比i · 2021-02-01 10:05

I have a small file (5 MB) containing specific strings, and I need to grep those same strings (together with other information on the matching lines) out of a big file (27 GB). To speed up the analysis I split the 27 GB file into smaller files.
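
For reference, one way to produce chunks with the three-letter names (sample_aaa, sample_aab, …) that the answer below relies on is GNU coreutils split; the file names here are assumptions for illustration, not taken from the original post:

    # Split the 27 GB file into ~1 GB pieces without cutting lines in half,
    # using 3-letter suffixes: sample_aaa, sample_aab, ...
    split -a 3 -C 1G big.sam sample_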

4 Answers
  •  挽巷 · 2021-02-01 10:42

    Using GNU Parallel it would look like this:

    # Build the list of strings to search for from the first column of input.sam.
    awk '{print $1}' input.sam > idsFile.txt
    # Grep one chunk (sample_<xyz>) for those strings and keep columns 1, 10 and 11;
    # LC_ALL=C makes fgrep considerably faster on plain ASCII data.
    doit() {
       LC_ALL=C fgrep -f idsFile.txt sample_"$1" | awk '{print $1,$10,$11}'
    }
    export -f doit
    # Run doit on every chunk sample_aaa .. sample_zzz in parallel.
    parallel doit {1}{2}{3} ::: {a..z} ::: {a..z} ::: {a..z} > output.txt
    

    If the order of the lines is not important, this will be a bit faster:

    parallel --line-buffer doit {1}{2}{3} ::: {a..z} ::: {a..z} ::: {a..z} > output.txt
    
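    If splitting the 27 GB file up front is inconvenient, GNU Parallel can also chop a single big file into blocks on the fly with --pipepart. A minimal sketch, assuming the big file is called big.sam (a placeholder name, not from the original post) and reusing idsFile.txt from above:

    doit2() {
       # Each job receives one block of big.sam on stdin; filter it against
       # the same ID list and keep columns 1, 10 and 11.
       LC_ALL=C fgrep -f idsFile.txt | awk '{print $1,$10,$11}'
    }
    export -f doit2
    # --pipepart reads big.sam directly and hands roughly 100 MB blocks
    # (split on line boundaries) to the jobs.
    parallel --pipepart -a big.sam --block 100M doit2 > output.txt

    As with the pre-split version, --line-buffer can be added if the order of the output lines does not matter.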
