Faster grep function for big (27GB) files

闹比i 2021-02-01 10:05

I have a file (5 MB) of specific strings, and I need to grep those same strings (and the other information stored with them) out of a big file (27 GB). To speed up the analysis I split the 27GB file int

4 answers
  •  爱一瞬间的悲伤
    2021-02-01 10:36

    OK, I have a test file containing 4-character strings, i.e. aaaa, aaab, aaac, etc.

    ls -lh test.txt
    -rw-r--r-- 1 root pete 1.9G Jan 30 11:55 test.txt
    time grep -e aaa -e bbb test.txt
    
    real    0m19.250s
    user    0m8.578s
    sys     0m1.254s
    
    
    time grep --mmap -e aaa -e bbb test.txt
    
    real    0m18.087s
    user    0m8.709s
    sys     0m1.198s
    

    So on a 1.9 GB file with two search patterns, the --mmap option gives a modest improvement: about 1.2 s (roughly 6%) less wall-clock time, with user and sys time nearly unchanged. If you take @BrianAgnew's advice and use a single invocation of grep, try the --mmap option.

    Note, though, that mmap can be a bit quirky if the source file changes during the search. From man grep:

    --mmap

    If possible, use the mmap(2) system call to read input, instead of the default read(2) system call. In some situations, --mmap yields better performance. However, --mmap can cause undefined behavior (including core dumps) if an input file shrinks while grep is operating, or if an I/O error occurs.
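
    As a minimal sketch of the single-invocation approach mentioned above: assuming the 5 MB file holds one literal string per line, grep can read all of the strings at once with -f and match them as fixed strings with -F. The file names and sample data below are hypothetical stand-ins, and LC_ALL=C is an assumption that the data is plain ASCII.

```shell
#!/bin/sh
# Hypothetical stand-ins for the 5 MB pattern file and the 27 GB data file.
tmpdir=$(mktemp -d)
patterns="$tmpdir/patterns.txt"
bigfile="$tmpdir/big.txt"

# One literal search string per line.
printf '%s\n' aaab bbbc > "$patterns"
# Sample data: a key followed by other information.
printf '%s\n' 'aaaa info1' 'aaab info2' 'bbbc info3' 'cccc info4' > "$bigfile"

# -F         treat every pattern as a fixed string, not a regular expression
# -f FILE    read the patterns from FILE, one per line
# LC_ALL=C   byte-wise matching (assumes plain ASCII data)
matches=$(LC_ALL=C grep -F -f "$patterns" "$bigfile")
printf '%s\n' "$matches"
# prints:
# aaab info2
# bbbc info3

# Remove the temporary sample files.
rm -rf "$tmpdir"
```

    On the real data only the grep line is needed, pointed at the actual pattern file and the 27 GB file, so the big file is read once instead of once per string.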
