Faster grep function for big (27GB) files

闹比i 2021-02-01 10:05

I have a file (5 MB) containing specific strings, and I need to grep those same strings (and other information) from a big file (27 GB). To speed up the analysis I split the 27GB file int

4 Answers

爱一瞬间的悲伤 (original poster)

2021-02-01 10:36
OK, I have a test file containing four-character strings, i.e. aaaa, aaab, aaac, etc.
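To build a comparable test file, the awk sketch below prints every four-letter lowercase string from `aaaa` to `zzzz` in order (26^4 = 456,976 lines, about 2.3 MB). The function name is hypothetical, and this is an assumption about how such a file could be produced, not necessarily how the file above was made.

```shell
# Hypothetical helper: print all 26^4 four-letter lowercase strings,
# one per line, in lexicographic order (aaaa, aaab, ..., zzzz).
gen_four_char_strings() {
    awk 'BEGIN {
        s = "abcdefghijklmnopqrstuvwxyz"
        # i picks the first character, l the last, so the last one varies fastest
        for (i = 1; i <= 26; i++)
            for (j = 1; j <= 26; j++)
                for (k = 1; k <= 26; k++)
                    for (l = 1; l <= 26; l++)
                        print substr(s, i, 1) substr(s, j, 1) substr(s, k, 1) substr(s, l, 1)
    }'
}
```

Concatenating several hundred copies of this output yields a file in the ~2 GB range.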
```
ls -lh test.txt
-rw-r--r-- 1 root pete 1.9G Jan 30 11:55 test.txt
time grep -e aaa -e bbb test.txt

real    0m19.250s
user    0m8.578s
sys     0m1.254s


time grep --mmap -e aaa -e bbb test.txt

real    0m18.087s
user    0m8.709s
sys     0m1.198s
```
So on a roughly 2 GB file with two search patterns, the `--mmap` option brought the wall-clock time down from 19.25 s to 18.09 s (about 6%) in this single run. If you take @BrianAgnew's advice and use a single invocation of grep, try the `--mmap` option as well.
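For the original problem, a single invocation can be sketched as follows. This assumes the 5 MB file holds one search string per line; the function name and file names are hypothetical. `-f` reads all patterns from the file in one pass, `-F` treats them as literal strings rather than regular expressions, and `LC_ALL=C` avoids multibyte-locale processing, which often speeds up matching on ASCII data.

```shell
# Hypothetical helper: print every line of BIG_FILE that contains any of
# the fixed strings listed one per line in PATTERN_FILE.
# Usage: grep_fixed_strings PATTERN_FILE BIG_FILE
grep_fixed_strings() {
    # -F: patterns are literal strings, so "." or "*" are not regex operators
    # -f: read every pattern from the file in a single grep invocation
    # LC_ALL=C: byte-wise matching without locale overhead
    LC_ALL=C grep -F -f "$1" -- "$2"
}
```

For example, `grep_fixed_strings patterns.txt big.txt > matches.txt`. Be aware that a blank line in the pattern file is an empty pattern, which matches every line.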

Note, however, that mmap can behave unpredictably if the source file changes during the search. From `man grep`:

> --mmap
>
> If possible, use the mmap(2) system call to read input, instead of the default read(2) system call. In some situations, --mmap yields better performance. However, --mmap can cause undefined behavior (including core dumps) if an input file shrinks while grep is operating, or if an I/O error occurs.
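Support for `--mmap` varies between grep builds: some versions ignore it, and others reject it as an unrecognized option. A script that wants to try it can check first. The sketch below assumes GNU-style exit codes (0 = match, 1 = no match, 2 = error), and both function names are hypothetical.

```shell
# Hypothetical helper: succeed if this grep accepts the --mmap option.
# grep exits with 2 on errors such as an unrecognized option, and with 1
# when it merely finds no match (expected here, since /dev/null is empty).
grep_has_mmap() {
    grep --mmap -q x /dev/null 2>/dev/null
    [ $? -ne 2 ]
}

# Hypothetical wrapper: one grep invocation with the given arguments,
# adding --mmap only when the local grep supports it.
grep_multi() {
    if grep_has_mmap; then
        grep --mmap "$@"
    else
        grep "$@"
    fi
}
```

For example, `grep_multi -e aaa -e bbb test.txt` runs the same two-pattern search as the timings above, falling back to a plain read when `--mmap` is unavailable.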