Grepping a huge file (80GB): any way to speed it up?

 grep -i -A 5 -B 5 'db_pd.Clients' eightygigsfile.sql

This has been running for an hour on a fairly powerful Linux server which is otherwise not

5 answers

野趣味

2020-11-29 15:52
Some trivial improvements:
- Remove the -i option if you can; case-insensitive matching is quite slow.
- Replace the . with \.
  
  A single dot is the regex symbol that matches any character, which is also slow.
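
To make the dot issue concrete, here is a small sketch on a throwaway sample file. The file contents and the variable names (`sample`, `loose`, `strict`, `fixed`) are made up for illustration and only stand in for the real dump. It shows that the unescaped pattern also matches lines where some other character sits between `db_pd` and `Clients`:

```shell
# Build a tiny sample file (hypothetical data standing in for the SQL dump).
sample=$(mktemp)
printf '%s\n' \
  'INSERT INTO db_pd.Clients VALUES (1);' \
  'INSERT INTO db_pdXClients VALUES (2);' \
  'INSERT INTO other.Table VALUES (3);' > "$sample"

# Unescaped dot: "." matches any single character, so the first two lines match.
loose=$(grep -c 'db_pd.Clients' "$sample")

# Escaped dot: only a literal "." matches.
strict=$(grep -c 'db_pd\.Clients' "$sample")

# -F treats the whole pattern as a fixed string, so no escaping is needed.
fixed=$(grep -c -F 'db_pd.Clients' "$sample")

echo "loose=$loose strict=$strict fixed=$fixed"
rm -f "$sample"
```

On a real dump the escaped or fixed-string form is also the more correct search, since it cannot pick up near-miss names.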
天涯浪人

2020-11-29 15:55
If you have a multicore CPU, I would really recommend GNU parallel. To grep a big file in parallel use:
```
< eightygigsfile.sql parallel --pipe grep -i -C 5 'db_pd.Clients'
```
Depending on your disks and CPUs it may be faster to read larger blocks:
```
< eightygigsfile.sql parallel --pipe --block 10M grep -i -C 5 'db_pd.Clients'
```
It's not entirely clear from your question, but other options for grep include:
- Dropping the -i flag.
- Using the -F flag for a fixed string
- Disabling NLS with LANG=C
- Setting a max number of matches with the -m flag.
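
As a rough sketch of how those flags combine, the snippet below generates a small sample file (hypothetical data, as are the variable names) and runs a fixed-string search in the C locale, once counting every match and once stopping after three. It uses `LC_ALL=C` rather than `LANG=C` because `LC_ALL` takes precedence if it is already set in the environment:

```shell
# Generate 1000 lines; every 100th one mentions the table (hypothetical data).
sample=$(mktemp)
i=1
while [ "$i" -le 1000 ]; do
  if [ $((i % 100)) -eq 0 ]; then
    echo "INSERT INTO db_pd.Clients VALUES ($i);"
  else
    echo "INSERT INTO other.Table VALUES ($i);"
  fi
  i=$((i + 1))
done > "$sample"

# -F: fixed string, LC_ALL=C: byte-wise matching without locale handling.
all=$(LC_ALL=C grep -c -F 'db_pd.Clients' "$sample")

# -m 3: stop reading the file after the third matching line.
first3=$(LC_ALL=C grep -F -m 3 'db_pd.Clients' "$sample" | wc -l | tr -d ' ')

echo "all=$all first3=$first3"
rm -f "$sample"
```

The `-m` flag only pays off if you know roughly how many matches you need, because grep stops scanning as soon as the limit is reached.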
耶瑟儿～

2020-11-29 16:02
```
< eightygigsfile.sql parallel -k -j120% -n10 -m grep -F -i -C 5 'db_pd.Clients'  
```
If you need to search for multiple strings, grep -f strings.txt saves a ton of time. The above is a translation of something that I am currently testing. The -j and -n option values seemed to work best for my use case. The -F flag for grep also made a big difference.
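
For the multiple-strings case, a minimal sketch could look like the following. The sample data, the second table name `db_pd.Orders`, and the temporary paths are assumptions for illustration; only `strings.txt` comes from the text above:

```shell
# Work in a scratch directory with a tiny stand-in for the dump.
workdir=$(mktemp -d)
printf '%s\n' \
  'INSERT INTO db_pd.Clients VALUES (1);' \
  'INSERT INTO db_pd.Orders VALUES (2);' \
  'INSERT INTO other.Table VALUES (3);' > "$workdir/sample.sql"

# One fixed string per line. Avoid blank lines here:
# an empty pattern matches every line of the input.
printf '%s\n' 'db_pd.Clients' 'db_pd.Orders' > "$workdir/strings.txt"

# A single pass over the file checks all strings at once,
# instead of one full scan per string.
matches=$(grep -c -F -f "$workdir/strings.txt" "$workdir/sample.sql")

echo "matches=$matches"
rm -rf "$workdir"
```

With an 80GB file the saving comes from reading the data once rather than once per search string.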
南方客

2020-11-29 16:15
Here are a few options:

1) Prefix your grep command with LC_ALL=C to use the C locale instead of UTF-8.

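2) Use fgrep (the same as grep -F) because you're searching for a fixed string, not a regular expression. Recent GNU grep releases warn that fgrep is obsolescent, so grep -F is the safer spelling.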

3) Remove the -i option, if you don't need it.

So your command becomes:
```
LC_ALL=C fgrep -A 5 -B 5 'db_pd.Clients' eightygigsfile.sql
```
It will also be faster if you copy your file to a RAM disk.
野趣味

2020-11-29 16:15
Two lines of attack:
- Are you sure you need the -i, or is there a way to get rid of it?
- Do you have more cores to play with? grep is single-threaded, so you might want to start more of them at different offsets.
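
The "different offsets" idea can be sketched with plain tools, assuming `tail -c` and `head -c` are available. Everything here (the sample data, the `jobs_n` count, the variable and file names) is made up for illustration. Note the limitation: a line that straddles a chunk boundary can be missed, and -A/-B context would be cut off at the boundaries, so treat this as a rough sketch rather than a drop-in replacement for a tool that splits on line boundaries.

```shell
# Generate a sample file; every 500th line mentions the table (hypothetical data).
file=$(mktemp)
outdir=$(mktemp -d)
i=1
while [ "$i" -le 2000 ]; do
  if [ $((i % 500)) -eq 0 ]; then
    echo "INSERT INTO db_pd.Clients VALUES ($i);"
  else
    echo "INSERT INTO other.Table VALUES ($i);"
  fi
  i=$((i + 1))
done > "$file"

jobs_n=4                                   # number of concurrent greps (assumption)
size=$(wc -c < "$file" | tr -d ' ')        # file size in bytes
chunk=$(( (size + jobs_n - 1) / jobs_n ))  # bytes per chunk, rounded up

# Start one grep per byte range, each in the background.
n=0
while [ "$n" -lt "$jobs_n" ]; do
  start=$(( n * chunk + 1 ))               # tail -c +K counts from byte 1
  tail -c "+$start" "$file" | head -c "$chunk" |
    LC_ALL=C grep -F 'db_pd.Clients' > "$outdir/part.$n" &
  n=$((n + 1))
done
wait                                       # block until all background greps finish

# Concatenate the per-chunk results in chunk order.
total=$(cat "$outdir"/part.* | wc -l | tr -d ' ')
echo "total=$total"
rm -rf "$file" "$outdir"
```

Whether this helps depends on where the bottleneck is: if the disk cannot deliver data faster than one grep consumes it, extra processes will not speed things up.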