Efficient substring search in a large text file containing 100 million strings (no duplicates)


有刺的猬 2021-02-10 07:50

I have a large text file (1.5 GB) containing 100 million strings (no duplicates), arranged one per line. I want to make a web application in

4 answers

一生所求 (OP)

2021-02-10 08:06
You could build a directory structure based on the first few letters of each word. For example:
```
/A
/A/AA
/A/AB
/A/AC
...
/Z/ZU
```
Under that structure, each folder holds a file containing all the strings whose first characters match the folder name. The first characters of your search term then narrow the selection down to one folder holding a small fraction of the overall list, and from there you can do a full search of just that file. If that is still too slow, increase the depth of the directory tree to cover more letters.
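A rough sketch of how that could look in Python, not a tuned implementation. The names `PREFIX_LEN`, `BUCKET_FILE`, `build_index` and `search` are made up for illustration, and it assumes the strings are plain ASCII letters or digits so they are safe to use as folder names. Because a folder is picked from the first characters of the search term, only strings that start with those characters are ever scanned.

```python
import os
from collections import defaultdict

PREFIX_LEN = 2               # assumed tree depth; raise it if the bucket files get too big
BUCKET_FILE = "strings.txt"  # hypothetical name of the file kept in each folder


def bucket_dir(root, text):
    """Map the first PREFIX_LEN characters to a nested folder, e.g. "ABBEY" -> root/A/AB."""
    # Upper-cased so that "a" and "A" do not collide on case-insensitive filesystems.
    key = text[:PREFIX_LEN].upper()
    return os.path.join(root, *(key[:i + 1] for i in range(len(key))))


def _flush(pending):
    """Append the buffered strings to their bucket files and empty the buffer."""
    for folder, lines in pending.items():
        os.makedirs(folder, exist_ok=True)
        with open(os.path.join(folder, BUCKET_FILE), "a", encoding="utf-8") as f:
            f.writelines(line + "\n" for line in lines)
    pending.clear()


def build_index(source_path, root, flush_every=1_000_000):
    """Read the big file once and distribute its lines into prefix folders."""
    pending = defaultdict(list)
    with open(source_path, encoding="utf-8") as src:
        for count, raw in enumerate(src, 1):
            line = raw.rstrip("\n")
            if line:
                pending[bucket_dir(root, line)].append(line)
            if count % flush_every == 0:  # keep memory use bounded
                _flush(pending)
    _flush(pending)


def search(root, term):
    """Scan only the folder selected by the first characters of the search term."""
    matches = []
    # A term shorter than PREFIX_LEN selects a parent folder, so walk its subfolders too.
    for folder, _dirs, files in os.walk(bucket_dir(root, term)):
        if BUCKET_FILE in files:
            with open(os.path.join(folder, BUCKET_FILE), encoding="utf-8") as f:
                matches.extend(line.rstrip("\n") for line in f if term in line)
    return matches
```

You would run `build_index("strings.txt", "index")` once up front, then call `search("index", "AB")` per request. Matching inside a bucket is case-sensitive here; only the folder names are upper-cased.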