Efficient substring search in a large text file containing 100 million strings (no duplicate strings)

有刺的猬 2021-02-10 07:50

I have a large text file (1.5 GB) containing 100 million strings (no duplicates), one string per line. I want to make a web application in

4 answers
  •  一生所求
    2021-02-10 08:06

    You could build a directory structure based on the first few letters of each word. For example:

    /A
    /A/AA
    /A/AB
    /A/AC
    ...
    /Z/ZU
    

    Under that structure, each folder holds a file containing all the strings whose first characters match the folder name. The first characters of your search term narrow the selection down to one folder holding a small fraction of your overall list. From there, you can do a full search of just that file. If that's still too slow, increase the depth of your directory tree to cover more letters.
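
The directory idea above could be sketched roughly as follows in Python. This is only an illustration under a few assumptions: the function names (`bucket_key`, `bucket_file`, `build_buckets`, `search_prefix`) are made up for this example, the tree is fixed at two letters deep, bucket files get a `.txt` suffix, and any character that is not an ASCII letter or digit is mapped to `_` so it is safe in a file name. Note that narrowing by the first letters finds strings that start with the search term; a match in the middle of a string would not be found this way.

```python
from pathlib import Path


def bucket_key(text: str) -> str:
    """Map the first two characters to an upper-cased, filename-safe key."""
    return "".join(
        c.upper() if c.isascii() and c.isalnum() else "_" for c in text[:2]
    )


def bucket_file(root: Path, key: str) -> Path:
    # Layout mirrors the answer: <root>/A/AB.txt
    return root / key[0] / f"{key}.txt"


def build_buckets(source: Path, root: Path) -> None:
    """Split the big file into one small file per two-character prefix.

    Files are opened in append mode, so run this once on an empty root.
    Up to roughly 1,400 files may be open at the same time, which can
    exceed the default open-file limit on some systems.
    """
    handles = {}
    try:
        with source.open(encoding="utf-8") as src:
            for line in src:
                word = line.rstrip("\n")
                if not word:
                    continue  # skip blank lines
                key = bucket_key(word)
                if key not in handles:
                    path = bucket_file(root, key)
                    path.parent.mkdir(parents=True, exist_ok=True)
                    handles[key] = path.open("a", encoding="utf-8")
                handles[key].write(word + "\n")
    finally:
        for handle in handles.values():
            handle.close()


def search_prefix(root: Path, term: str) -> list[str]:
    """Return every stored string that starts with `term` (case-sensitive)."""
    if not term:
        return []
    key = bucket_key(term)
    if len(key) >= 2:
        files = [bucket_file(root, key)]
    else:
        # A one-character term spans every file in that letter's folder.
        folder = root / key
        files = sorted(folder.glob("*.txt")) if folder.is_dir() else []
    matches = []
    for path in files:
        if not path.exists():
            continue
        with path.open(encoding="utf-8") as bucket:
            for line in bucket:
                word = line.rstrip("\n")
                if word.startswith(term):
                    matches.append(word)
    return matches
```

After a one-time `build_buckets(Path("strings.txt"), Path("buckets"))` (both paths are placeholders), a call such as `search_prefix(Path("buckets"), "ap")` reads only `buckets/A/AP.txt` instead of the whole 1.5 GB file. Going one level deeper would mean taking three characters in `bucket_key` and adding a folder level in `bucket_file`.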
