Efficient substring search in a large text file containing 100 million strings (no duplicates)

Backend  Unresolved  4  1746
有刺的猬 2021-02-10 07:50

I have a large text file (1.5 GB) containing 100 million strings (no duplicates), one string per line. I want to make a web application in

4 answers
  •  旧时难觅i
    2021-02-10 08:03

    Try using hash tables. Another option is something along the lines of MapReduce. What I mean is that you can build an inverted index; Google uses the same technique. You can also create a stopword file listing words that can be ignored, e.g. I, am, the, a, an, in, on.

    This is the only approach I think is feasible. I also read somewhere that you can use arrays for searching.
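
The inverted-index idea with a stopword list could look roughly like the sketch below. It is only a minimal in-memory illustration, not a tuned solution for a 1.5 GB file: the names `STOPWORDS`, `tokenize`, `build_index` and `search` are made up for this example, the sample lines are invented, and the index is word-level, so it matches whole words rather than arbitrary substrings.

```python
from collections import defaultdict

# Hypothetical stopword list, taken from the examples in the answer above.
STOPWORDS = {"i", "am", "the", "a", "an", "in", "on"}


def tokenize(line):
    """Lowercase a line, split on whitespace, and drop stopwords."""
    return [word for word in line.lower().split() if word not in STOPWORDS]


def build_index(lines):
    """Map each non-stopword to the set of line numbers that contain it."""
    index = defaultdict(set)
    for line_no, line in enumerate(lines):
        for word in tokenize(line):
            index[word].add(line_no)
    return index


def search(index, query):
    """Return the line numbers that contain every non-stopword of the query."""
    words = tokenize(query)
    if not words:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Invented sample data standing in for the real file.
lines = ["the quick brown fox", "a lazy dog in the sun", "quick dog"]
index = build_index(lines)
print(sorted(search(index, "quick")))      # lines containing "quick"
print(sorted(search(index, "quick dog")))  # lines containing both words
```

With 100 million lines, a plain dictionary like this would probably not fit comfortably in memory, so in practice the index would likely have to live on disk or in a search engine. Matching a substring inside a word would need a different key, for example character n-grams instead of whole words.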
