Efficient substring search in a large text file containing 100 million strings (no duplicate strings)


有刺的猬 2021-02-10 07:50

I have a large text file (1.5 GB) containing 100 million strings (no duplicates), arranged one per line. I want to make a web application in

4 answers

旧时难觅i (original poster)

2021-02-10 08:03

Try using hash tables. Another option is a method similar to MapReduce. What I mean is that you can try an inverted index; Google uses the same technique. You can also create a file of stop words listing the words that can be ignored, e.g. I, am, the, a, an, in, on.

This is the only approach I think is possible. I also read somewhere that you can use arrays for searching.
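
For illustration, the inverted-index idea with stop words might be sketched in Python roughly as follows. This is only a minimal sketch, not a tested solution for the 1.5 GB file: the names `STOP_WORDS`, `tokenize`, `build_index`, and `search` are made up for the example, the stop-word list is just the one given above, and it assumes the strings can be split on whitespace. Note that it matches whole words, not arbitrary substrings, and an in-memory index like this may not fit in RAM for 100 million lines.

```python
from collections import defaultdict

# Assumed stop-word list, copied from the examples in the answer.
STOP_WORDS = {"i", "am", "the", "a", "an", "in", "on"}


def tokenize(line):
    """Lowercase a line, split it on whitespace, and drop stop words."""
    return [w for w in line.lower().split() if w not in STOP_WORDS]


def build_index(lines):
    """Map each remaining word to the set of line numbers containing it."""
    index = defaultdict(set)
    for line_no, line in enumerate(lines):
        for word in tokenize(line):
            index[word].add(line_no)
    return index


def search(index, query):
    """Return sorted line numbers that contain every non-stop word of the query."""
    words = tokenize(query)
    if not words:
        return []
    # Use .get so that unknown words do not add empty entries to the index.
    postings = [index.get(w, set()) for w in words]
    return sorted(set.intersection(*postings))
```

With a real file, `build_index` could be fed the open file object directly, since iterating over a file yields one line at a time; each lookup then touches only the posting sets of the query words instead of scanning every line.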
