Search a Large Text File for Thousands of Strings

无人共我 2021-01-15 03:31

I have a large text file that is 20 GB in size. The file contains lines of text that are relatively short (40 to 60 characters per line). The file is unsorted.

I hav

3 answers
  •  滥情空心
    2021-01-15 04:17

    Rather than scanning the file 20,000 times, once per search string, tokenize each input line and look each token up in a std::set that holds the strings to be found. A single pass with set lookups will be much faster. This assumes your strings are simple identifiers, but something similar works when the strings are whole sentences: keep a set of the first word of each sentence and, after a first-word match, verify with string::find that the full sentence really begins at that position.
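    The approach above could be sketched roughly as follows. This is only a minimal illustration, not a tuned implementation: the function names `find_matching_lines` and `contains_phrase` are hypothetical, tokens are assumed to be separated by spaces or tabs, and the phrase check uses `std::string::compare` at the token position (a stricter stand-in for the `string::find` verification mentioned above) plus a word-boundary test.

```cpp
#include <cstddef>
#include <istream>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns every line of `input` that contains at least
// one whitespace-delimited token present in `needles`. The input is read
// once, line by line, so the whole file never has to fit in memory.
std::vector<std::string> find_matching_lines(std::istream& input,
                                             const std::set<std::string>& needles) {
    std::vector<std::string> matches;
    std::string line;
    while (std::getline(input, line)) {
        std::istringstream tokens(line);
        std::string token;
        while (tokens >> token) {
            if (needles.count(token) != 0) {
                matches.push_back(line);
                break;  // one hit is enough to report this line
            }
        }
    }
    return matches;
}

// Hypothetical phrase variant: phrases are indexed by their first word.
// When a token of `line` matches a first word, the full phrase is verified
// at that position, and must end at a word boundary.
bool contains_phrase(const std::string& line,
                     const std::multimap<std::string, std::string>& by_first_word) {
    const char* kSeparators = " \t";  // assumption: tokens split on space/tab
    std::size_t pos = 0;
    while (pos < line.size()) {
        std::size_t start = line.find_first_not_of(kSeparators, pos);
        if (start == std::string::npos) break;
        std::size_t end = line.find_first_of(kSeparators, start);
        if (end == std::string::npos) end = line.size();

        auto range = by_first_word.equal_range(line.substr(start, end - start));
        for (auto it = range.first; it != range.second; ++it) {
            const std::string& phrase = it->second;
            std::size_t stop = start + phrase.size();
            if (line.compare(start, phrase.size(), phrase) == 0 &&
                (stop == line.size() || line[stop] == ' ' || line[stop] == '\t')) {
                return true;
            }
        }
        pos = end;
    }
    return false;
}
```

    For a real 20 GB file you would pass a `std::ifstream` as `input` and probably write matches out as they are found instead of collecting them in a vector. `std::unordered_set` would likely give faster lookups than `std::set`; `std::set` is used here only because the answer names it.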
