Speed up millions of regex replacements in Python 3

醉酒成梦 2020-11-22 05:44

I'm using Python 3.5.2

I have two lists

  • a list of about 750,000 "sentences" (long strings)
  • a list of about 20,000 "words" that I would like to delete from the sentences
9 Answers
  •  攒了一身酷
    2020-11-22 06:31

    One thing you might want to try is pre-processing the sentences to encode the word boundaries. Basically turn each sentence into a list of words by splitting on word boundaries.

    This should be faster, because to process a sentence, you just have to step through each of the words and check if it's a match.

    Currently the regex search is having to go through the entire string again each time, looking for word boundaries, and then "discarding" the result of this work before the next pass.
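    The idea above can be sketched as follows. This is a minimal illustration, not the asker's actual code: the sample sentences and banned words are made up, and `re.split(r"\W+", …)` stands in for whatever word-boundary tokenization fits the real data.

    ```python
    import re

    # Hypothetical stand-ins for the 750,000 sentences and 20,000 words
    sentences = [
        "the quick brown fox jumps over the lazy dog",
        "pack my box with five dozen liquor jugs",
    ]
    banned_words = {"the", "lazy", "five"}  # a set gives O(1) membership checks

    cleaned = []
    for sentence in sentences:
        # Tokenize once on word boundaries, instead of letting the regex
        # engine re-scan the whole string for every banned word.
        tokens = re.split(r"\W+", sentence)
        kept = [t for t in tokens if t and t not in banned_words]
        cleaned.append(" ".join(kept))
    ```

    Each sentence is scanned exactly once, and each token costs a single hash lookup, so the total work is roughly linear in the text size rather than proportional to (sentences × words). Note the trade-off: splitting on `\W+` discards the original punctuation and spacing, which may or may not matter for the real use case.
    
    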
