Speed up millions of regex replacements in Python 3

醉酒成梦 2020-11-22 05:44

I'm using Python 3.5.2

I have two lists

  • a list of about 750,000 "sentences" (long strings)
  • a list of about 20,000 "words" that I would l
9 answers
  •  既然无缘
    2020-11-22 06:24

    One thing you can try is to compile one single pattern like "\b(word1|word2|word3)\b".

    Because re relies on C code to do the actual matching, the savings can be dramatic.

    As @pvg pointed out in the comments, it also benefits from single pass matching.

    If your words are plain strings rather than regex patterns, Eric's answer is faster.
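
A minimal sketch of the single-pattern idea from this answer, written to run on Python 3.5 (no f-strings). The helper names `build_pattern` and `replace_all`, the sample data, and the empty-string replacement are assumptions for illustration; they do not come from the question. It also assumes the words are literal strings, so each one is passed through `re.escape`; if the words are themselves regex fragments, that call would have to be dropped.

```python
import re


def build_pattern(words):
    """Compile one alternation pattern of the form \\b(?:w1|w2|...)\\b."""
    # Assumption: words are literal text, so escape any regex metacharacters.
    alternation = "|".join(re.escape(w) for w in words)
    return re.compile(r"\b(?:" + alternation + r")\b")


def replace_all(sentences, words, replacement=""):
    """Apply the combined pattern to every sentence in a single pass each."""
    pattern = build_pattern(words)  # compiled once, reused for all sentences
    return [pattern.sub(replacement, s) for s in sentences]


if __name__ == "__main__":
    # Hypothetical sample data standing in for the two large lists.
    sample_words = ["foo", "bar", "baz"]
    sample_sentences = ["foo went to the bar", "foobar stays", "baz!"]
    print(replace_all(sample_sentences, sample_words))
```

Compiling once and reusing the compiled object means each sentence is scanned one time against all words, instead of once per word.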
