Speeding up re.sub in Python

Asked 2021-01-29 07:23

I have the following Python code, but it's running a bit slow on a 10 MB file. I'm wondering, is there any way to speed it up? Maybe by doing the re.sub all in one go (rather than
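The question's code is cut off, but it describes a per-line loop with repeated re.sub calls. A hypothetical reconstruction of that kind of baseline (the markers and replacements here are taken from the answer below and are only illustrative):

```python
import re

def slow_version(text):
    # Hypothetical line-by-line baseline: one re.sub per pattern per line.
    out = []
    for line in text.splitlines(keepends=True):
        if 'AAA' in line or 'BBB' in line or 'CCC' in line:
            line = re.sub(r'ij', 'xx', line)
            line = re.sub(r'kl', 'yy', line)
        out.append(line)
    return ''.join(out)
```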

1 Answer
  • Answered 2021-01-29 07:50

    If the affected lines are rare, you can speed things up a lot by using re.sub or re.finditer to find those lines directly, instead of iterating over the lines at the Python level. And str.replace is fast for simple string replacements:

    import re

    def fsub(m):
        return m.group().replace('ij', 'xx').replace('kl', 'yy')

    with open(path) as f:
        s = re.sub(r'(?m)^.*(?:AAA|BBB|CCC).*', fsub, f.read())
    

    Note: (?m) makes ^ match at the beginning of each line and keeps .* from matching beyond the end of the line.
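A runnable sketch of the approach above on an in-memory sample (the sample text and the AAA/BBB markers are made up for illustration):

```python
import re

def fsub(m):
    # Rewrite only the matched line with cheap str.replace calls.
    return m.group().replace('ij', 'xx').replace('kl', 'yy')

sample = "AAA ij here\nno match kl\nBBB kl too\n"
# Only lines containing AAA/BBB/CCC are passed to fsub;
# the middle line is left untouched.
result = re.sub(r'(?m)^.*(?:AAA|BBB|CCC).*', fsub, sample)
```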

    Pre-compiling a regex can speed up many individual re.sub calls a little (when simple string replacements are not applicable):

    rec = re.compile(r'ij\d+') # once
    ...
    line = rec.sub('xx', line)  # often
    

    (Note, however, that re.sub already uses an internal compiled-pattern cache, which is itself quite fast.)

    If the replacements do not change the string size, you can speed things up a lot by using a bytearray / buffer, or even mmap, and modifying the data in place. (re.sub(), str.replace and endstring += line all copy a lot of memory around.)
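A minimal sketch of the in-place idea for same-size replacements, using a bytearray and slice assignment (the data and patterns are made up; bytearray.replace would allocate a new object, so the loop writes into the existing buffer instead):

```python
# Same-size replacements done in place on a mutable buffer.
data = bytearray(b"abc ij def kl ij")
for old, new in ((b'ij', b'xx'), (b'kl', b'yy')):
    start = 0
    while True:
        i = data.find(old, start)
        if i == -1:
            break
        data[i:i + len(old)] = new  # same length, so no resize or copy of the rest
        start = i + len(new)
```

The same slice-assignment pattern works on an mmap object, so a large file can be patched on disk without reading it fully into memory.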
