I have the following Python code, but it's running a bit slow on a 10 MB file. I'm wondering, is there any way to speed it up? Maybe by doing the re.sub all in one go (rather than line by line)?
If the affected lines are rare, you can speed things up a lot by using re.sub or re.finditer to find those lines directly, instead of iterating over the lines at Python level. And str.replace is fast for simple string replacements:
import re

def fsub(m):
    # do the simple string replacements within each matched line only
    return m.group().replace('ij', 'xx').replace('kl', 'yy')

s = re.sub('(?m)^.*(?:AAA|BBB|CCC).*', fsub, open(path).read())
Note: (?m) causes the ^ to match at the beginning of each line, and .* does not grab beyond the end of the line.
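For the re.finditer route mentioned above, here is a minimal sketch that rebuilds the string from slices, touching only the (rare) matching lines; the patterns and `path` are the placeholders from the example:

import re

data = open(path).read()
parts, last = [], 0
for m in re.finditer(r'(?m)^.*(?:AAA|BBB|CCC).*', data):
    parts.append(data[last:m.start()])   # unchanged chunk, appended as-is
    parts.append(m.group().replace('ij', 'xx').replace('kl', 'yy'))
    last = m.end()
parts.append(data[last:])
s = ''.join(parts)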
Regex pre-compilation can speed up many individual re.sub calls a little (when simple string replacements are not applicable):
rec = re.compile(r'ij\d+') # once
...
line = rec.sub('xx', line) # often
(re.sub, however, already uses a regex compile cache, which is quite fast.)
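For context, a sketch of how the pre-compiled pattern would sit inside a full line loop; the pattern and `path` are the same placeholders as above, and joining at the end avoids repeated string concatenation:

import re

rec = re.compile(r'ij\d+')            # compile the pattern once, outside the loop
out = []
for line in open(path):               # placeholder file name from the question
    out.append(rec.sub('xx', line))   # reuse the compiled pattern for every line
result = ''.join(out)                 # join once instead of += on a string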
If the replacements do not change the string size, you can speed things up a lot by using a bytearray / buffer or even mmap and modifying the data in place. (re.sub(), str.replace, and endstring += line all cause a lot of memory to be copied around.)
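A minimal sketch of the in-place idea with mmap, assuming the replacement is exactly the same length as what it replaces ('ij' -> 'xx', two bytes each); the pattern and file name are placeholders from the example:

import mmap
import re

with open(path, 'r+b') as f:
    mm = mmap.mmap(f.fileno(), 0)         # map the whole file writably
    for m in re.finditer(rb'ij', mm):     # bytes pattern scanned over the buffer
        mm[m.start():m.end()] = b'xx'     # same-size slice assignment, no copies
    mm.flush()
    mm.close()

Because the file is edited where it lies, no intermediate copy of the 10 MB string is ever built; the same pattern works with a bytearray if you prefer to keep the data in memory.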