I have the following Python code, but it's running a bit slow on a 10 MB file. I'm wondering, is there any way to speed it up? Maybe by doing the re.sub all in one go (rather than line by line)?
If the affected lines are rare, you can speed things up a lot by using re.sub or re.finditer to find those lines directly, instead of iterating over the lines at Python level. And str.replace is fast for simple string replacements:
import re

def fsub(m):
    # do the simple string replacements within each matched line only
    return m.group().replace('ij', 'xx').replace('kl', 'yy')

s = re.sub('(?m)^.*(?:AAA|BBB|CCC).*', fsub, open(path).read())
Note: (?m) causes the ^ to match at the beginning of each line, and .* does not grab beyond the end of the line.
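For the re.finditer route mentioned above, here is a minimal sketch that rebuilds the string from slices, touching only the (rare) matching lines; the patterns and `path` are the placeholders from the example:

import re

data = open(path).read()
parts, last = [], 0
for m in re.finditer(r'(?m)^.*(?:AAA|BBB|CCC).*', data):
    parts.append(data[last:m.start()])   # unchanged chunk, appended as-is
    parts.append(m.group().replace('ij', 'xx').replace('kl', 'yy'))
    last = m.end()
parts.append(data[last:])
s = ''.join(parts)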
Regex pre-compilation can speed up many individual re.sub calls a little (when simple string replacements are not applicable):
rec = re.compile(r'ij\d+') # once
...
line = rec.sub('xx', line) # often
(re.sub, however, already uses a regex compile cache, which is quite fast.)
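For context, a sketch of how the pre-compiled pattern would sit inside a full line loop; the pattern and `path` are the same placeholders as above, and joining at the end avoids repeated string concatenation:

import re

rec = re.compile(r'ij\d+')            # compile the pattern once, outside the loop
out = []
for line in open(path):               # placeholder file name from the question
    out.append(rec.sub('xx', line))   # reuse the compiled pattern for every line
result = ''.join(out)                 # join once instead of += on a string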
If the replacements do not change the string size, you can speed things up a lot by using a bytearray / buffer or even mmap and modifying the data in place. (re.sub(), str.replace, and endstring += line all cause a lot of memory to be copied around.)
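A minimal sketch of the in-place idea with mmap, assuming the replacement is exactly the same length as what it replaces ('ij' -> 'xx', two bytes each); the pattern and file name are placeholders from the example:

import mmap
import re

with open(path, 'r+b') as f:
    mm = mmap.mmap(f.fileno(), 0)         # map the whole file writably
    for m in re.finditer(rb'ij', mm):     # bytes pattern scanned over the buffer
        mm[m.start():m.end()] = b'xx'     # same-size slice assignment, no copies
    mm.flush()
    mm.close()

Because the file is edited where it lies, no intermediate copy of the 10 MB string is ever built; the same pattern works with a bytearray if you prefer to keep the data in memory.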