I am looking for an efficient way to extract the shortest repeating substring. For example:
input1 = 'dabcdbcdbcdd'
output1 = 'bcd'
input2 = 'cbabababac'
`^` matches at the start of a string. In your example, the repeating substrings don't start at the beginning. The same goes for `$`, which matches at the end. Without `^` and `$`, the pattern `.*?` always matches the empty string. Demo:
import re

def srp(s):
    # Lazily capture the shortest group that is immediately repeated at least once.
    return re.search(r'(.+?)\1+', s).group(1)

print(srp('dabcdbcdbcdd'))  # -> bcd
print(srp('cbabababac'))  # -> ba
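Note, though, that it doesn't necessarily find the shortest repeating substring: `re.search` returns the leftmost match, so for 'dabcdbcdbcdd' it reports bcd even though the trailing dd is a shorter repeat (of d).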
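If you really need the shortest repeated unit anywhere in the string, one possible approach is to wrap the pattern in a lookahead so that every start position gets tried, then take the minimum by length. This is only a sketch, not something from the original code; the name `shortest_repeat` is made up for illustration, and it assumes the input has no newlines (`.` doesn't match them by default).

```python
import re

def shortest_repeat(s):
    # The lookahead consumes no characters, so finditer attempts a match
    # at every position instead of skipping past earlier matches.
    # At each position the lazy group captures the shortest unit that is
    # immediately followed by a copy of itself.
    candidates = [m.group(1) for m in re.finditer(r'(?=(.+?)\1)', s)]
    # min() keeps the leftmost candidate when several share the same length.
    return min(candidates, key=len) if candidates else None

print(shortest_repeat('dabcdbcdbcdd'))  # -> d
print(shortest_repeat('cbabababac'))    # -> ba
```

It returns None when nothing repeats, whereas `srp` above would raise an AttributeError in that case because `re.search` returns None.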