I want to replace repeated instances of the "*" character within a string with a single instance of "*". For example if the string is "*
re.sub(r'\*+', '*', pattern)
That will do.
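A minimal, self-contained sketch of that one-liner, assuming Python 3. The function name `collapse_asterisks` is only illustrative, and the raw string (`r'...'`) is there so the backslash reaches the regex engine untouched.

```python
import re

def collapse_asterisks(text):
    # \*+ matches a run of one or more literal asterisks;
    # every such run is replaced by a single asterisk.
    return re.sub(r'\*+', '*', text)

print(collapse_asterisks("a***b**c*d"))  # a*b*c*d
```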
This will work for any number of consecutive asterisks, although you may need to replace the tilde with some other string that you know will be unique throughout the string.
string = "begin*************end"
string.replace("**", "~*").replace("*~", "").replace("~*", "*").replace("**", "*")
I believe regex approaches would be generally more computationally expensive than this.
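As a rough sketch, the replace chain above could be wrapped in a function that refuses to run when the placeholder is unsafe. The name `collapse_with_placeholder` and the single-character restriction are my assumptions, not part of the original answer; the restriction is there because a longer placeholder can combine with neighbouring text in unintended ways.

```python
def collapse_with_placeholder(text, placeholder="~"):
    # Assumption: the placeholder is one character, is not "*",
    # and does not already occur anywhere in the text.
    if len(placeholder) != 1 or placeholder == "*" or placeholder in text:
        raise ValueError("placeholder must be a single unused character")
    return (text.replace("**", placeholder + "*")   # mark each pair
                .replace("*" + placeholder, "")     # drop all but the first marked pair in a run
                .replace(placeholder + "*", "*")    # unmark what is left
                .replace("**", "*"))                # odd-length runs leave one extra

print(collapse_with_placeholder("begin*************end"))  # begin*end
```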
The naive way to do this kind of thing with re is
re.sub(r'\*+', '*', text)
That replaces runs of 1 or more asterisks with one asterisk. For runs of exactly one asterisk, that is running very hard just to stay still. Much better is to replace runs of TWO or more asterisks by a single asterisk:
re.sub(r'\*\*+', '*', text)
This can be well worth doing:
\python27\python -mtimeit -s"t='a*'*100;import re" "re.sub('\*+', '*', t)"
10000 loops, best of 3: 73.2 usec per loop
\python27\python -mtimeit -s"t='a*'*100;import re" "re.sub('\*\*+', '*', t)"
100000 loops, best of 3: 8.9 usec per loop
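If you would rather repeat the comparison from inside Python, something like the following sketch should do, using the standard `timeit` module. The loop count of 1000 is an arbitrary choice of mine, and the numbers will vary by machine and Python version, so none are shown here.

```python
import re
import timeit

text = 'a*' * 100  # same test string as above: single asterisks only

# Both patterns must produce the same result.
assert re.sub(r'\*+', '*', text) == re.sub(r'\*\*+', '*', text)

for pattern in (r'\*+', r'\*\*+'):
    # Time 1000 substitutions with this pattern.
    seconds = timeit.timeit(lambda: re.sub(pattern, '*', text), number=1000)
    print(pattern, seconds)
```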
Note that when re.sub finds no matches it returns a reference to the input string instead of building a whole new string, saving more wear and tear on your computer.
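A small way to check this yourself is sketched below. I would treat the identity result as an implementation detail of the interpreter you happen to run (it is what I would expect from CPython) rather than a documented guarantee, so no output is shown for it.

```python
import re

text = 'no asterisk runs here'
result = re.sub(r'\*\*+', '*', text)

# Always true when nothing matched: the content is unchanged.
print(result == text)

# True only if the very same string object came back.
print(result is text)
```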
How about a non-regex way?
def squeeze(char, s):
    while char*2 in s:
        s = s.replace(char*2, char)
    return s
print(squeeze("*" , "AB***abc**def**AA***k"))
This prints AB*abc*def*AA*k
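For a quick sanity check, the loop can be compared against a regex equivalent that works for any single character. `squeeze_re` is a hypothetical helper I made up for the comparison; it uses `re.escape` so that characters like "*" or "." are matched literally.

```python
import re

def squeeze(char, s):
    # Keep halving doubled characters until no pair is left.
    while char * 2 in s:
        s = s.replace(char * 2, char)
    return s

def squeeze_re(char, s):
    # Hypothetical regex counterpart: a run of two or more copies
    # of char becomes one copy. The lambda avoids replacement-string
    # escaping issues (for example when char is a backslash).
    return re.sub(re.escape(char) + '{2,}', lambda m: char, s)

for char, s in [("*", "AB***abc**def**AA***k"), ("*", ""), (".", "a.b..c...")]:
    assert squeeze(char, s) == squeeze_re(char, s)

print(squeeze_re("*", "AB***abc**def**AA***k"))  # AB*abc*def*AA*k
```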
Without a regexp, you can use the general technique for removing repeated elements, restricted to '*' by checking each character together with its successor:
source = "***abc**dee*fg******h"
target = ''.join(c for c,n in zip(source, source[1:]+' ') if c+n != '**')
print(target)
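The same idea could be generalized to any single character, roughly as sketched here for Python 3. The name `drop_repeats` is my own, and I have swapped the `' '` padding for `itertools.zip_longest` with an empty fill value so that the function also behaves when the character being squeezed is a space.

```python
from itertools import zip_longest

def drop_repeats(source, char='*'):
    # Pair every character with its successor (the last one gets '').
    # A character is dropped only when it and its successor are both
    # `char`, so the final character of each run survives.
    pairs = zip_longest(source, source[1:], fillvalue='')
    return ''.join(c for c, n in pairs if c + n != char * 2)

print(drop_repeats("***abc**dee*fg******h"))  # *abc*dee*fg*h
```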