I was wondering what the best way is to convert something like "haaaaapppppyyy" to "haappyy".
Basically, I'm parsing slang, where people sometimes repeat characters.
You can squash multiple occurrences of letters with itertools.groupby:
>>> from itertools import groupby
>>> ''.join(c for c, _ in groupby("haaaaapppppyyy"))
'hapy'
Similarly, you can get haappyy from groupby with:
>>> ''.join(''.join(s)[:2] for _, s in groupby("haaaaapppppyyy"))
'haappyy'
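The two groupby snippets differ only in how many characters of each run they keep, so they can be folded into one helper. A minimal sketch, assuming a hypothetical function name squash_runs and parameter max_repeat (neither comes from the answer):

```python
from itertools import groupby, islice


def squash_runs(text, max_repeat=2):
    """Keep at most max_repeat characters from each run of identical characters."""
    # groupby yields (character, iterator over that run); islice takes at most
    # max_repeat items from the run and the rest of the run is skipped.
    return ''.join(
        ''.join(islice(run, max_repeat)) for _, run in groupby(text)
    )


print(squash_runs("haaaaapppppyyy"))                # haappyy
print(squash_runs("haaaaapppppyyy", max_repeat=1))  # hapy
```

Using islice avoids building the full run as a string before trimming it, which only matters for very long runs.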
It can be done using regular expressions:
>>> import re
>>> re.sub(r'(.)\1+', r'\1\1', "haaaaapppppyyy")
'haappyy'
The pattern (.)\1+ matches any character (.) followed by one or more of the same character (because of the backref \1 it must be the same) and replaces the whole run with two copies of that character.
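The same idea extends to any cap on run length by using a bounded quantifier. A rough sketch, where squash_regex and max_repeat are illustrative names rather than part of the answer; note that . does not match a newline unless re.DOTALL is passed:

```python
import re


def squash_regex(text, max_repeat=2):
    """Shorten every run longer than max_repeat characters to exactly max_repeat."""
    # (.) captures one character; \1{n,} then requires at least n more copies,
    # so only runs of n + 1 or more characters are matched and rewritten.
    pattern = re.compile(r'(.)\1{%d,}' % max_repeat)
    return pattern.sub(lambda m: m.group(1) * max_repeat, text)


print(squash_regex("haaaaapppppyyy"))                # haappyy
print(squash_regex("haaaaapppppyyy", max_repeat=1))  # hapy
```

Runs that are already at or below the cap are not matched at all, so a word like "happy" passes through unchanged.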
This is one way of doing it (subject to the obvious constraint that Python doesn't speak English).
>>> from functools import reduce  # built in on Python 2, must be imported on Python 3
>>> s = "haaaappppyy"
>>> reduce(lambda x,y: x+y if x[-2:]!=y*2 else x, s, "")
'haappyy'
You should do it without reduce or regexps:
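>>> s = 'haaaaapppppyyy'
>>> # drop e only when the two characters just before it are both e,
>>> # so non-adjacent repeats such as the second "a" in "aba" are kept
>>> ''.join(['' if i>1 and s[i-2:i]==e*2 else e for i,e in enumerate(s)])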
'haappyy'
The number of repetitions is hardcoded above (the >1 and the -2). The general case:
>>> reps = 1
>>> ''.join(['' if i>reps-1 and s[i-reps:i]==e*reps else e for i,e in enumerate(s)])
'hapy'