Remove repeating characters from words

醉酒成梦 2020-12-09 20:23

I was wondering what the best way is to convert something like "haaaaapppppyyy" to "haappyy".

Basically, when parsing slang, I run into words in which people have repeated characters.

4 answers
  • 2020-12-09 20:53

    You can squash multiple occurrences of letters with itertools.groupby:

    >>> from itertools import groupby
    >>> ''.join(c for c, _ in groupby("haaaaapppppyyy"))
    'hapy'
    

    Similarly, you can keep at most two characters of each run, which gives haappyy:

    >>> ''.join(''.join(s)[:2] for _, s in groupby("haaaaapppppyyy"))
    'haappyy'
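
A reusable form of the groupby approach might look like the sketch below. The function name squeeze_runs and the max_run parameter are illustrative and do not come from the answer above. islice is used so that only the first max_run characters of each run are read.

```python
from itertools import groupby, islice


def squeeze_runs(text, max_run=2):
    """Shorten every run of identical characters to at most max_run characters."""
    return ''.join(
        # groupby yields one group per run; islice keeps its first max_run items
        ''.join(islice(group, max_run))
        for _, group in groupby(text)
    )


print(squeeze_runs("haaaaapppppyyy"))     # haappyy
print(squeeze_runs("haaaaapppppyyy", 1))  # hapy
```

Passing max_run=1 reproduces the first one-liner, and max_run=2 reproduces the second.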
    
  • 2020-12-09 21:01

    It can be done using regular expressions:

    >>> import re
    >>> re.sub(r'(.)\1+', r'\1\1', "haaaaapppppyyy")
    'haappyy'
    

    The pattern (.)\1+ matches any character (.) followed by one or more repetitions of that same character (the backreference \1 forces it to be the same one), and the replacement \1\1 substitutes exactly two copies of it.
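
If the cap should be something other than two, the same regular-expression idea can be parameterised. This is only a sketch: limit_repeats and max_run are names made up here, and it assumes max_run is at least 1.

```python
import re


def limit_repeats(text, max_run=2):
    """Cap every run of one repeated character at max_run characters."""
    if max_run < 1:
        raise ValueError("max_run must be at least 1")
    # (.) captures one character; \1{max_run,} requires at least max_run more
    # copies of it, so only runs longer than max_run are matched.
    pattern = re.compile(r'(.)\1{%d,}' % max_run)
    return pattern.sub(lambda m: m.group(1) * max_run, text)


print(limit_repeats("haaaaapppppyyy"))     # haappyy
print(limit_repeats("haaaaapppppyyy", 1))  # hapy
```

Note that . does not match a newline by default, so runs of blank lines are left alone unless re.DOTALL is passed.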

  • 2020-12-09 21:02

    This is one way of doing it (with the obvious limitation that Python doesn't speak English).

    >>> from functools import reduce  # reduce is not a builtin in Python 3
    >>> s = "haaaappppyy"
    >>> reduce(lambda x, y: x + y if x[-2:] != y * 2 else x, s, "")
    'haappyy'
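
The limitation mentioned above (the code cannot know whether "happy" or "haappyy" is the real word) could be softened by checking shortened spellings against a word list. The sketch below is an assumption-laden illustration, not part of the answer: candidates, normalize, and the tiny vocabulary set are all hypothetical, and a real word list would have to be supplied.

```python
from itertools import groupby, product


def candidates(word, max_run=2):
    """Yield every spelling in which each run is shortened to 1..max_run characters."""
    runs = [(ch, len(list(group))) for ch, group in groupby(word)]
    options = [
        [ch * n for n in range(1, min(length, max_run) + 1)]
        for ch, length in runs
    ]
    for parts in product(*options):
        yield ''.join(parts)


def normalize(word, vocabulary, max_run=2):
    """Return the first candidate found in vocabulary, else the capped spelling."""
    cand = word
    for cand in candidates(word, max_run):
        if cand in vocabulary:
            return cand
    # The last candidate keeps every run at its capped length.
    return cand


vocabulary = {"happy", "so", "good"}  # hypothetical word list
print(normalize("haaaaapppppyyy", vocabulary))  # happy
print(normalize("goood", vocabulary))           # good
```

The number of candidates doubles with every repeated run when max_run is 2, so this is only practical for single words, not whole sentences.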
    
  • 2020-12-09 21:10

    You can also do it without reduce or regexps:

    >>> s = 'hhaaaaapppppyyy'
    >>> # drop a character only when the two characters just before it are both equal to it
    >>> ''.join(['' if i>1 and s[i-2:i]==e*2 else e for i,e in enumerate(s)])
    'hhaappyy'
    

    The maximum run length is hardcoded as 2 above (the >1, the -2 and the *2). The general case:

    >>> reps = 1
    >>> ''.join(['' if i>reps-1 and s[i-reps:i]==e*reps else e for i,e in enumerate(s)])
    'hapy'
    