How to replace repeated instances of a character with a single instance of that character in Python

北海茫月 2020-12-31 00:29

I want to replace repeated instances of the "*" character within a string with a single instance of "*". For example if the string is "*

11 Answers
  • 2020-12-31 00:30

    As far as regular expressions go, I would do exactly what JoshD suggested, with one improvement: compile the pattern first.

    Use:

    import re

    regex = re.compile(r'\*+')  # raw string, so the backslash reaches the regex engine intact
    result = regex.sub("*", string)
    

    This essentially caches your regex, so reusing it in a loop avoids handling the pattern again on every call and keeps your regex operations fast.
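
To illustrate reusing a compiled pattern across many strings, here is a minimal sketch. The names `STAR_RUN` and `squeeze_stars` are made up for this example. Note that the `re` module also keeps its own cache of recently used patterns, so the gain from compiling by hand is often small.

```python
import re

# Compile once at module level; "+" matches one or more consecutive stars.
STAR_RUN = re.compile(r"\*+")


def squeeze_stars(lines):
    """Collapse every run of '*' to a single '*' in each string of `lines`."""
    return [STAR_RUN.sub("*", line) for line in lines]


print(squeeze_stars(["***abc**de*fg******h", "no stars", "**"]))
# ['*abc*de*fg*h', 'no stars', '*']
```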

  • 2020-12-31 00:32

    I'd suggest using the re module's sub function:

    import re
    
    result = re.sub(r"\*+", "*", "***abc**de*fg******h")
    

    I highly recommend reading through the article about regular expressions and good practices. Regular expressions can be tricky if you're not familiar with them. In practice, using raw strings is a good idea.
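
If the character to squeeze is not always "*", one possible generalization is sketched below. The helper name `squeeze_char` is hypothetical, and it assumes `ch` is a single character. `re.escape` takes care of characters that have a special meaning in a pattern.

```python
import re


def squeeze_char(text, ch):
    """Collapse runs of the single character `ch` in `text` to one `ch`."""
    # re.escape makes characters such as "*" or "." match literally.
    pattern = re.escape(ch) + r"{2,}"
    # A function replacement is inserted as-is, with no backslash processing.
    return re.sub(pattern, lambda match: ch, text)


print(squeeze_char("***abc**de*fg******h", "*"))  # *abc*de*fg*h
print(squeeze_char("a..b...c", "."))              # a.b.c
```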

  • 2020-12-31 00:35

    Let's assume, for the sake of this example, that your character is a space.

    You can also do it this way:

    while True:
        if "  " in pattern: # if two spaces are in the variable pattern
            pattern = pattern.replace("  ", " ") # replace two spaces with one
        else: # otherwise
            break # break from the infinite while loop
    

    This:

    File Type                       : Win32 EXE
    File Type Extension             : exe
    MIME Type                       : application/octet-stream
    Machine Type                    : Intel 386 or later, and compatibles
    Time Stamp                      : 2017:04:24 09:55:04-04:00
    

    Becomes:

    File Type : Win32 EXE
    File Type Extension : exe
    MIME Type : application/octet-stream
    Machine Type : Intel 386 or later, and compatibles
    Time Stamp : 2017:04:24 09:55:04-04:00
    

    I find this is a little easier than having to muck around with the re module, which can get a little annoying sometimes (I think).

    Hope that was helpful.
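
The loop above could be wrapped in a small function, sketched here under the assumption that `ch` is a single, non-empty character (the name `squeeze_by_replace` is made up for this example). It also shows why the loop is needed: one `replace` pass only halves each run.

```python
def squeeze_by_replace(text, ch=" "):
    """Repeatedly replace doubled `ch` with a single `ch` until none remain."""
    double = ch * 2
    while double in text:
        text = text.replace(double, ch)
    return text


line = "File Type" + " " * 23 + ": Win32 EXE"

# A single pass still leaves a run of spaces behind.
print(repr(line.replace("  ", " ")))

# Looping until no doubled space remains finishes the job.
print(squeeze_by_replace(line))  # File Type : Win32 EXE
```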

  • 2020-12-31 00:38

    You wrote:

    pattern.replace("*"\*, "*")
    

    You meant:

    pattern.replace("\**", "*")
    #                ^^^^
    

    You really meant:

    pattern_after_substitution = re.sub(r"\*+", "*", pattern)
    

    which does what you wanted.
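
The difference between the two calls can be checked directly. In this small sketch the sample value of `pattern` is borrowed from another answer: `str.replace` searches for literal text, while `re.sub` interprets its first argument as a regular expression.

```python
import re

pattern = "***abc**de*fg******h"

# str.replace looks for the literal text backslash-star-star,
# which never occurs in the string, so nothing changes.
print(pattern.replace(r"\**", "*"))  # ***abc**de*fg******h

# re.sub reads r"\*+" as "one or more literal stars".
print(re.sub(r"\*+", "*", pattern))  # *abc*de*fg*h
```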

  • 2020-12-31 00:39

    I timed all the methods in the current answers (with Python 3.7.2, macOS High Sierra).

    b() was the best overall, c() was best when no matches are made.

    def b(text):
        return re.sub(r"\*\*+", "*", text)
    
    # aka squeeze()
    def c(text):
        while "*" * 2 in text:
            text = text.replace("*" * 2, "*")
        return text
    

    Input 1, no repeats: 'a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*'

    • a) 10000 loops, best of 5: 24.5 usec per loop
    • b) 100000 loops, best of 5: 3.17 usec per loop
    • c) 500000 loops, best of 5: 508 nsec per loop
    • d) 10000 loops, best of 5: 25.4 usec per loop
    • e) 5000 loops, best of 5: 44.7 usec per loop
    • f) 500000 loops, best of 5: 522 nsec per loop

    Input 2, with repeats: 'a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*****************************************************************************************************a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*'

    • a) 5000 loops, best of 5: 46.2 usec per loop
    • b) 50000 loops, best of 5: 5.21 usec per loop
    • c) 20000 loops, best of 5: 13.4 usec per loop
    • d) 5000 loops, best of 5: 47.4 usec per loop
    • e) 2000 loops, best of 5: 103 usec per loop
    • f) 20000 loops, best of 5: 13.1 usec per loop

    The methods:

    #!/usr/bin/env python
    # encoding: utf-8
    """
    See which function variants are fastest. Run like:
    python -mtimeit -s"import time_functions;t='a*'*100" "time_functions.a(t)"
    python -mtimeit -s"import time_functions;t='a*'*100" "time_functions.b(t)"
    etc.
    """
    import re
    
    
    def a(text):
        return re.sub(r"\*+", "*", text)
    
    
    def b(text):
        return re.sub(r"\*\*+", "*", text)
    
    
    # aka squeeze()
    def c(text):
        while "*" * 2 in text:
            text = text.replace("*" * 2, "*")
        return text
    
    
    regex = re.compile(r"\*+")
    
    
    # like a() but with (premature) optimisation
    def d(text):
        return re.sub(regex, "*", text)
    
    
    def e(text):
        return "".join(c for c, n in zip(text, text[1:] + " ") if c + n != "**")
    
    
    def f(text):
        while True:
            if "**" in text:  # if two stars are in the variable pattern
                text = text.replace("**", "*")  # replace two stars with one
            else:  # otherwise
                break  # break from the infinite while loop
        return text
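
The same kind of comparison can also be run from inside Python with the `timeit` module rather than the command line. This is only a rough sketch: it covers just two of the variants, and the second input is an assumption about how the "with repeats" string was built. Absolute numbers will differ by machine and Python version.

```python
import re
import timeit


def a(text):
    return re.sub(r"\*+", "*", text)


def c(text):
    while "**" in text:
        text = text.replace("**", "*")
    return text


inputs = {
    "no repeats": "a*" * 100,
    # Assumed construction of the second input: a long run of stars in the middle.
    "with repeats": "a*" * 100 + "*" * 100 + "a*" * 100,
}

for label, text in inputs.items():
    # Both variants should agree before their speed is compared.
    assert a(text) == c(text)
    for func in (a, c):
        seconds = timeit.timeit(lambda: func(text), number=10_000)
        print(f"{label:12} {func.__name__}(): {seconds / 10_000 * 1e6:.2f} usec per call")
```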
    
  • 2020-12-31 00:41

    text = "aaaaaaaaaabbbbbbbbbbcccccccffffdffffdaaaaaa"
    result = ""

    for char in text:
        # skip a character that repeats the previous one
        if result and result[-1] == char:
            continue
        result += char

    print(result)  # abcfdfda
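
The same idea can be written with `itertools.groupby`, which groups consecutive equal characters. This is a sketch, not a drop-in answer to the question: the first expression collapses runs of every character, while the hypothetical helper `squeeze_only` collapses runs of one chosen character and leaves the others alone.

```python
from itertools import groupby

text = "aaaaaaaaaabbbbbbbbbbcccccccffffdffffdaaaaaa"

# Keep one character per run of equal neighbours.
result = "".join(key for key, _group in groupby(text))
print(result)  # abcfdfda


def squeeze_only(text, ch):
    """Collapse runs of `ch` only; runs of other characters are kept whole."""
    return "".join(
        key if key == ch else "".join(group)
        for key, group in groupby(text)
    )


print(squeeze_only("***aabc**de*fg******h", "*"))  # *aabc*de*fg*h
```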
