How to replace repeated instances of a character with a single instance of that character in Python

前端未结


I want to replace repeated instances of the "*" character within a string with a single instance of "*". For example, if the string is "*

11 Answers

天命终不由人
2020-12-31 00:30

As far as regular expressions go, I would do exactly what JoshD suggested, with one improvement.

Use:

import re

regex = re.compile(r'\*+')
result = regex.sub("*", string)


This compiles the pattern once up front and reuses the compiled object. Python's re module also caches recently used patterns internally, so the gain in a loop is usually modest.
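
To make the loop case concrete, here is a minimal sketch. The name STAR_RUN and the sample lines are made up for illustration; only re.compile() and the compiled pattern's sub() method come from the standard library.

```python
import re

# Compile once, outside the loop (STAR_RUN is a hypothetical name).
STAR_RUN = re.compile(r"\*+")

# Made-up sample data for illustration.
lines = ["a**b", "***", "no stars", "x*y****z"]

# Reuse the compiled pattern for every line.
cleaned = [STAR_RUN.sub("*", line) for line in lines]

print(cleaned)  # ['a*b', '*', 'no stars', 'x*y*z']
```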

不知归路
2020-12-31 00:32

I'd suggest using the sub function from the re module:

import re

result = re.sub(r"\*+", "*", "***abc**de*fg******h")
# result == "*abc*de*fg*h"


I highly recommend reading an article on regular expressions and good practices; they can be tricky if you're not familiar with them. In practice, writing patterns as raw strings (r"...") is a good idea, because backslashes then reach the regex engine unchanged.
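
If the character is not known in advance, the same idea can probably be generalised with re.escape(). The sketch below is only an illustration: squeeze() is a hypothetical helper, not something from the re module, and it assumes char is a single character.

```python
import re


def squeeze(text, char):
    """Collapse runs of `char` in `text` to one `char` (hypothetical helper)."""
    # re.escape() makes metacharacters such as "*" or "." match literally.
    pattern = re.escape(char) + "+"
    # A function replacement inserts `char` as-is; a plain string
    # replacement would have its backslashes processed as escapes.
    return re.sub(pattern, lambda match: char, text)


print(squeeze("***abc**de*fg******h", "*"))  # *abc*de*fg*h
print(squeeze("a...b..c", "."))              # a.b.c
```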

长情又很酷
2020-12-31 00:35

Let's assume, for the sake of this example, that your character is a space.

You can also do it this way:

while True:
    if "  " in pattern: # if two spaces are in the variable pattern
        pattern = pattern.replace("  ", " ") # replace two spaces with one
    else: # otherwise
        break # break from the infinite while loop


This: 

File Type                       : Win32 EXE
File Type Extension             : exe
MIME Type                       : application/octet-stream
Machine Type                    : Intel 386 or later, and compatibles
Time Stamp                      : 2017:04:24 09:55:04-04:00


Becomes:

File Type : Win32 EXE
File Type Extension : exe
MIME Type : application/octet-stream
Machine Type : Intel 386 or later, and compatibles
Time Stamp : 2017:04:24 09:55:04-04:00


I find this is a little easier than having to muck around with the re module, which can get a little annoying sometimes (I think).

Hope that was helpful.
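
For anyone wondering why the loop is needed: a single replace() call only halves each run, so it has to be repeated until no double space is left. A small sketch, with squeeze_spaces() as a made-up wrapper around the loop above:

```python
def squeeze_spaces(line):
    """Collapse runs of spaces to one space (hypothetical wrapper)."""
    while "  " in line:
        line = line.replace("  ", " ")
    return line


# One pass is not enough: four spaces become two, not one.
print(repr("a    b".replace("  ", " ")))  # 'a  b'

# Repeating until nothing changes finishes the job.
sample = "File Type                       : Win32 EXE"
print(squeeze_spaces(sample))  # File Type : Win32 EXE
```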

暗喜
2020-12-31 00:38

You wrote:

pattern.replace("*"\*, "*")


You meant:

pattern.replace("\**", "*")
#                ^^^^


You really meant (str.replace() matches its argument literally, so a regular expression needs re.sub()):

pattern_after_substitution = re.sub(r"\*+", "*", pattern)


which does what you wanted.
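
A quick way to see the difference, as a sketch with a made-up sample string:

```python
import re

pattern = "***abc**de*fg******h"

# str.replace() searches for the literal text backslash-star-star,
# which never occurs here, so the string comes back unchanged.
literal_attempt = pattern.replace(r"\**", "*")
print(literal_attempt == pattern)  # True

# re.sub() interprets r"\*+" as "one or more literal stars".
pattern_after_substitution = re.sub(r"\*+", "*", pattern)
print(pattern_after_substitution)  # *abc*de*fg*h
```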

时光说笑
2020-12-31 00:39

I timed all the methods in the current answers (Python 3.7.2, macOS High Sierra).

b() was the fastest overall; c() was the fastest when there was nothing to replace.

def b(text):
    return re.sub(r"\*\*+", "*", text)

# aka squeeze()
def c(text):
    while "*" * 2 in text:
        text = text.replace("*" * 2, "*")
    return text


Input 1, no repeats: 
'a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*'


a) 10000 loops, best of 5: 24.5 usec per loop
b) 100000 loops, best of 5: 3.17 usec per loop
c) 500000 loops, best of 5: 508 nsec per loop
d) 10000 loops, best of 5: 25.4 usec per loop
e) 5000 loops, best of 5: 44.7 usec per loop
f) 500000 loops, best of 5: 522 nsec per loop


Input 2, with repeats:
'a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*****************************************************************************************************a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*a*'


a) 5000 loops, best of 5: 46.2 usec per loop
b) 50000 loops, best of 5: 5.21 usec per loop
c) 20000 loops, best of 5: 13.4 usec per loop
d) 5000 loops, best of 5: 47.4 usec per loop
e) 2000 loops, best of 5: 103 usec per loop
f) 20000 loops, best of 5: 13.1 usec per loop
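
One plausible reason the r"\*\*+" pattern does so much less work than r"\*+" on the no-repeats input is that r"\*+" also matches every lone star and replaces it with itself. The sketch below counts substitutions with re.subn(); it assumes the 'a*' * 100 input used in the timeit commands in this answer.

```python
import re

no_repeats = "a*" * 100  # assumed to match the no-repeats input

# re.subn() returns (new_string, number_of_substitutions).
_, count_a = re.subn(r"\*+", "*", no_repeats)
_, count_b = re.subn(r"\*\*+", "*", no_repeats)

print(count_a)  # 100: every single star is "replaced" by itself
print(count_b)  # 0: only runs of two or more stars match
```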




The methods:

#!/usr/bin/env python
# encoding: utf-8
"""
See which function variants are fastest. Run like:
python -mtimeit -s"import time_functions;t='a*'*100" "time_functions.a(t)"
python -mtimeit -s"import time_functions;t='a*'*100" "time_functions.b(t)"
etc.
"""
import re


def a(text):
    return re.sub(r"\*+", "*", text)


def b(text):
    return re.sub(r"\*\*+", "*", text)


# aka squeeze()
def c(text):
    while "*" * 2 in text:
        text = text.replace("*" * 2, "*")
    return text


regex = re.compile(r"\*+")


# like a() but with (premature) optimisation
def d(text):
    return re.sub(regex, "*", text)


def e(text):
    return "".join(c for c, n in zip(text, text[1:] + " ") if c + n != "**")


def f(text):
    while True:
        if "**" in text:  # if two stars are in the variable pattern
            text = text.replace("**", "*")  # replace two stars with one
        else:  # otherwise
            break  # break from the infinite while loop
    return text
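
If you would rather run the comparison from a script than from the command line, something like the following should work. It is only a sketch: the function names are made up, just two of the variants are included, and with_repeats is my assumption about how the second input was built. Absolute numbers will differ by machine and Python version.

```python
import re
import timeit


def squeeze_regex(text):
    # Same pattern as b(): only runs of two or more stars match.
    return re.sub(r"\*\*+", "*", text)


def squeeze_loop(text):
    # Same approach as c(): halve the runs until none are left.
    while "**" in text:
        text = text.replace("**", "*")
    return text


no_repeats = "a*" * 100
# Assumed shape of the input with repeats.
with_repeats = "a*" * 100 + "*" * 100 + "a*" * 100

calls = 1000
for name, func in [("regex", squeeze_regex), ("loop", squeeze_loop)]:
    for label, data in [("no repeats", no_repeats), ("with repeats", with_repeats)]:
        # Best of 5 runs, each making `calls` calls.
        best = min(timeit.repeat(lambda: func(data), number=calls, repeat=5))
        print(f"{name:5s} {label:12s} {best / calls * 1e6:.2f} usec per call")
```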

抹茶落季
2020-12-31 00:41

text = "aaaaaaaaaabbbbbbbbbbcccccccffffdffffdaaaaaa"
result = ""
for char in text:
    # skip the character if it repeats the one just added
    if len(result) > 0 and result[-1] == char:
        continue
    else:
        result += char

print(result)  # abcfdfda
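
Note that the loop above squeezes every repeated character. To squeeze only one character, as in the question, a variant along these lines might do. squeeze_char() and its target parameter are hypothetical names; it builds a list and joins it at the end rather than concatenating strings.

```python
def squeeze_char(text, target="*"):
    """Collapse runs of `target` only; other repeats are kept (hypothetical helper)."""
    result = []
    for char in text:
        # Skip a target character that directly follows another one.
        if char == target and result and result[-1] == target:
            continue
        result.append(char)
    return "".join(result)


print(squeeze_char("***abc**de*fg******h"))  # *abc*de*fg*h
print(squeeze_char("aabb**cc"))              # aabb*cc
```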