python regex for repeating string

后端未结

关注

 4  1498

I am wanting to verify and then parse this string (in quotes):

string = \"start: c12354, c3456, 34526; other stuff that I don\'t care about\"
//Note that som


                      
              相关标签:


      
      
        
          4条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  渐次进展        
                
              
                            
                2021-01-07 04:59
              
            
            
                                                                       
This can be done (pretty elegantly) with a tool like Pyparsing:

from pyparsing import Group, Literal, Optional, Word
import string

code = Group(Optional(Literal("c"), default='') + Word(string.digits) + Optional(Literal(","), default=''))
parser = Literal("start:") + OneOrMore(code) + Literal(";")
# Read lines from file:
with open('lines.txt', 'r') as f:
    for line in f:
        try:
            result = parser.parseString(line)
            codes = [c[1] for c in result[1:-1]]
            # Do something with teh codez...
        except ParseException exc:
            # Oh noes: string doesn't match!
            continue


Cleaner than a regular expression, returns a list of codes (no need to string.split), and ignores any extra characters in the line, just like your example.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  暗喜        
                
              
                            
                2021-01-07 05:00
              
            
            
                                                                       
In Python, this isn’t possible with a single regular expression: each capture of a group overrides the last capture of that same group (in .NET, this would actually be possible since the engine distinguishes between captures and groups).

Your easiest solution is to first extract the part between start: and ; and then using a regular expression to return all matches, not just a single match, using re.findall('c?[0-9]+', text).
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  暗喜        
                
              
                            
                2021-01-07 05:08
              
            
            
                                                                       
import re

sstr = re.compile(r'start:([^;]*);')
slst = re.compile(r'(?:c?)(\d+)')

mystr = "start: c12354, c3456, 34526; other stuff that I don't care about"
match = re.match(sstr, mystr)
if match:
    res = re.findall(slst, match.group(0))


results in

['12354', '3456', '34526']

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  伪装坚强ぢ        
                
              
                            
                2021-01-07 05:23
              
            
            
                                                                       
You could use the standard string tools, which are pretty much always more readable.

s = "start: c12354, c3456, 34526;"

s.startswith("start:") # returns a boolean if it starts with this string 

s.endswith(";") # returns a boolean if it ends with this string

s[6:-1].split(', ') # will give you a list of tokens separated by the string ", "
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复