Regex with lookahead does not match in Python

I have written a regex pattern to capture one date and one number from a sentence, but it does not match.

My code is:

txt = 'Την 02/12/2013 καταχωρήθηκ


                      
Answer by 执笔经年, 2021-01-22 16:51

Issues:

\.+ matches one or more literal dots. To match any sequence of characters, use .+ (unescaped).
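(?=(κωδικ.\s?αριθμ.\s?καταχ.ριση.)|(κ\.?α\.?κ\.?:?\s*))(?P<KEK_number>\d+) can never match. A lookahead is zero-width: it checks that the text at the current position starts with one of the labels but does not consume it, so (?P<KEK_number>\d+) must then match at that same position. Both labels start with κ, not a digit, so the two conditions can never hold at once. You need to convert the lookahead into a consuming (non-capturing) group.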
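
Both issues can be reproduced with a minimal sketch. The short strings below are made up for illustration, and the lookahead is cut down to the short-label half of the original one. The first pair shows that an escaped dot only matches literal dots. The second pair shows that a lookahead immediately followed by \d+ finds nothing, while the same label written as a consuming non-capturing group works.

```python
import re

# Issue 1: "\.+" matches literal dots only; ".+" matches any characters (except newlines).
escaped = re.search(r"\.+", "abc 123")     # the string contains no literal dot
unescaped = re.search(r".+", "abc 123")    # any run of non-newline characters
print(escaped)             # None
print(unescaped.group())   # abc 123

# Issue 2: a lookahead is zero-width, so after it succeeds the engine is still
# at the start of the label, where \d+ cannot match (κ is not a digit).
s = "Κ.Α.Κ.: 110035"
lookahead = re.search(r"(?=κ\.?α\.?κ\.?:?\s*)(?P<num>\d+)", s, re.I)
# The same label as a consuming non-capturing group moves past it before \d+.
consuming = re.search(r"(?:κ\.?α\.?κ\.?:?\s*)(?P<num>\d+)", s, re.I)
print(lookahead)                 # None
print(consuming.group("num"))    # 110035
```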

I suggest fixing your pattern as follows:
p = re.compile(r'''Την\s? # matches Την with a possible space afterwards
(?P<KEK_date>\d{2}/\d{2}/\d{4}) #matches a date of the given format and captures it with a named group
.+ # Allow for an arbitrary sequence of characters 
(?:κωδικ.\s?αριθμ.\s?καταχ.ριση.|κ\.?α\.κ\.:?)\s+ # matches either label (a consuming non-capturing group), then 1+ whitespace
(?P<KEK_number>\d+) # captures a sequence of numbers''', re.I | re.X)

Details

Την\s? - the literal string Την and an optional whitespace character
(?P<KEK_date>\d{2}/\d{2}/\d{4}) - Group "KEK_date": a date pattern, 2 digits, /, 2 digits, / and 4 digits
.+ - 1 or more characters other than line breaks, as many as possible
(?:κωδικ.\s?αριθμ.\s?καταχ.ριση.|κ\.?α\.κ\.:?) - either of:
  κωδικ.\s?αριθμ.\s?καταχ.ριση. - κωδικ, any one char, an optional whitespace, αριθμ, any one char, an optional whitespace, καταχ, any one char, ριση and any one char (each . matches any character except a line break)
  | - or
  κ\.?α\.κ\.:? - κ, an optional ., α, a ., κ, a . and then an optional :
\s+ - 1 or more whitespace characters
(?P<KEK_number>\d+) - Group "KEK_number": 1 or more digits
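
To see exactly what the label alternation accepts, it can be compiled on its own and checked with fullmatch against a few label spellings. The sample strings are hypothetical, not taken from the question. Note that, as written, the short form makes only the first dot optional; the dots after α and the second κ are required.

```python
import re

# The label alternation from the answer's pattern, compiled on its own.
label = re.compile(r"(?:κωδικ.\s?αριθμ.\s?καταχ.ριση.|κ\.?α\.κ\.:?)", re.I)

samples = [
    "κωδικόαριθμό καταχώρισης",   # long form, no space after κωδικό
    "κωδικό αριθμό καταχώρισης",  # long form with spaces
    "Κ.Α.Κ.:",                    # short form with all dots and the colon
    "ΚΑ.Κ.",                      # the dot after the first κ is optional
    "Κ.ΑΚ.",                      # the dot after α is required, so this fails
]
for s in samples:
    # fullmatch: the alternation must consume the whole sample string
    print(repr(s), bool(label.fullmatch(s)))
```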

See a Python demo:
import re
txt = 'Την 02/12/2013 καταχωρήθηκε στο Γενικό Εμπορικό Μητρώο της Υπηρεσίας Γ.Ε.ΜΗ. του Επιμελητηρίου Βοιωτίας, με κωδικόαριθμό καταχώρισης Κ.Α.Κ.: 110035'
p = re.compile(r'''Την\s? # matches Την with a possible space afterwards
(?P<KEK_date>\d{2}/\d{2}/\d{4}) #matches a date of the given format and captures it with a named group
.+ # Allow for an arbitrary sequence of characters 
(?:κωδικ.\s?αριθμ.\s?καταχ.ριση.|κ\.?α\.κ\.:?)\s+ # matches either label (a consuming non-capturing group), then 1+ whitespace
(?P<KEK_number>\d+) # captures a sequence of numbers''', re.I | re.X)
print(p.findall(txt)) # => [('02/12/2013', '110035')]
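
Since the pattern uses named groups, a single match can also be read by group name instead of by position in the findall tuple. The following is a sketch using re.search with Match.group and Match.groupdict on the same sentence and pattern (comments shortened):

```python
import re

txt = 'Την 02/12/2013 καταχωρήθηκε στο Γενικό Εμπορικό Μητρώο της Υπηρεσίας Γ.Ε.ΜΗ. του Επιμελητηρίου Βοιωτίας, με κωδικόαριθμό καταχώρισης Κ.Α.Κ.: 110035'
p = re.compile(r'''Την\s?                              # Την plus an optional space
(?P<KEK_date>\d{2}/\d{2}/\d{4})                        # the date
.+                                                     # 1+ non-newline chars, greedy
(?:κωδικ.\s?αριθμ.\s?καταχ.ριση.|κ\.?α\.κ\.:?)\s+      # either label, then whitespace
(?P<KEK_number>\d+)                                    # the registration number''', re.I | re.X)

m = p.search(txt)
if m:
    # Read the captures by name rather than by tuple position
    print(m.group('KEK_date'))    # 02/12/2013
    print(m.group('KEK_number'))  # 110035
    print(m.groupdict())          # both named groups as a dict
```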

                                                                        
                                                        
            
            
              
                