Regular expression in Python won't match end of a string


I'm just learning Python, and I can't seem to figure out regular expressions.

import re
r1 = re.compile("$.pdf")
if r1.match("spam.pdf"):
    print('yes')
else:
    print('no')

Answer by 说谎, 2021-02-13 13:29

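You've tried all the variations except the one that works. The $ goes at the end of the pattern. Also, you'll want to escape the period so it matches a literal period (unescaped, it matches any character except a newline). And since match() only tries the pattern at the start of the string, call search() instead of match() with this pattern; otherwise "spam.pdf" still won't match.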

r1 = re.compile(r"\.pdf$")
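A minimal sketch of why the escape matters, assuming Python 3 and using search() rather than match(); the sample file names are made up for illustration:

```python
import re

literal_dot = re.compile(r"\.pdf$")  # escaped: requires a real "." before "pdf"
any_char = re.compile(r".pdf$")      # unescaped: "." accepts any character except newline

for name in ["spam.pdf", "spampdf", "spam_pdf"]:
    # search() scans the whole string, so the $ anchor can be reached
    print(name, literal_dot.search(name) is not None, any_char.search(name) is not None)

# match() only tries position 0, so even the corrected pattern fails with it
print(literal_dot.match("spam.pdf"))  # None
```

Only the escaped pattern rejects "spampdf" and "spam_pdf"; the unescaped one accepts all three names.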


However, an easier and clearer way to do this is to use the string's .endswith() method:

filename = "spam.pdf"
if filename.endswith(".pdf"):
    print('yes')  # do something with the PDF here


That way you don't have to decipher the regular expression to understand what's going on.
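A quick sketch comparing the two approaches on a few made-up names. Both checks are case-sensitive, so "report.PDF" is rejected by each unless you normalize it first (here with .lower()):

```python
import re

pdf_re = re.compile(r"\.pdf$")
names = ["spam.pdf", "spam.pdf.txt", "spampdf", "report.PDF"]

for name in names:
    # endswith() and the anchored regex agree on these ordinary file names
    print(name, name.endswith(".pdf"), pdf_re.search(name) is not None)

# Case-insensitive variant: normalize before comparing
print("report.PDF".lower().endswith(".pdf"))  # True
```

str.endswith() also accepts a tuple of suffixes, e.g. name.endswith((".pdf", ".ps")), which covers several extensions without a regex.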

Answer by 醉话见心, 2021-02-13 13:30

The regular expression $.pdf says "find the end of the string, then find any character beyond the end of the string, then find a p, a d and an f".

As written, it cannot sensibly match anything.
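A small check of that claim with the default flags, using a few sample strings (including the literal text "$.pdf"); the pattern finds nothing in any of them:

```python
import re

broken = re.compile(r"$.pdf")  # with default flags, nothing can follow the end of the string

for text in ["spam.pdf", ".pdf", "pdf", "$.pdf"]:
    # search() tries every starting position and still finds no match
    print(repr(text), broken.search(text))
```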

However, pdf$ would match.

In this specific case, you also want to use search rather than match, because match is anchored at the start of the string.
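A minimal illustration of that anchoring, using the pdf$ pattern from this answer and the question's "spam.pdf":

```python
import re

pattern = re.compile(r"pdf$")

# match() only tries the pattern at index 0, where "spam.pdf" starts with "s"
print(pattern.match("spam.pdf"))  # None

# search() scans forward and finds "pdf" at the end of the string
print(pattern.search("spam.pdf").span())  # (5, 8)
```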

Answer by -上瘾入骨i, 2021-02-13 13:32

Behaviour of re.match() and re.search()

There is one significant difference: re.match() only checks for a match at the beginning of the string, while re.search() checks for a match anywhere in it. You are most likely looking for re.search().

A comparison of both methods is given in the Python documentation section "search() vs. match()".
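One detail that is easy to miss: even with re.MULTILINE, match() only tries the start of the string, whereas search() with ^ can match at the start of any line. A small sketch with an arbitrary sample string:

```python
import re

text = "A\nB\nX"

# match() still only tries index 0, even in MULTILINE mode
print(re.match("X", text, re.MULTILINE))  # None

# search() with ^ in MULTILINE mode can match at the start of the third line
print(re.search("^X", text, re.MULTILINE).group())  # X
```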

Special characters in regular expression

Also, these characters have a different meaning in regular expressions from the one you are trying to use (see "Regular Expression Syntax" in the Python documentation for details):


^ matches the beginning:


  (Caret.) Matches the start of the string, and in MULTILINE mode also matches immediately after each newline.

$ matches the end:


  Matches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline. foo matches both ‘foo’ and ‘foobar’, while the regular expression foo$ matches only ‘foo’. More interestingly, searching for foo.$ in 'foo1\nfoo2\n' matches ‘foo2’ normally, but ‘foo1’ in MULTILINE mode; searching for a single $ in 'foo\n' will find two (empty) matches: one just before the newline, and one at the end of the string.
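The documented behaviour of ^ and $ can be checked directly; this sketch reuses the examples from the quoted documentation:

```python
import re

text = "foo1\nfoo2\n"

# ^ anchors at the start of the string; MULTILINE adds the start of each line
print(re.findall(r"^foo\d", text))                # ['foo1']
print(re.findall(r"^foo\d", text, re.MULTILINE))  # ['foo1', 'foo2']

# $ anchors at the end, or just before a final newline; MULTILINE adds each line end
print(re.search(r"foo.$", text).group())                # foo2
print(re.search(r"foo.$", text, re.MULTILINE).group())  # foo1

# A lone $ in "foo\n" gives two empty matches: before the newline and at the very end
print([m.span() for m in re.finditer(r"$", "foo\n")])  # [(3, 3), (4, 4)]
```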



Complete answer

The solution you are looking for may be:

import re
r1 = re.compile(r"\.pdf$")  # regular expression corrected (raw string keeps \. intact)
if r1.search("spam.pdf"):  # re.match() replaced with re.search()
    print("yes")
else:
    print("no")


which checks whether the string ends with ".pdf". It does essentially the same as kindall's answer with .endswith(), but if kindall's answer works for you, choose it (it is cleaner, since you may not need regular expressions at all).
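One small difference from .endswith() follows from the $ documentation quoted above: $ also matches just before a trailing newline, so \.pdf$ accepts "spam.pdf\n". If that matters (for example, with lines read from a file), \Z anchors at the very end of the string only. A short sketch with a made-up input:

```python
import re

name = "spam.pdf\n"  # e.g. a line read from a file with its newline not stripped

print(re.search(r"\.pdf$", name) is not None)   # True: $ matches before the final "\n"
print(re.search(r"\.pdf\Z", name) is not None)  # False: \Z is only the end of the string
print(name.endswith(".pdf"))                    # False
print(name.rstrip("\n").endswith(".pdf"))       # True once the newline is removed
```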

Answer by 面向向阳花, 2021-02-13 13:43

I see 2 quick alternatives:


re.match(pattern=r'.*\.pdf$', string='filename.pdf')

With this solution we must explicitly say that we don't care how the string begins (the leading .*); we cannot omit that part of the expression.
When using re.match(), you must provide a regex that matches starting at the beginning of the string, i.e. at index 0; see https://docs.python.org/3/howto/regex.html#match-versus-search
re.search(pattern=r'\.pdf$', string='filename.pdf')

Here we don't care how the string begins; we are just searching for a string that ends with the extension.
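Since Python 3.4 there is also re.fullmatch(), which requires the pattern to cover the whole string, so the trailing $ is no longer needed (the leading .* still is). A sketch comparing the three calls on made-up names:

```python
import re

for name in ["filename.pdf", "filename.pdf.bak", "notes.txt"]:
    via_match = re.match(r".*\.pdf$", name) is not None         # anchored at index 0 by match()
    via_search = re.search(r"\.pdf$", name) is not None         # may start anywhere
    via_fullmatch = re.fullmatch(r".*\.pdf", name) is not None  # must span the whole string
    print(name, via_match, via_search, via_fullmatch)
```

All three agree on these names: only "filename.pdf" is accepted.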


An answer has already been accepted, but I personally needed to check the official documentation to be clear on this.

Answer by 温柔的废话, 2021-02-13 13:47

Your Question

$ means "end of string". So, you need a regex like \.pdf$ to match:


A dot (.), escaped because it is a special character in regular expressions.
String "pdf"
End of string.
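The same breakdown can be written into the pattern itself with re.VERBOSE, which ignores whitespace in the pattern and treats # as the start of a comment. A sketch:

```python
import re

pdf_suffix = re.compile(
    r"""
    \.    # a dot, escaped because a bare . matches any character
    pdf   # the string "pdf"
    $     # end of string
    """,
    re.VERBOSE,
)

print(pdf_suffix.search("spam.pdf") is not None)  # True
print(pdf_suffix.search("spam_pdf") is not None)  # False
```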


Further Reading

Regular expressions are not specific to Python or any other language, so you should first read some tutorials about them. Consider regular-expressions.info. This is not really a Python question; it is a fundamental regular expression question.