Python split url to find image name and extension


I am looking for a way to extract a filename and extension from a particular URL using Python.

Let's say a URL looks as follows:

picture_page = \"http://dis


                      
7 Answers

孤独总比滥情好
2021-02-04 13:53

>>> import re
>>> s = 'picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"'
>>> re.findall(r'\/([a-zA-Z0-9_]*)\.[a-zA-Z]*\"$',s)[0]
'da4ca3509a7b11e19e4a12313813ffc0_7'
>>> re.findall(r'([a-zA-Z]*)\"$',s)[0]
'jpg'

                                                                        
                                                        
            
            
              
                
甜味超标
2021-02-04 14:01

try:
    # Python 3
    from urllib.parse import urlparse
except ImportError:
    # Python 2
    from urlparse import urlparse
from os.path import splitext, basename

picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"
disassembled = urlparse(picture_page)
filename, file_ext = splitext(basename(disassembled.path))

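Because basename() strips everything up to the last /, filename has no leading slash; calling splitext(disassembled.path) without basename() would return '/da4ca3509a7b11e19e4a12313813ffc0_7' instead. Note that file_ext keeps its leading dot ('.jpg').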
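As a minimal sketch of the same urlparse approach (Python 3 only), the steps can be wrapped in a small helper. The function name split_image_url and the example.com URL are made up for illustration and are not part of the original answer. It shows that parsing the URL first keeps a query string or fragment out of the extension:

```python
from os.path import basename, splitext
from urllib.parse import urlparse


def split_image_url(url):
    """Return (name, extension) for the last path segment of a URL."""
    path = urlparse(url).path          # the path only, without query string or fragment
    return splitext(basename(path))    # the extension keeps its leading dot


# Hypothetical URL with a query string and a fragment
name, ext = split_image_url("http://example.com/images/photo.jpg?size=large#top")
print(name, ext)  # photo .jpg
```

A plain str.split('.') on the whole URL would have returned 'jpg?size=large#top' as the extension here.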
                                                                        
                                                        
            
            
              
                
再見小時候
2021-02-04 14:06

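# Last path segment, up to its first dot (picture_page is the URL string)
filename = picture_page.split('/')[-1].split('.')[0]
# Text after the last dot in the whole URL, with the dot put back
file_ext = '.' + picture_page.split('.')[-1]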

                                                                        
                                                        
            
            
              
                
太阳男子
2021-02-04 14:08

# Here's your link:
picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"

# Here's your filename and ext:
filename, ext = picture_page.split('/')[-1].split('.')


picture_page.split('/') returns a list of the pieces of your URL that were separated by /.
Index -1 selects the last element of that list.
In your case, that is the filename: da4ca3509a7b11e19e4a12313813ffc0_7.jpg

Splitting that on the delimiter . gives two values:
da4ca3509a7b11e19e4a12313813ffc0_7 and jpg, because a single period separates them.

Since this last split returns exactly two values, you can unpack the list directly into two names. (A filename containing more than one period would produce more than two values, and the unpacking would raise a ValueError.)
The result is equivalent to:

filename, ext = ('da4ca3509a7b11e19e4a12313813ffc0_7', 'jpg')
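If the last segment may contain more than one period, one possible variation is to split only at the final one with str.rpartition. This is a sketch rather than part of the original answer; the helper name split_last_segment and the example.com URL are hypothetical, and it still assumes the URL has no query string:

```python
def split_last_segment(url):
    """Split the last URL segment at its final period."""
    last_segment = url.split('/')[-1]
    # rpartition always returns three parts and splits only at the LAST period
    name, dot, ext = last_segment.rpartition('.')
    if not dot:
        # No period at all: treat the whole segment as the name
        return last_segment, ''
    return name, ext


print(split_last_segment("http://example.com/archive.tar.gz"))  # ('archive.tar', 'gz')
```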
                                                                        
                                                        
            
            
              
                
被撕碎了的回忆
2021-02-04 14:15

os.path.splitext will help you extract the filename and extension once you have extracted the relevant string from the URL using urlparse:

   import os
   fName, ext = os.path.splitext('yourImage.jpg')  # ('yourImage', '.jpg')

                                                                        
                                                        
            
            
              
                
有刺的猬
2021-02-04 14:15

One way to find the image name and extension is a regular expression with named groups.

import re

picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"

# The greedy .*/ consumes everything up to the last slash; the named groups
# then capture the word characters before and after the dot.
regex = re.compile(r'.*/(?P<name>\w+)\.(?P<ext>\w+)')

match = regex.search(picture_page)
print(match.group('name'))  # da4ca3509a7b11e19e4a12313813ffc0_7
print(match.group('ext'))   # jpg
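A regular expression returns no match object when the URL does not have the expected shape, so calling .group() on the result can fail. The following sketch guards against that and also tolerates a trailing query string or fragment. The pattern IMAGE_NAME and the helper name_and_ext are illustrative assumptions, not taken from the answer above, and the pattern only inspects the text, so a bare host such as http://example.com would still be read as a name and an extension:

```python
import re

# Hypothetical pattern: last path segment as name.ext, optionally followed
# by a query string (?...) or fragment (#...).
IMAGE_NAME = re.compile(r'/(?P<name>[^/?#]+)\.(?P<ext>\w+)(?:[?#].*)?$')


def name_and_ext(url):
    """Return (name, ext), or None when the URL does not end in name.ext."""
    match = IMAGE_NAME.search(url)
    if match is None:
        return None
    return match.group('name'), match.group('ext')


picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"
print(name_and_ext(picture_page))  # ('da4ca3509a7b11e19e4a12313813ffc0_7', 'jpg')
```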

                                                                        
                                                        
            
            
              
                