How to check the url is either web page link or file link in python

前端未结

关注

 2  1156

Suppose i have links as follows:

    http://example.com/index.html
    http://example.com/stack.zip
    http://example.com/setup.exe
    http://example.com/n


                      
              相关标签:


      
      
        
          2条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  难免孤独        
                
              
                            
                2021-01-06 10:10
              
            
            
                                                                       
import urllib
mytest = urllib.urlopen('http://www.sec.gov')
mytest.headers.items()

('content-length', '20833'), ('expires', 'Sun, 02 Feb 2014 19:36:12 GMT'), ('server', 'SEC'), ('connection', 'close'), ('cache-control', 'max-age=0'), ('date', 'Sun, 02 Feb 2014 19:36:12 GMT'), ('content-type', 'text/html')]


mytest.headers.items() is a list of tuples, you can see in my example that the last item in the list describes the content

I am not sure if the length varies so you could iterate through it to find the one that has
   'content-type' in it.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  广开言路        
                
              
                            
                2021-01-06 10:23
              
            
            
                                                                       
import urllib
import mimetypes


def guess_type_of(link, strict=True):
    link_type, _ = mimetypes.guess_type(link)
    if link_type is None and strict:
        u = urllib.urlopen(link)
        link_type = u.headers.gettype() # or using: u.info().gettype()
    return link_type


Demo:

links = ['http://stackoverflow.com/q/21515098/538284', # It's a html page
         'http://upload.wikimedia.org/wikipedia/meta/6/6d/Wikipedia_wordmark_1x.png', # It's a png file
         'http://commons.wikimedia.org/wiki/File:Typing_example.ogv', # It's a html page
         'http://upload.wikimedia.org/wikipedia/commons/e/e6/Typing_example.ogv'   # It's an ogv file
]

for link in links:
    print(guess_type_of(link))


Output:

text/html
image/x-png
text/html
application/ogg

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复