How to check whether a URL is a web page link or a file link in Python

轻奢々 2021-01-06 09:57

Suppose I have the following links:

    http://example.com/index.html
    http://example.com/stack.zip
    http://example.com/setup.exe
    http://example.com/n

How can I tell which of these links point to a web page and which point to a file?

2 Answers
  • 2021-01-06 10:10
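    import urllib.request
    
    # Python 3 location of urlopen (Python 2 used urllib.urlopen)
    mytest = urllib.request.urlopen('http://www.sec.gov')
    print(mytest.headers.items())
    
    # Example output from the original 2014 run (header values and capitalization will vary):
    [('content-length', '20833'), ('expires', 'Sun, 02 Feb 2014 19:36:12 GMT'), ('server', 'SEC'), ('connection', 'close'), ('cache-control', 'max-age=0'), ('date', 'Sun, 02 Feb 2014 19:36:12 GMT'), ('content-type', 'text/html')]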
    

    mytest.headers.items() returns a list of (name, value) tuples. In the example above, the last item, ('content-type', 'text/html'), describes the type of content the URL serves.

    The number and order of headers can vary from server to server, so instead of relying on position, iterate through the list and pick the tuple whose name is 'content-type'.
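    A minimal sketch of that search, assuming the header list from the example output is copied in as a literal so it runs without network access. `find_header` is a hypothetical helper, not part of urllib; it compares names case-insensitively because HTTP header names are case-insensitive. In Python 3 the response's `headers` object also offers `mytest.headers.get('Content-Type')` (case-insensitive lookup) and `mytest.headers.get_content_type()` (just the MIME type).

```python
# Header list copied from the example output above (assumed input; no network needed).
header_items = [
    ('content-length', '20833'),
    ('expires', 'Sun, 02 Feb 2014 19:36:12 GMT'),
    ('server', 'SEC'),
    ('connection', 'close'),
    ('cache-control', 'max-age=0'),
    ('date', 'Sun, 02 Feb 2014 19:36:12 GMT'),
    ('content-type', 'text/html'),
]


def find_header(items, name):
    """Return the value of the first header called `name`, or None if absent.

    HTTP header names are case-insensitive, so both sides are lower-cased.
    """
    wanted = name.lower()
    for key, value in items:
        if key.lower() == wanted:
            return value
    return None


print(find_header(header_items, 'Content-Type'))  # prints: text/html
```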

  • 2021-01-06 10:23
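    import mimetypes
    import urllib.request
    
    
    def guess_type_of(link, strict=True):
        # Cheap first guess from the URL's extension; no network access needed.
        link_type, _ = mimetypes.guess_type(link)
        # strict=True: if the extension is unknown, fall back to asking the server.
        if link_type is None and strict:
            with urllib.request.urlopen(link) as u:
                link_type = u.headers.get_content_type()  # or: u.info().get_content_type()
        return link_type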
    

    Demo:

    links = ['http://stackoverflow.com/q/21515098/538284', # an HTML page
             'http://upload.wikimedia.org/wikipedia/meta/6/6d/Wikipedia_wordmark_1x.png', # a PNG file
             'http://commons.wikimedia.org/wiki/File:Typing_example.ogv', # an HTML page
             'http://upload.wikimedia.org/wikipedia/commons/e/e6/Typing_example.ogv'   # an OGV file
    ]
    
    for link in links:
        print(guess_type_of(link))
    

    Output (from the original run; exact MIME strings depend on the Python version and platform):

    text/html
    image/x-png
    text/html
    application/ogg
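    To answer the question directly, the MIME type can then be mapped to "web page" or "file". The sketch below is a hypothetical variant of the function above, not the answer's own code, and it rests on stated assumptions: only `text/html` and `application/xhtml+xml` count as web pages, and when the extension gives no hint it sends an HTTP HEAD request (`urllib.request.Request(url, method="HEAD")`) so the file body is not downloaded. `PAGE_TYPES`, `is_web_page_type`, `head_content_type` and `classify_link` are illustrative names.

```python
import mimetypes
import urllib.request

# Assumption: these MIME types mean "web page"; anything else is treated as a file.
PAGE_TYPES = {"text/html", "application/xhtml+xml"}


def is_web_page_type(content_type):
    """Return True if a MIME type string denotes an HTML page."""
    if content_type is None:
        return False
    # Drop parameters such as "; charset=UTF-8" and normalise case.
    return content_type.split(";", 1)[0].strip().lower() in PAGE_TYPES


def head_content_type(url, timeout=10):
    """Read the Content-Type via a HEAD request, without downloading the body."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get_content_type()


def classify_link(url, ask_server=False):
    """Return "web page" or "file" for a URL (hypothetical helper)."""
    content_type = None
    if not ask_server:
        # Extension-based guess: fast, but can be wrong for some URLs.
        content_type, _ = mimetypes.guess_type(url)
    if content_type is None:
        content_type = head_content_type(url)
    return "web page" if is_web_page_type(content_type) else "file"


for link in ["http://example.com/index.html", "http://example.com/stack.zip"]:
    # Both are decided from the extension alone, so no request is sent.
    print(link, classify_link(link))
```

    An extension-based guess can mislead: the demo's `File:Typing_example.ogv` link ends in `.ogv` but serves an HTML page, so pass `ask_server=True` when accuracy matters more than speed. Some servers also reject HEAD requests; for those, a plain `urlopen` GET as in the answer above still works.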
    