Python split url to find image name and extension

逝去的感伤 2021-02-04 13:43

I am looking for a way to extract a file name and extension from a particular URL using Python.

Let's say a URL looks as follows:

picture_page = "http://dis


        
7 answers
  • 2021-02-04 13:53
    >>> import re
    >>> s = 'picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"'
    >>> re.findall(r'\/([a-zA-Z0-9_]*)\.[a-zA-Z]*\"$',s)[0]
    'da4ca3509a7b11e19e4a12313813ffc0_7'
    >>> re.findall(r'([a-zA-Z]*)\"$',s)[0]
    'jpg'
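    The two patterns above can be folded into a single call: when a pattern has two capture groups, re.findall returns a list of (name, ext) tuples. A minimal sketch that, like the original, assumes the string ends with a closing double quote:

```python
import re

s = 'picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"'

# Group 1: word characters between a "/" and the dot; group 2: letters after the dot.
# The pattern is anchored to the closing double quote at the end of the string.
matches = re.findall(r'/([a-zA-Z0-9_]*)\.([a-zA-Z]*)"$', s)

# With more than one group, findall yields one tuple per match.
name, ext = matches[0]
print(name)  # da4ca3509a7b11e19e4a12313813ffc0_7
print(ext)   # jpg
```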
    
  • 2021-02-04 14:01
    try:
        # Python 3
        from urllib.parse import urlparse
    except ImportError:
        # Python 2
        from urlparse import urlparse
    from os.path import splitext, basename
    
    picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"
    disassembled = urlparse(picture_page)
    filename, file_ext = splitext(basename(disassembled.path))
    

    Because basename keeps only the last component of the path, filename has no preceding /. Note that file_ext keeps its leading dot ('.jpg').
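    Parsing the URL first pays off when the address carries a query string or fragment, which a plain split on / and . would leave attached to the extension. A minimal Python 3 sketch; the helper name image_name_and_ext and the example.com URL are hypothetical, and posixpath is swapped in for os.path because URL paths always use / as the separator:

```python
from urllib.parse import urlparse
import posixpath


def image_name_and_ext(url):
    """Return (name, ext) for the last path segment of url (hypothetical helper)."""
    # urlparse separates the path from the query string and fragment.
    path = urlparse(url).path
    # basename keeps the last segment; splitext splits it at the last dot.
    return posixpath.splitext(posixpath.basename(path))


print(image_name_and_ext(
    "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"))
# ('da4ca3509a7b11e19e4a12313813ffc0_7', '.jpg')
print(image_name_and_ext("http://example.com/img/photo.png?size=large#top"))
# ('photo', '.png')
```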

  • 2021-02-04 14:06
    filename = picture_page.split('/')[-1].split('.')[0]
    file_ext = '.'+picture_page.split('.')[-1]
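    The extension line splits the whole URL rather than the last path segment, so for a URL whose file name has no dot it reaches back into the host name. A small sketch contrasting the two, using a hypothetical URL:

```python
picture_page = "http://example.com/images/photo"  # hypothetical URL with no extension

# Splitting the whole URL on "." reaches back into the host name.
bad_ext = '.' + picture_page.split('.')[-1]
print(bad_ext)  # .com/images/photo

# Splitting only the last path segment avoids that.
last_segment = picture_page.split('/')[-1]  # 'photo'
name, dot, ext = last_segment.rpartition('.')  # rpartition splits at the last dot
if not dot:
    # rpartition returns ('', '', last_segment) when there is no dot at all
    name, ext = last_segment, ''
else:
    ext = '.' + ext
print(name, repr(ext))  # photo ''
```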
    
  • 2021-02-04 14:08
    # Here's your link:
    picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"
    
    # Here's your file name and extension:
    filename, ext = (picture_page.split('/')[-1].split('.'))
    

    picture_page.split('/') returns a list of the pieces of the URL between the / characters. Index -1 selects the last element of a list, which in your case is the file name: da4ca3509a7b11e19e4a12313813ffc0_7.jpg

    Splitting that on the delimiter . gives two values, da4ca3509a7b11e19e4a12313813ffc0_7 and jpg, because the name contains exactly one period.

    Since this last split returns a list of exactly two values, you can unpack it into two variables. The result is equivalent to:

    filename,ext = ('da4ca3509a7b11e19e4a12313813ffc0_7', 'jpg')
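    The two-name unpacking only works when the last segment contains exactly one dot; with several dots, split('.') returns more than two items and Python raises ValueError. Passing maxsplit=1 to rsplit splits at the last dot only, which works as long as the name contains at least one dot. A sketch with a hypothetical file name:

```python
file_name = "photo.v2.jpg"  # hypothetical name with two dots

try:
    filename, ext = file_name.split('.')  # three items, but only two names
except ValueError as err:
    print("split('.') failed:", err)

# rsplit with maxsplit=1 splits only at the last dot, so there are two items here.
filename, ext = file_name.rsplit('.', 1)
print(filename, ext)  # photo.v2 jpg
```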

  • 2021-02-04 14:15

    os.path.splitext will help you extract the filename and extension once you have extracted the relevant string from the URL using urlparse:

    import os
    fName, ext = os.path.splitext('yourImage.jpg')
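    A few properties of splitext are worth knowing: the extension keeps its leading dot, only the last dot counts, and a leading dot on its own (as in a hidden file) is not treated as an extension. posixpath.splitext is the implementation os.path uses on Linux and macOS; it is called directly below so the results do not depend on the platform:

```python
import posixpath

# splitext returns (root, ext); ext keeps its dot and is '' when absent.
print(posixpath.splitext('yourImage.jpg'))  # ('yourImage', '.jpg')
print(posixpath.splitext('photo.v2.jpg'))   # ('photo.v2', '.jpg'), only the last dot counts
print(posixpath.splitext('.hidden'))        # ('.hidden', ''), a leading dot is not an extension
print(posixpath.splitext('noextension'))    # ('noextension', '')
```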
    
  • 2021-02-04 14:15

    One way to find the image name and extension is a regular expression with named groups.

    import re

    picture_page = "http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg"

    # Greedy .* backtracks to the last "/" followed by name.ext; named groups capture each part
    regex = re.compile(r'.*/(?P<name>\w+)\.(?P<ext>\w+)')

    match = regex.search(picture_page)
    print(match.group('name'))  # da4ca3509a7b11e19e4a12313813ffc0_7
    print(match.group('ext'))   # jpg
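    Because \w matches only letters, digits and the underscore, a file name containing a hyphen makes that pattern backtrack to an earlier slash: for the hypothetical URL http://example.com/img/my-photo.jpg it reports example and com. A variant that instead accepts any character other than /, ? and # might look like the sketch below. It assumes the URL has a path after the host (for a bare http://example.com the host name would be mistaken for a file name), and the second URL is hypothetical:

```python
import re

# name: a run of characters other than "/", "?" and "#", up to the last dot of the segment.
# ext: characters after that dot, which must be followed by "?", "#" or the end of the string.
regex = re.compile(r'/(?P<name>[^/?#]+)\.(?P<ext>[^./?#]+)(?=[?#]|$)')

for url in ["http://distilleryimage2.instagram.com/da4ca3509a7b11e19e4a12313813ffc0_7.jpg",
            "http://example.com/img/my-photo.v2.jpg?w=640"]:  # second URL is hypothetical
    match = regex.search(url)
    print(match.group('name'), match.group('ext'))
```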
    