How to safely get the file extension from a URL?

北荒 2021-02-02 08:40

Consider the following URLs:

http://m3u.com/tunein.m3u
http://asxsomeurl.com/listen.asx:8024
http://www.plssomeotherurl.com/station.pls?id=111
http://22.198.133.16:802

9 answers
  •  后悔当初
    2021-02-02 09:26

    A different approach that considers nothing but the actual file extension in the URL:

    import re

    def fileExt( url ):
        # compile regular expressions
        reQuery = re.compile( r'\?.*$' )              # query string: "?" to end of URL
        rePort = re.compile( r':[0-9]+' )             # explicit port such as ":8024"
        reExt = re.compile( r'(\.[A-Za-z0-9]+$)' )    # trailing ".ext"

        # remove query string
        url = reQuery.sub( "", url )

        # remove port
        url = rePort.sub( "", url )

        # extract extension (including the leading dot)
        matches = reExt.search( url )
        if matches is not None:
            return matches.group( 1 )
        return None

    Edit: added handling of explicit ports such as :1234.
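
For comparison, here is a minimal sketch of the same idea that leans on the standard library instead of hand-written regexes for the query string and port. The name `file_ext_from_url` is hypothetical, and treating a trailing `:8024` on the path as a misplaced port is an assumption based on the second example URL in the question. One difference in behaviour: for a URL with no path, such as http://22.198.133.16:802, the regex version above returns `.16` (the tail of the host), while this sketch returns `None`.

```python
import posixpath
import re
from urllib.parse import urlsplit


def file_ext_from_url(url):
    """Return the extension (with leading dot) of the URL's path, or None."""
    # urlsplit separates scheme, host[:port], path, query and fragment,
    # so the query string and a real port never end up in the path.
    path = urlsplit(url).path

    # Assumption: a trailing ":<digits>" on the path (as in listen.asx:8024)
    # is a misplaced port and should be dropped.
    path = re.sub(r':[0-9]+$', '', path)

    # splitext only inspects the last path segment.
    ext = posixpath.splitext(path)[1]
    return ext or None


if __name__ == "__main__":
    for u in (
        "http://m3u.com/tunein.m3u",
        "http://asxsomeurl.com/listen.asx:8024",
        "http://www.plssomeotherurl.com/station.pls?id=111",
        "http://22.198.133.16:802",
    ):
        print(u, "->", file_ext_from_url(u))
```

Whether `None` or an empty string is the better "no extension" result depends on the caller; `None` is used here only to match the original function.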
