Consider the following URLs:
http://m3u.com/tunein.m3u
http://asxsomeurl.com/listen.asx:8024
http://www.plssomeotherurl.com/station.pls?id=111
http://22.198.133.16:802
The proper way is not to rely on file extensions at all. Send a GET (or HEAD) request to the URL in question and read the returned "Content-Type" HTTP header to get the content type. File extensions are unreliable.
See MIME types (IANA media types) for more information and a list of useful MIME types.
This is easiest with requests and mimetypes:
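import mimetypes
import requests
url = 'http://www.plssomeotherurl.com/station.pls?id=111'  # any URL from the question
# stream=True fetches only the headers up front, so a live radio stream is not downloaded
response = requests.get(url, stream=True)
# strip parameters such as "; charset=utf-8", which guess_extension does not understand
content_type = response.headers.get('content-type', '').split(';')[0].strip()
response.close()
extension = mimetypes.guess_extension(content_type)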
The extension includes a dot prefix. For example, extension is '.png' for content type 'image/png'.
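Servers often append parameters to the header (for example `audio/x-scpls; charset=utf-8`), and `mimetypes` only knows Python's built-in table plus whatever the system's mime.types files add, so playlist types may come back as `None`. A minimal sketch, assuming a hypothetical helper `extension_from_content_type` and a small hand-written table of MIME types commonly sent for .pls, .m3u and .asx playlists (an assumption, not an official or exhaustive list):

```python
import mimetypes
from typing import Optional

# Hypothetical fallback table: MIME types commonly sent for radio playlists.
# This is an assumption, not an exhaustive or authoritative list.
PLAYLIST_TYPES = {
    "audio/x-scpls": ".pls",
    "audio/x-mpegurl": ".m3u",
    "audio/mpegurl": ".m3u",
    "video/x-ms-asf": ".asx",  # also used for .asf media files
}


def extension_from_content_type(header_value: str) -> Optional[str]:
    """Map a raw Content-Type header value to a file extension, or None."""
    # "audio/x-scpls; charset=utf-8" -> "audio/x-scpls"; MIME types are case-insensitive
    mime = header_value.split(";", 1)[0].strip().lower()
    if not mime:
        return None
    # Prefer the playlist table, then fall back to the mimetypes database
    return PLAYLIST_TYPES.get(mime) or mimetypes.guess_extension(mime)
```

With the requests snippet above you would call `extension_from_content_type(response.headers.get('content-type', ''))`. Checking the hand-written table first means the result for these playlist types does not depend on which mime.types files happen to be installed.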
A different approach, which looks only at the file extension in the URL itself:
import re

def fileExt(url):
    # compile regular expressions
    reQuery = re.compile(r'[?#].*$')
    rePort = re.compile(r':[0-9]+')
    reExt = re.compile(r'(\.[A-Za-z0-9]+)$')
    # remove query string and fragment
    url = reQuery.sub("", url)
    # remove port
    url = rePort.sub("", url)
    # drop the scheme and host; a bare host such as "m3u.com" or
    # "22.198.133.16" has no path and therefore no extension
    path = url.split("://", 1)[-1].partition("/")[2]
    # extract the extension from the last path segment
    matches = reExt.search(path.rsplit("/", 1)[-1])
    if matches is not None:
        return matches.group(1)
    return None
Edit: added handling of explicit ports such as :1234.
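If you prefer the standard library over hand-written regular expressions, `urllib.parse.urlsplit` already separates the host, port, query string and fragment, and `posixpath.splitext` takes the extension off the last path component. A sketch under those assumptions (`url_extension` is a hypothetical name). A port written after the path, as in `listen.asx:8024` from the question, is not a real port to `urlsplit`, so it is stripped from the path explicitly:

```python
import posixpath
import re
from typing import Optional
from urllib.parse import urlsplit


def url_extension(url: str) -> Optional[str]:
    """Return the extension (with leading dot) of the URL's path, or None."""
    # urlsplit separates scheme, host, port, query string and fragment from the path
    path = urlsplit(url).path
    # "listen.asx:8024" puts the port after the path, so urlsplit leaves
    # ":8024" in the path; strip a trailing ":digits" here
    path = re.sub(r":[0-9]+$", "", path)
    # splitext only inspects the last component, so "/dir.v2/file" gives ""
    ext = posixpath.splitext(path)[1]
    return ext or None
```

For `http://22.198.133.16:802` the path is empty, so the function returns `None` instead of mistaking part of the IP address for an extension.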