I'm trying to determine the base of a URL, or everything besides the page and parameters. I tried using split, but is there a better way than splitting it up into pieces? Is th
The best way to do this is to use urllib.parse.
From the docs:
The module has been designed to match the Internet RFC on Relative Uniform Resource Locators. It supports the following URL schemes: file, ftp, gopher, hdl, http, https, imap, mailto, mms, news, nntp, prospero, rsync, rtsp, rtspu, sftp, shttp, sip, sips, snews, svn, svn+ssh, telnet, wais, ws, wss.
You'd want to do something like this using urlsplit and urlunsplit:
from urllib.parse import urlsplit, urlunsplit
split_url = urlsplit('http://127.0.0.1/asdf/login.php?q=abc#stackoverflow')
# You now have:
# split_url.scheme "http"
# split_url.netloc "127.0.0.1"
# split_url.path "/asdf/login.php"
# split_url.query "q=abc"
# split_url.fragment "stackoverflow"
# Use all the path except everything after the last '/'
clean_path = "".join(split_url.path.rpartition("/")[:-1])
# "/asdf/"
# urlunsplit joins a urlsplit tuple back into a URL string;
# passing split_url unchanged reproduces the original URL
clean_url = urlunsplit(split_url)
# "http://127.0.0.1/asdf/login.php?q=abc#stackoverflow"
# A more advanced example
advanced_split_url = urlsplit('http://foo:bar@127.0.0.1:5000/asdf/login.php?q=abc#stackoverflow')
# You now have *in addition* to the above:
# advanced_split_url.username "foo"
# advanced_split_url.password "bar"
# advanced_split_url.hostname "127.0.0.1"
# advanced_split_url.port 5000 (an int, not a string)
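
To get the base URL the question asks for, the trimmed path can be passed back to urlunsplit with the query and fragment left empty. Here is a minimal sketch of that idea; the function name base_url is hypothetical (it is not part of urllib.parse), and it assumes "base" means everything up to and including the last '/' of the path.

```python
from urllib.parse import urlsplit, urlunsplit


def base_url(url):
    """Return the URL up to and including the last '/' of its path.

    Hypothetical helper for illustration; not part of urllib.parse.
    """
    parts = urlsplit(url)
    # Keep everything in the path up to and including the last '/'.
    head, sep, _ = parts.path.rpartition("/")
    # Rebuild the URL with empty query and fragment components.
    return urlunsplit((parts.scheme, parts.netloc, head + sep, "", ""))


print(base_url("http://127.0.0.1/asdf/login.php?q=abc#stackoverflow"))
# http://127.0.0.1/asdf/
```

Because netloc is reused as-is, any credentials and port in the original URL are kept in the result. If the URL has no path at all, the path stays empty and only the scheme and netloc come back.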