How can I get the base of a URL in Python?

情书的邮戳 2021-02-12 12:34

I'm trying to determine the base of a URL, or everything besides the page and parameters. I tried using split, but is there a better way than splitting it up into pieces? Is th

8 answers
  •  予麋鹿 (OP)
    2021-02-12 13:19

    The best way to do this is to use urllib.parse.
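    For illustration, here is why splitting the raw string is fragile: a slash inside the query string fools a naive "keep everything before the last '/'" split, while urlsplit separates the query before the path is ever touched. The URL and the naive variable are made up for this sketch; this is not code from the question.

```python
from urllib.parse import urlsplit

url = "http://127.0.0.1/asdf/login.php?next=/home/profile"

# Hypothetical split-based attempt: keep everything before the last '/'
naive = url.rsplit("/", 1)[0] + "/"
print(naive)  # http://127.0.0.1/asdf/login.php?next=/home/

# urlsplit pulls the query out first, so the path can be trimmed safely
parts = urlsplit(url)
print(parts.path)   # /asdf/login.php
print(parts.query)  # next=/home/profile
```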

    From the docs:

    The module has been designed to match the Internet RFC on Relative Uniform Resource Locators. It supports the following URL schemes: file, ftp, gopher, hdl, http, https, imap, mailto, mms, news, nntp, prospero, rsync, rtsp, rtspu, sftp, shttp, sip, sips, snews, svn, svn+ssh, telnet, wais, ws, wss.
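    Because the module follows the RFC rules for relative URLs, another option (not this answer's original approach) is to resolve the relative reference "." against the page URL with urljoin. Under RFC 3986 reference resolution, which urljoin follows in Python 3.5+, this drops the last path segment, the query, and the fragment. A minimal sketch, with base_via_urljoin as a hypothetical helper name:

```python
from urllib.parse import urljoin

def base_via_urljoin(url):
    # Resolve "." relative to the page URL: the last path segment,
    # query and fragment are discarded, leaving the page's "directory".
    # Scheme and netloc (including any user:password@) are kept.
    return urljoin(url, ".")

print(base_via_urljoin("http://127.0.0.1/asdf/login.php?q=abc#stackoverflow"))
# http://127.0.0.1/asdf/
```

    One caveat: "." resolves against the directory, so a URL whose path already ends in "/" keeps its full path and only loses its query and fragment.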

    You'd want to do something like this using urlsplit and urlunsplit:

    from urllib.parse import urlsplit, urlunsplit
    
    split_url = urlsplit('http://127.0.0.1/asdf/login.php?q=abc#stackoverflow')
    
    # You now have:
    # split_url.scheme   "http"
    # split_url.netloc   "127.0.0.1" 
    # split_url.path     "/asdf/login.php"
    # split_url.query    "q=abc"
    # split_url.fragment "stackoverflow"
    
    # Keep the path up to and including its last '/' (drops "login.php")
    clean_path = "".join(split_url.path.rpartition("/")[:-1])
    
    # "/asdf/"
    
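    # urlunsplit joins a urlsplit tuple back into a URL string.
    # Swap in the trimmed path and drop the query and fragment:
    clean_url = urlunsplit((split_url.scheme, split_url.netloc, clean_path, "", ""))
    
    # "http://127.0.0.1/asdf/"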
    
    
    # A more advanced example 
    advanced_split_url = urlsplit('http://foo:bar@127.0.0.1:5000/asdf/login.php?q=abc#stackoverflow')
    
    # You now have *in addition* to the above:
    # advanced_split_url.username   "foo"
    # advanced_split_url.password   "bar"
    # advanced_split_url.hostname   "127.0.0.1"
    # advanced_split_url.port       5000 (an int, not a string)
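    Putting these steps together, a small reusable helper might look like this. url_base and its keep_credentials flag are hypothetical names for this sketch; stripping credentials with netloc.rpartition("@") assumes any '@' inside the user info is percent-encoded, as valid URLs require.

```python
from urllib.parse import urlsplit, urlunsplit

def url_base(url, keep_credentials=True):
    """Return the URL without its last path segment, query and fragment."""
    parts = urlsplit(url)
    # Keep the path up to and including its last '/'; an empty path becomes '/'
    base_path = parts.path.rpartition("/")[0] + "/"
    netloc = parts.netloc
    if not keep_credentials:
        # Drop any "user:password@" prefix; host and port stay as written
        netloc = netloc.rpartition("@")[2]
    return urlunsplit((parts.scheme, netloc, base_path, "", ""))

url = "http://foo:bar@127.0.0.1:5000/asdf/login.php?q=abc#stackoverflow"
print(url_base(url))                          # http://foo:bar@127.0.0.1:5000/asdf/
print(url_base(url, keep_credentials=False))  # http://127.0.0.1:5000/asdf/
```

    Working on netloc directly, rather than rebuilding it from hostname and port, keeps the host's original case and any IPv6 brackets intact, since hostname lowercases the host and strips the brackets.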
    
