Parsing hostname and port from string or url

迷失自我 2021-02-07 00:54

I can be given a string in any of these formats:

  • url: e.g. http://www.acme.com:456

  • string: e.g. www.acme.com:456, www.acme.com 456, or www.acme.co

5 answers
  • 2021-02-07 01:25

    Method using urllib -

        from urllib.parse import urlparse
        url = 'https://stackoverflow.com/questions'
        print(urlparse(url))
    

    Output -

    ParseResult(scheme='https', netloc='stackoverflow.com', path='/questions', params='', query='', fragment='')

    Reference - https://www.tutorialspoint.com/urllib-parse-parse-urls-into-components-in-python

  • 2021-02-07 01:29

    You can use urlparse to get the hostname from a URL string:

    # Python 3; in Python 2 the import was: from urlparse import urlparse
    from urllib.parse import urlparse
    print(urlparse("http://www.website.com/abc/xyz.html").hostname)  # prints www.website.com
    
  • 2021-02-07 01:29
    >>> from urllib.parse import urlparse  # Python 3; in Python 2: from urlparse import urlparse
    >>> aaa = urlparse('http://www.acme.com:456')

    >>> aaa.hostname
    'www.acme.com'

    >>> aaa.port
    456
    
  • 2021-02-07 01:33

    The reason it fails for:

    www.acme.com 456
    

    is that it is not a valid URI. Why not just:

    1. Replace the space with a :
    2. Parse the resulting string with the standard urlparse function (prefix it with // or a scheme if it has none, otherwise urlparse will not treat www.acme.com as the host)

    Try to make use of default functionality as much as possible, especially when it comes to parsing well-known formats such as URIs.
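
    The two steps above could be sketched roughly as follows. This is only a minimal sketch: parse_host_port and its default_port parameter are hypothetical names, not part of urllib, and it assumes Python 3 and that any whitespace in the input separates host from port.

```python
from urllib.parse import urlsplit


def parse_host_port(text, default_port=None):
    """Hypothetical helper: return (hostname, port) for the formats in the question."""
    # Step 1: turn "host port" into "host:port" (assumes whitespace only separates the two).
    candidate = ":".join(text.split())
    # urlsplit only recognises a network location after "//",
    # so bare strings such as "www.acme.com:456" get that prefix added.
    if "://" not in candidate:
        candidate = "//" + candidate
    # Step 2: let the standard library do the actual parsing.
    parts = urlsplit(candidate)
    # parts.port raises ValueError if the port is not a valid number.
    port = parts.port if parts.port is not None else default_port
    return parts.hostname, port


for s in ("http://www.acme.com:456", "www.acme.com:456", "www.acme.com 456", "www.acme.com"):
    print(s, "->", parse_host_port(s))
```

    When no port is present, the second element is whatever default_port was passed (None unless you choose something else).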

  • 2021-02-07 01:43

    I'm not that familiar with urlparse, but using regex you'd do something like:

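    import re

    # optional scheme, then the host, then an optional ':' or space followed by the port
    p = r'(?:http.*://)?(?P<host>[^:/ ]+)[: ]?(?P<port>[0-9]*).*'

    m = re.search(p, 'http://www.abc.com:123/test')
    m.group('host') # 'www.abc.com'
    m.group('port') # '123'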
    

    Or, without port:

    m = re.search(p,'http://www.abc.com/test')
    m.group('host') # 'www.abc.com'
    m.group('port') # '' i.e. no port was given, so you'll have to fall back to a default such as '80'
    

    EDIT: fixed regex to also match 'www.abc.com 123'
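
    If you go the regex route, you will probably want the port as an int with a fallback. A rough sketch of that, under my own assumptions: host_and_port, HOST_PORT and the default of 80 are hypothetical, and the separator is limited to ':' or whitespace so that a path like /123 is not mistaken for a port.

```python
import re

# Optional scheme, then the host, then optionally ':' or whitespace plus digits.
# This pattern is an illustrative assumption, not a full URI grammar.
HOST_PORT = re.compile(
    r'(?:[a-z][a-z0-9+.-]*://)?(?P<host>[^:/\s]+)(?:[:\s]+(?P<port>\d+))?',
    re.IGNORECASE,
)


def host_and_port(text, default_port=80):
    """Hypothetical helper: return (host, port as int), using default_port if none is given."""
    m = HOST_PORT.match(text.strip())
    if m is None:
        raise ValueError("no host found in %r" % text)
    port = m.group('port')
    return m.group('host'), int(port) if port else default_port


print(host_and_port('http://www.abc.com:123/test'))
print(host_and_port('www.abc.com 123'))
print(host_and_port('www.abc.com'))
```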
