Parsing hostname and port from string or URL


迷失自我

I can be given a string in any of these formats:

URL: e.g. http://www.acme.com:456
string: e.g. www.acme.com:456, www.acme.com 456, or www.acme.co

5 Answers

时光取名叫无心

2021-02-07 01:25
Method using urllib.parse:
```
from urllib.parse import urlparse

url = 'https://stackoverflow.com/questions'
print(urlparse(url))
```
Output:

ParseResult(scheme='https', netloc='stackoverflow.com', path='/questions', params='', query='', fragment='')

Reference - https://www.tutorialspoint.com/urllib-parse-parse-urls-into-components-in-python

故里飘歌

2021-02-07 01:29
You can use urlparse to get the hostname from a URL string:
```
# Python 3; on Python 2 the import was: from urlparse import urlparse
from urllib.parse import urlparse

print(urlparse("http://www.website.com/abc/xyz.html").hostname)  # prints www.website.com
```

青春惊慌失措

2021-02-07 01:29

```
>>> from urllib.parse import urlparse  # Python 2: from urlparse import urlparse
>>> aaa = urlparse('http://www.acme.com:456')
>>> aaa.hostname
'www.acme.com'
>>> aaa.port
456
```


一向

2021-02-07 01:33
The reason it fails for:
```
www.acme.com 456
```
is that it is not a valid URI. Why don't you just:
1. Replace the space with a :
2. Parse the resulting string with the standard urlparse method
Try to make use of default functionality as much as possible, especially when parsing well-known formats such as URIs.
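The two steps above can be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the function name parse_host_port is made up for illustration, and it assumes that any input without a scheme is a bare host with an optional port. One extra step is needed in practice, because urllib.parse only fills in hostname and port when the string has a scheme or starts with //, so the sketch prefixes // to scheme-less input.

```python
import re
from urllib.parse import urlsplit


def parse_host_port(text):
    """Return (hostname, port) for a URL or a bare 'host:port' / 'host port' string.

    parse_host_port is a hypothetical helper name. port is None when the
    input does not specify one.
    """
    # Step 1: turn "www.acme.com 456" into "www.acme.com:456".
    cleaned = re.sub(r"\s+", ":", text.strip())
    # urlsplit only recognises a network location after "//", so add it
    # when there is no scheme (assumption: bare strings are host[:port]).
    if "//" not in cleaned:
        cleaned = "//" + cleaned
    # Step 2: let the standard parser do the work.
    parts = urlsplit(cleaned)
    return parts.hostname, parts.port


for s in ("http://www.acme.com:456", "www.acme.com:456",
          "www.acme.com 456", "www.acme.com"):
    print(s, "->", parse_host_port(s))
```

All four formats from the question go through the same code path. Whether a missing port should stay None or fall back to a default such as 80 is left to the caller.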

广开言路

2021-02-07 01:43

I'm not that familiar with urlparse, but using a regex you'd do something like:

```
import re

p = r'(?:http.*://)?(?P<host>[^:/ ]+).?(?P<port>[0-9]*).*'

m = re.search(p, 'http://www.abc.com:123/test')
m.group('host')  # 'www.abc.com'
m.group('port')  # '123'
```

Or, without a port:

```
m = re.search(p, 'http://www.abc.com/test')
m.group('host')  # 'www.abc.com'
m.group('port')  # '' i.e. you'll have to treat this as '80'
```

EDIT: fixed regex to also match 'www.abc.com 123'
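If it helps, the pattern from this answer could be wrapped in a small function that applies the default port when none is captured. This is only a sketch: the names HOST_PORT and host_and_port and the default_port parameter are made up for illustration, and the default of 80 simply follows the comment in the answer.

```python
import re

# Same pattern as in the answer, compiled once.
HOST_PORT = re.compile(r"(?:http.*://)?(?P<host>[^:/ ]+).?(?P<port>[0-9]*).*")


def host_and_port(text, default_port=80):
    """Return (host, port); use default_port when no digits were captured.

    host_and_port and default_port are hypothetical names for this sketch.
    """
    m = HOST_PORT.search(text)
    if m is None:
        raise ValueError("no host found in %r" % text)
    port = m.group("port")
    return m.group("host"), int(port) if port else default_port


for s in ("http://www.abc.com:123/test", "http://www.abc.com/test",
          "www.abc.com 123", "www.abc.com"):
    print(s, "->", host_and_port(s))
```

One limitation to keep in mind: because .? accepts any single separator, a URL whose path starts with digits (for example http://www.abc.com/8080) would have those digits captured as the port.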
