I have URLs formatted as:
google.com
www.google.com
http://google.com
http://www.google.com
I would like to convert all type of links to a
Python does have built-in functions that handle this correctly, for example:
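import urlparse  # Python 2 module; see the PS for Python 3

my_url = 'google.com'  # any of the forms from the question
# Use 'http' as the default scheme when the URL has none.
p = urlparse.urlparse(my_url, 'http')
# Without a scheme, the host ends up in .path instead of .netloc.
netloc = p.netloc or p.path
path = p.path if p.netloc else ''
if not netloc.startswith('www.'):
    netloc = 'www.' + netloc
# ParseResult is immutable, so build a new one with the fixed fields.
p = urlparse.ParseResult('http', netloc, path, *p[3:])
print(p.geturl())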
If you want to remove (or add) the www part, you have to edit the .netloc field of the resulting object before calling .geturl().
Because ParseResult is a namedtuple, you cannot edit it in place; you have to create a new object.
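As a rough sketch of that last point, here is one way to wrap the idea in a function that can either add or remove the www prefix. It assumes Python 3 (urllib.parse), and the name normalize_url and its want_www parameter are made up for this illustration. It uses the namedtuple method _replace, which returns a new ParseResult rather than modifying the old one.

```python
from urllib.parse import urlparse


def normalize_url(url, want_www=True):
    """Hypothetical helper: force an http default scheme and add/remove 'www.'."""
    p = urlparse(url, 'http')
    if p.netloc:
        netloc, path = p.netloc, p.path
    else:
        # Without a scheme, urlparse leaves the host at the start of .path.
        netloc, sep, rest = p.path.partition('/')
        path = sep + rest
    has_www = netloc.startswith('www.')
    if want_www and not has_www:
        netloc = 'www.' + netloc
    elif not want_www and has_www:
        netloc = netloc[len('www.'):]
    # _replace builds a new ParseResult; the original p is left unchanged.
    return p._replace(netloc=netloc, path=path).geturl()


for u in ('google.com', 'www.google.com',
          'http://google.com', 'http://www.google.com'):
    print(normalize_url(u))
```

Calling normalize_url(u, want_www=False) strips the prefix instead. Inputs with a port but no scheme (such as host:8080) are not handled by this sketch.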
PS:
In Python 3, use urllib.parse.urlparse instead.