I notice that netloc is empty if the URL doesn't have //.
Would it be possible to identify netloc correctly even if // is not provided in the URL?
Not by using urlparse. This is explicitly explained in the documentation:
Following the syntax specifications in RFC 1808, urlparse recognizes a netloc only if it is properly introduced by '//'. Otherwise the input is presumed to be a relative URL and thus to start with a path component.
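To see the documented behavior in practice, here is a minimal sketch. It assumes Python 3, where urlparse lives in urllib.parse (in Python 2 it is the urlparse module); netloc_of and the example.com URLs are made up for illustration.

```python
from urllib.parse import urlparse

def netloc_of(url):
    """Hypothetical helper: return only the netloc component of a URL."""
    return urlparse(url).netloc

# With a leading '//' the host is recognized as the netloc.
print(repr(netloc_of("//www.example.com/path")))       # 'www.example.com'

# Without it, the whole string is treated as a relative path.
print(repr(netloc_of("www.example.com/path")))         # ''
print(repr(urlparse("www.example.com/path").path))     # 'www.example.com/path'
```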
If you don't want to rewrite urlparse's logic (which I would not suggest), make sure url starts with //:
    if not url.startswith('//'):
        url = '//' + url
EDIT
The above is actually a bad solution, as @alexis noted: it also prepends // to URLs that already start with a scheme such as http://. Perhaps:
    if not url.startswith(('//', 'http://', 'https://')):
        url = '//' + url
But your mileage may vary with that solution as well. If you have to support a wide variety of inconsistent formats, you may have to resort to a regex.
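As a rough sketch of the regex route, the check below treats any "scheme://" prefix as already well-formed instead of listing http and https by hand. It assumes Python 3; parse_with_netloc and _SCHEME_RE are hypothetical names, not part of urllib. It does not handle schemes written without // (for example mailto:), so treat it as a starting point rather than a complete solution.

```python
import re
from urllib.parse import urlparse

# A scheme per RFC 3986 (letter, then letters/digits/+/./-) followed by '://'.
_SCHEME_RE = re.compile(r'^[A-Za-z][A-Za-z0-9+.-]*://')

def parse_with_netloc(url):
    """Hypothetical helper: prepend '//' only when the URL has neither
    a leading '//' nor a 'scheme://' prefix, then parse it."""
    if not (url.startswith('//') or _SCHEME_RE.match(url)):
        url = '//' + url
    return urlparse(url)

for candidate in ("www.example.com/path",
                  "http://www.example.com/path",
                  "ftp://example.org/file",
                  "//example.com"):
    print(candidate, "->", parse_with_netloc(candidate).netloc)
```

In each of the four sample inputs the host ends up in netloc, whether or not the original string included a scheme.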