Question
I'm not exactly sure what these modules are used for. I get that they split the respective url into its components, but why would that be useful, or what is an example of when to use urlparse?
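Answer 1:
Use urlparse only if you need the params component, i.e. the ";"-delimited parameters attached to the last path segment; otherwise urlsplit is usually enough. I explain below what parameters are for.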
Reference
urllib.parse.urlsplit(urlstring, scheme='', allow_fragments=True)
This is similar to urlparse(), but does not split the params from the URL. This should generally be used instead of urlparse() if the more recent URL syntax allowing parameters to be applied to each segment of the path portion of the URL (see RFC 2396) is wanted.
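To make the difference concrete, here is a small sketch comparing the two functions. The URL is a made-up example (the ";size=m" part is an assumption added only so that the params field has something in it).

```python
from urllib.parse import urlparse, urlsplit

# Hypothetical URL; ";size=m" is a parameter on the last path segment.
url = "http://www.example.com/products/women;size=m?color=green"

parsed = urlparse(url)  # 6 fields: scheme, netloc, path, params, query, fragment
split = urlsplit(url)   # 5 fields: scheme, netloc, path, query, fragment

print(parsed.path)    # path with the params split off
print(parsed.params)  # the part after ";"
print(split.path)     # path with ";size=m" still attached
print(split.query)    # the query string, same in both results
```

With urlsplit the ";size=m" part simply stays inside the path, which is what you want if parameters may appear on any segment.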
The hostname is often worth storing in a variable so you can reuse it later, for example to build the URL of the page you want by adding a path, parameters, or a query to it while scraping.
Regarding parameters:
FYI, this is what RFC 2396 says about parameters in URLs:
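As a rough illustration of keeping the host around, the sketch below pulls the hostname out of one URL and reuses the scheme and host to build another. The helper name build_url and the "/products/men" path are hypothetical, chosen just for this example.

```python
from urllib.parse import urlsplit, urlunsplit

url = "http://www.example.com/products/women?color=green"
parts = urlsplit(url)

host = parts.hostname  # just the host name, without scheme or port
# Rebuild only scheme + host so it can serve as a base for other pages.
base = urlunsplit((parts.scheme, parts.netloc, "", "", ""))


def build_url(path, query=""):
    """Hypothetical helper: attach a path and query to the saved host."""
    return urlunsplit((parts.scheme, parts.netloc, path, query, ""))


print(host)
print(base)
print(build_url("/products/men", "color=green"))
```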
Extensive testing of current client applications demonstrated that the majority of deployed systems do not use the ";" character to indicate trailing parameter information, and that the presence of a semicolon in a path segment does not affect the relative parsing of that segment. Therefore, parameters have been removed as a separate component and may now appear in any path segment. Their influence has been removed from the algorithm for resolving a relative URI reference.
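Splitting a URL into its components is useful in scraping, e.g. if the URL is http://www.example.com/products/women?color=green. When you use urlparse, you get each component separately. Note that women here is the last segment of the path rather than a parameter; the params field is only filled when the URL contains a ";", as in /products/women;size=m. Now you can change women to men, so the URL becomes http://www.example.com/products/men?color=green, and likewise for kids, girl, boy, and so on.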
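One possible way to do that substitution is sketched here. The helper swap_last_segment is a hypothetical name, and it assumes the category is always the last path segment, as in the example URL.

```python
from urllib.parse import urlsplit, urlunsplit


def swap_last_segment(url, new_segment):
    """Replace the last path segment, keeping scheme, host, and query."""
    parts = urlsplit(url)
    # "/products/women" -> head "/products", last segment discarded
    head, _, _ = parts.path.rpartition("/")
    return urlunsplit(parts._replace(path=f"{head}/{new_segment}"))


url = "http://www.example.com/products/women?color=green"
categories = ["men", "kids", "girl", "boy"]
urls = [swap_last_segment(url, c) for c in categories]

for u in urls:
    print(u)
```

Editing the parsed path this way avoids string replacement on the whole URL, which could accidentally touch the host or the query.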
Source: https://stackoverflow.com/questions/30091297/python-3-why-would-you-use-urlparse-urlsplit