Python 3 : Why would you use urlparse/urlsplit [closed]

Posted by 北慕城南 on 2019-12-11 10:17:05

Question


I'm not exactly sure what these modules are used for. I get that they split the respective url into its components, but why would that be useful, or what is an example of when to use urlparse?


Answer 1:


Use urlparse only if you need the params component of the URL; otherwise urlsplit is usually enough. Why you might need params is explained further down.

Reference

urllib.parse.urlsplit(urlstring, scheme='', allow_fragments=True)

This is similar to urlparse(), but does not split the params from the URL. This should generally be used instead of urlparse() if the more recent URL syntax allowing parameters to be applied to each segment of the path portion of the URL (see RFC 2396) is wanted.
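The difference is easiest to see on a URL that actually contains a ";" parameter. The following is a minimal sketch; the URL and the size=m parameter are made up for illustration and are not from the original question.

```python
from urllib.parse import urlparse, urlsplit

# Hypothetical URL with a ";"-delimited parameter on the last path segment.
url = "http://www.example.com/products/women;size=m?color=green#top"

parsed = urlparse(url)  # 6 fields: scheme, netloc, path, params, query, fragment
split = urlsplit(url)   # 5 fields: scheme, netloc, path, query, fragment

print(parsed.path)    # /products/women
print(parsed.params)  # size=m
print(split.path)     # /products/women;size=m  (params stay inside the path)
print(split.query)    # color=green
```

Both functions agree on scheme, netloc, query and fragment; only the handling of the ";" part differs.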

The hostname is always worth storing in a variable: you can reuse it later, or combine it with a different path or query string to build the URL of the page you want while scraping.
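As a rough sketch of that idea, the snippet below parses one page URL, keeps its scheme and host, and builds further URLs on the same site. The helper name build_url and the paths and query values are assumptions made for this example.

```python
from urllib.parse import urlsplit, urlunsplit, urlencode

page = urlsplit("http://www.example.com/products/women?color=green")
host = page.hostname  # "www.example.com", handy for logging or filtering links


def build_url(path, **query):
    """Hypothetical helper: same scheme and host as `page`, new path and query."""
    return urlunsplit((page.scheme, page.netloc, path, urlencode(query), ""))


next_url = build_url("/products/men", color="blue", page=2)
print(host)
print(next_url)
```

Using urlencode for the query means values are escaped for you instead of being concatenated by hand.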

Regarding parameters:

FYI: RFC 2396 says the following about parameters in a URL:

Extensive testing of current client applications demonstrated that the majority of deployed systems do not use the ";" character to indicate trailing parameter information, and that the presence of a semicolon in a path segment does not affect the relative parsing of that segment. Therefore, parameters have been removed as a separate component and may now appear in any path segment. Their influence has been removed from the algorithm for resolving a relative URI reference.

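The parsed components are useful in scraping, e.g. if the URL is http://www.example.com/products/women?color=green

When you use urlparse, you get the path /products/women and the query color=green as separate components (the params field itself is empty here, because this URL contains no ";"). Now you can change the last path segment to men, giving http://www.example.com/products/men?color=green, and do the same for kids, girl, boy and so on.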
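A minimal sketch of that swap, assuming the category (women, men, ...) is the last path segment of the URL. The helper name with_category is made up for this example.

```python
import posixpath
from urllib.parse import urlparse, urlunparse

url = "http://www.example.com/products/women?color=green"
parts = urlparse(url)


def with_category(parts, category):
    """Hypothetical helper: replace the last path segment, keep everything else."""
    base = posixpath.dirname(parts.path)  # "/products"
    new_path = posixpath.join(base, category)
    # ParseResult is a named tuple, so _replace returns a modified copy.
    return urlunparse(parts._replace(path=new_path))


urls = [with_category(parts, c) for c in ("men", "kids", "girl", "boy")]
for u in urls:
    print(u)
```

Because only the path is replaced, the query string color=green carries over unchanged to every generated URL.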



Source: https://stackoverflow.com/questions/30091297/python-3-why-would-you-use-urlparse-urlsplit
