UnicodeEncodeError: 'ascii' codec can't encode character '\\xe9' - -when using urlib.request python3

心不动则不痛 提交于 2019-11-26 15:32:51

Use a percent-encoded URL:

link = 'http://finance.yahoo.com/news/caf%C3%A9s-growing-faster-than-fast-food-peers-144512056.html'

I found the above percent-encoded URL by pointing the browser at

http://finance.yahoo.com/news/cafés-growing-faster-than-fast-food-peers-144512056.html

going to the page, then copying-and-pasting the encoded url supplied by the browser back into the text editor. However, you can generate a percent-encoded URL programmatically using:

from urllib import parse

link = 'http://finance.yahoo.com/news/cafés-growing-faster-than-fast-food-peers-144512056.html'

scheme, netloc, path, query, fragment = parse.urlsplit(link)
path = parse.quote(path)
link = parse.urlunsplit((scheme, netloc, path, query, fragment))

which yields

http://finance.yahoo.com/news/caf%C3%A9s-growing-faster-than-fast-food-peers-144512056.html

Your URL contains characters that cannot be represented as ASCII characters.

You'll have to ensure that all characters have been properly URL encoded; use urllib.parse.quote_plus for example; it'll use UTF-8 URL-encoded escaping to represent any non-ASCII characters.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!