python urllib2 and unicode

六月ゝ 毕业季﹏ submitted on 2020-01-02 10:15:13

Question


I would like to collect information from the results returned by a search engine. However, the request only works when the query part of the URL is plain ASCII text; it fails when the query is Unicode.

import urllib2
a = "바둑"
a = a.decode("utf-8")
type(a)
#Out[35]: unicode

url = "http://search.naver.com/search.naver?where=nexearch&query=%s" %(a)
url2 = urllib2.urlopen(url)

This gives the following error:

#UnicodeEncodeError: 'ascii' codec can't encode characters in position 39-40: ordinal not in range(128)

Answer 1:


Encode the Unicode data to UTF-8, then URL-encode:

from urllib import urlencode
import urllib2

params = {'where': 'nexearch', 'query': a.encode('utf8')}
params = urlencode(params)

url = "http://search.naver.com/search.naver?" + params
response = urllib2.urlopen(url)

Demo:

>>> from urllib import urlencode
>>> a = u"바둑"
>>> params = {'where': 'nexearch', 'query': a.encode('utf8')}
>>> params = urlencode(params)
>>> params
'query=%EB%B0%94%EB%91%91&where=nexearch'
>>> url = "http://search.naver.com/search.naver?" + params
>>> url
'http://search.naver.com/search.naver?query=%EB%B0%94%EB%91%91&where=nexearch'
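
The demo above is Python 2. As a rough sketch of the same approach on Python 3 (an assumption, since the question targets urllib2), urlencode lives in urllib.parse and percent-encodes str values as UTF-8 by default, so the explicit .encode('utf8') step should not be needed. The helper name build_search_url is hypothetical, and the parameters are passed as a list of pairs only to keep their order fixed:

```python
from urllib.parse import urlencode

BASE_URL = "http://search.naver.com/search.naver"  # endpoint taken from the question


def build_search_url(query):
    """Return the search URL with `query` percent-encoded as UTF-8.

    Hypothetical helper for illustration; assumes Python 3.
    """
    # A list of pairs keeps the parameter order fixed in the output.
    params = [("where", "nexearch"), ("query", query)]
    return BASE_URL + "?" + urlencode(params)


url = build_search_url("바둑")
print(url)
```

The resulting string is plain ASCII, so it can be passed to urllib.request.urlopen(), the Python 3 counterpart of urllib2.urlopen().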

Using urllib.urlencode() to build the parameters is easier, but you can also just escape the query value with urllib.quote_plus():

from urllib import quote_plus
encoded_a = quote_plus(a.encode('utf8'))
url = "http://search.naver.com/search.naver?where=nexearch&query=%s" % encoded_a
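
For readers on Python 3, here is a small sketch of how quote_plus() treats a query that also contains a space, compared with quote(). It assumes Python 3, where both functions live in urllib.parse and accept str directly; the search term is made up for illustration:

```python
from urllib.parse import quote, quote_plus, unquote_plus

term = "바둑 rules"  # illustrative search term, not from the original question

plus_form = quote_plus(term)  # spaces become '+', as in HTML form encoding
percent_form = quote(term)    # spaces become '%20'

print(plus_form)
print(percent_form)

# Decoding the '+' form gives back the original text.
print(unquote_plus(plus_form) == term)
```

Either form is generally accepted in a query string; quote_plus() matches what urlencode() produces by default.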


Source: https://stackoverflow.com/questions/26762740/python-urllib2-and-unicode
