Question
I would like to collect information from the results returned by a search engine. However, the request only works when the query part is plain ASCII text; it fails when the query is a unicode string.
import urllib2
a = "바둑"
a = a.decode("utf-8")
type(a)
#Out[35]: unicode
url = "http://search.naver.com/search.naver?where=nexearch&query=%s" %(a)
url2 = urllib2.urlopen(url)
This gives the following error:
#UnicodeEncodeError: 'ascii' codec can't encode characters in position 39-40: ordinal not in range(128)
Answer 1:
Encode the Unicode data to UTF-8, then URL-encode:
from urllib import urlencode
import urllib2
params = {'where': 'nexearch', 'query': a.encode('utf8')}
params = urlencode(params)
url = "http://search.naver.com/search.naver?" + params
response = urllib2.urlopen(url)
Demo:
>>> from urllib import urlencode
>>> a = u"바둑"
>>> params = {'where': 'nexearch', 'query': a.encode('utf8')}
>>> params = urlencode(params)
>>> params
'query=%EB%B0%94%EB%91%91&where=nexearch'
>>> url = "http://search.naver.com/search.naver?" + params
>>> url
'http://search.naver.com/search.naver?query=%EB%B0%94%EB%91%91&where=nexearch'
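As a quick sanity check, the percent-escapes in the demo output can be decoded back to the original text. This is a minimal sketch written for Python 3 (an assumption; the answer itself targets Python 2), where parse_qs lives in urllib.parse and decodes escapes as UTF-8 by default:

```python
# Python 3 sketch: decode the query string from the demo back into text.
from urllib.parse import parse_qs

params = "query=%EB%B0%94%EB%91%91&where=nexearch"

# parse_qs returns a dict mapping each key to a list of values;
# percent-escapes are interpreted as UTF-8 unless told otherwise.
decoded = parse_qs(params)

print(decoded["query"][0])
print(decoded["where"][0])
```

If the decoded query matches the original string, the server receives the UTF-8 bytes of the search term.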
Using urllib.urlencode() to build the parameters is easier, but you can also just escape the query
value with urllib.quote_plus():
from urllib import quote_plus
encoded_a = quote_plus(a.encode('utf8'))
url = "http://search.naver.com/search.naver?where=nexearch&query=%s" % encoded_a
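The snippets above target Python 2. For readers on Python 3, where urllib2 no longer exists and these functions live in urllib.parse and urllib.request, a rough equivalent might look like the sketch below. The helper name build_search_url and the constant BASE_URL are hypothetical, introduced only for illustration. Python 3 strings are already Unicode, and urlencode() and quote_plus() encode them as UTF-8 by default, so no explicit .encode('utf8') call is needed here:

```python
# Python 3 sketch of the same approach (not part of the original answer).
from urllib.parse import urlencode, quote_plus
from urllib.request import urlopen

BASE_URL = "http://search.naver.com/search.naver"  # URL from the question


def build_search_url(query):
    """Hypothetical helper: return the search URL for a Unicode query."""
    # urlencode() percent-encodes str values as UTF-8 by default.
    params = urlencode({"where": "nexearch", "query": query})
    return BASE_URL + "?" + params


def fetch(query):
    """Hypothetical helper: perform the request and return the raw bytes."""
    return urlopen(build_search_url(query)).read()


url = build_search_url("바둑")
print(url)

# quote_plus() escapes a single value in the same way.
print(quote_plus("바둑"))
```

Calling fetch("바둑") would perform the actual request; it is left uncalled here so the sketch runs without network access.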
Source: https://stackoverflow.com/questions/26762740/python-urllib2-and-unicode