Question
I need to perform a Google search and retrieve the number of results for a query. I found how to do this in the answer to "Google Search from a Python App".
However, for a few queries I get the error below. I think those queries contain Unicode characters.
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 28: ordinal not in range(128)
I searched Google and found that I need to convert Unicode to ASCII, using the code below.
    import unicodedata

    def convertToAscii(text, action):
        try:
            # Decode the UTF-8 byte string, then decompose accented characters
            temp = unicode(text, "utf-8")
            fixed = unicodedata.normalize('NFKD', temp).encode('ASCII', action)
            return fixed
        except Exception, errorInfo:
            print errorInfo
            print "Unable to convert the Unicode characters to xml character entities"
            raise errorInfo
If I use the action ignore, it removes those characters, but if I use other actions, I get exceptions.
Any idea how to handle this?
Thanks
== Edit == I am using the code below to encode the query before performing the search, and this is what throws the error.
    query = urllib.urlencode({'q': searchfor})
Answer 1:
You cannot urlencode raw Unicode strings. You need to encode them to UTF-8 first and then pass the result to urlencode:
    query = urllib.urlencode({'q': u"München".encode('UTF-8')})
This returns q=M%C3%BCnchen, which Google happily accepts.
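The same idea can be wrapped in a small helper. This is a minimal sketch, not part of the original answer: the name build_query is hypothetical, and the import fallback is an assumption added so the snippet runs on both Python 2 and Python 3 (where urlencode lives in urllib.parse).

```python
try:
    from urllib import urlencode        # Python 2
except ImportError:
    from urllib.parse import urlencode  # Python 3


def build_query(searchfor):
    """Return a URL-encoded 'q=...' string (hypothetical helper).

    Text (unicode) input is encoded to UTF-8 bytes first, so urlencode
    only ever sees bytes and never attempts an implicit ASCII conversion.
    """
    if not isinstance(searchfor, bytes):
        searchfor = searchfor.encode("utf-8")
    return urlencode({"q": searchfor})


query = build_query(u"M\u00fcnchen")  # \u00fc is the letter u-umlaut
```

Byte strings that are already UTF-8 encoded pass through unchanged, so the helper can be called with either kind of input.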
Answer 2:
You can't safely convert Unicode to ASCII. Doing so involves throwing away information (specifically, it throws away every character outside ASCII, such as accented and other non-English letters).
You should be doing the entire process in Unicode, so as not to lose any information.
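To see what is lost, here is a small sketch of the NFKD-plus-ignore approach from the question applied to two sample words. The sample strings and the name to_ascii_lossy are illustrative assumptions, not taken from the original post.

```python
import unicodedata


def to_ascii_lossy(text):
    """Decompose accented characters, then drop everything non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore")


# u-umlaut decomposes into "u" plus a combining mark; only the mark is dropped.
munich = to_ascii_lossy(u"M\u00fcnchen")   # b"Munchen"
# Sharp s (U+00DF) has no decomposition, so the whole letter disappears.
street = to_ascii_lossy(u"Stra\u00dfe")    # b"Strae"

# Keeping the text as Unicode and encoding to UTF-8 loses nothing.
round_trip = u"Stra\u00dfe".encode("utf-8").decode("utf-8")
```

The first result is still readable, but the second is no longer the original word, which is why keeping the text in Unicode and encoding to UTF-8 only at the boundary is the safer route.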
Source: https://stackoverflow.com/questions/4777764/unicode-error-trying-to-call-google-search-api