I am trying to pull a list of 500 restaurants in Amsterdam from TripAdvisor. However, after the 308th restaurant I get the following error:
Traceback (most recent call last):
  File "C:/Users/dtrinh/PycharmProjects/TripAdvisorData/LinkPull-HK.py", line 43, in <module>
    writer.writerow(rest_array)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 6: ordinal not in range(128)
I tried several things I found on StackOverflow, but nothing has worked so far. It would be great if someone could take a look at my code and suggest a solution.
for item in soup2.findAll('div', attrs={'class', 'title'}):
    if 'Cuisine' in item.text:
        item.text.strip()
        content = item.findNext('div', attrs=('class', 'content'))
        cuisine_type = content.text.encode('utf8', 'ignore').strip().split(r'\xa0')
        rest_array = [account_name, rest_address, postcode, phonenumber, cuisine_type]
        #print rest_array
        with open('ListingsPull-Amsterdam.csv', 'a') as file:
            writer = csv.writer(file)
            writer.writerow(rest_array)
        break
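The rest_array list contains unicode strings. When you use csv.writer to write rows on Python 2.7 (which you are using), the values must be byte strings. Otherwise each unicode value is implicitly encoded with the default 'ascii' codec, which fails on characters such as u'\u2019'.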
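The failure can be reproduced without any CSV file at all. The following is a minimal sketch; the restaurant name is a hypothetical value chosen so that the right single quotation mark (U+2019) sits at position 6, as in the traceback.

```python
# Hypothetical sample value: U+2019 is at index 6, matching the traceback.
text = u"Mama M\u2019s"

try:
    # This is the conversion Python 2's csv.writer attempts implicitly.
    text.encode("ascii")
    failed = False
    error_position = None
except UnicodeEncodeError as exc:
    failed = True
    error_position = exc.start  # index of the first unencodable character

# Encoding explicitly with UTF-8 succeeds and yields a byte string.
utf8_bytes = text.encode("utf-8")
```

ASCII only covers code points below 128, so any typographic apostrophe, accented letter, or non-breaking space in the scraped text triggers the same error.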
I suggest encoding the values as "utf8" before writing them:
with open('ListingsPull-Amsterdam.csv', mode='a') as fd:
    writer = csv.writer(fd)
    rest_array = [text.encode("utf8") for text in rest_array]
    writer.writerow(rest_array)
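One caveat: in the question's code, cuisine_type is already a list of byte strings (the result of .encode(...).strip().split(...)), and a list has no .encode() method, so the comprehension above would fail on that field. A possible workaround is sketched below. The to_utf8 helper, the choice to join list items with ", ", and the sample values are all assumptions for illustration, not part of the original code.

```python
def to_utf8(value):
    """Return value as a UTF-8 byte string (hypothetical helper)."""
    if isinstance(value, bytes):
        # Already encoded: leave untouched to avoid encoding twice.
        return value
    if isinstance(value, (list, tuple)):
        # Flatten nested fields such as cuisine_type into a single cell.
        return b", ".join(to_utf8(v) for v in value)
    return value.encode("utf-8")

# Hypothetical row mixing unicode strings with a list field.
rest_array = [u"Mama M\u2019s", u"Example Street 1", [b"Italian", u"Caf\xe9"]]
encoded_row = [to_utf8(field) for field in rest_array]
```

On Python 2.7, encoded_row can then be passed to writer.writerow(). On Python 3 this step is unnecessary, because the file object performs the encoding itself.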
Note: please don't use file as a variable name, because it shadows the Python 2 built-in file() (the constructor of the file type, closely related to the open() function).
If you want to open this CSV file with Microsoft Excel, you may consider using another encoding, for instance "cp1252" (which can represent the u"\u2019" character).
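As a quick check of the two encodings, the sketch below encodes the offending character both ways. It also shows the trade-off of cp1252: it is a single-byte code page, so characters outside it still raise an error. The Chinese character is just an arbitrary example of such a character.

```python
apostrophe = u"\u2019"  # right single quotation mark from the traceback

cp1252_bytes = apostrophe.encode("cp1252")  # one byte in Windows-1252
utf8_bytes = apostrophe.encode("utf-8")     # three bytes in UTF-8

# cp1252 covers only a limited set of characters; others still fail.
try:
    u"\u4e2d".encode("cp1252")
    cp1252_covers_everything = True
except UnicodeEncodeError:
    cp1252_covers_everything = False
```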
You're writing one or more non-ASCII characters to your CSV output file. Make sure you open the output file with a character encoding that can represent them. UTF-8 is usually a safe bet. Try this:
with open('ListingsPull-Amsterdam.csv', 'a', encoding='utf-8') as file:
    writer = csv.writer(file)
    writer.writerow(rest_array)
Edit: this is for Python 3.x, sorry. The built-in open() in Python 2 does not accept an encoding argument.
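For completeness, here is a self-contained Python 3 sketch of the same idea that writes a row and reads it back. The row values and the temporary directory are hypothetical. The newline="" argument follows the csv module documentation's recommendation for files passed to csv.writer and csv.reader.

```python
import csv
import os
import tempfile

# Hypothetical row; the first field contains U+2019.
rest_array = ["Mama M\u2019s", "Example Street 1", "1000 AA", "Italian"]

# Write into a temporary directory so the sketch leaves no stray files.
path = os.path.join(tempfile.mkdtemp(), "ListingsPull-Amsterdam.csv")

with open(path, "a", encoding="utf-8", newline="") as fd:
    writer = csv.writer(fd)
    writer.writerow(rest_array)

# Read the file back with the same encoding to confirm the round trip.
with open(path, encoding="utf-8", newline="") as fd:
    rows = list(csv.reader(fd))
```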
Source: https://stackoverflow.com/questions/40619675/unicodeencodeerror-ascii-codec-cant-encode-character-u-u2019-in-position-6