Question
I've been trying to write a script that scrapes the usernames from the comments section of a given YouTube video and writes them to a .csv file.
Here's the script:
from selenium import webdriver
import time
import csv
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup as soup
driver=webdriver.Chrome()
driver.get('https://www.youtube.com/watch?v=VIDEOURL')
time.sleep(5)
driver.execute_script("window.scrollTo(0, 500)")
time.sleep(3)
html = driver.find_element_by_tag_name('html')
html.send_keys(Keys.END)
time.sleep(5)
scroll_time = 40
for num in range(0, scroll_time):
    html.send_keys(Keys.PAGE_DOWN)
for elem in driver.find_elements_by_xpath('//span[@class="style-scope ytd-comment-renderer"]'):
    print(elem.text)
    with open('usernames.csv', 'w') as f:
        p = csv.writer(f)
        p.writerows(str(elem.text))
It keeps throwing this error for line 19:
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u30b9' in position 0: character maps to <undefined>
I'd read on here that this may have something to do with how the Windows console handles Unicode, and I saw a suggested fix that involved downloading and installing a Unicode library package, but that didn't help either.
Could anyone help me figure out what I'm doing wrong?
PS. I'm using the latest version of Python (3.7).
Much appreciated, Sergej.
Answer 1:
Python 3 str values need to be encoded as bytes when written to disk. If no encoding is specified for the file, Python will use the platform default. In this case, the default encoding is unable to encode '\u30b9', and so raises a UnicodeEncodeError.
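To see the failure in isolation, here is a minimal sketch. It assumes the platform default is a legacy Windows code page such as cp1252 (the 'charmap' codec in the traceback suggests this, but the question does not say which one). The helper name can_encode is made up for illustration.

```python
import locale

# The encoding open() falls back to when no encoding argument is given.
print(locale.getpreferredencoding(False))

name = "\u30b9"  # the character named in the traceback


def can_encode(text, encoding):
    """Return True if text can be represented in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# cp1252 is an assumed stand-in for the asker's default encoding.
print(can_encode(name, "cp1252"))
print(can_encode(name, "utf-8"))
```

On a machine where the first line prints something like cp1252, any write of this character to a file opened without an explicit encoding would be expected to fail in the same way.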
The solution is to specify the encoding as UTF-8 when opening the file:
with open('usernames.csv', 'w', encoding='utf-8') as f:
    p = csv.writer(f)
    ...
Since UTF-8 isn't your platform's default encoding, you'll also need to specify the encoding whenever the file is read back, whether in Python code or in applications like Excel.
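A rough round-trip sketch of the above, writing and then reading with the same explicit encoding. The sample usernames and the temporary path are invented for illustration. The sketch also passes newline='' (which the csv module documentation recommends) and writes each name with writerow([name]), which is an assumption about the intended one-name-per-row layout.

```python
import csv
import os
import tempfile

# Made-up sample data standing in for scraped usernames.
usernames = ["alice", "\u30b9\u30ba\u30ad"]

path = os.path.join(tempfile.mkdtemp(), "usernames.csv")

# Write with an explicit encoding instead of the platform default.
with open(path, "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    for name in usernames:
        writer.writerow([name])  # one-element list: one username per row

# Read back with the same encoding.
with open(path, encoding="utf-8", newline="") as f:
    rows = [row[0] for row in csv.reader(f)]

print(rows == usernames)
```

Opening the file once, outside any loop, also avoids truncating it on every iteration, since mode 'w' starts from an empty file each time.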
Many Windows applications recognise a variant of UTF-8 that Python calls "utf-8-sig". This encoding inserts three bytes (a byte order mark, or BOM) at the start of a file to identify the file's encoding to applications which might otherwise attempt to decode it using an 8-bit encoding. If the file will be used exclusively on Windows machines then it may be worth using this encoding instead.
with open('usernames.csv', 'w', encoding='utf-8-sig') as f:
    p = csv.writer(f)
    ...
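The marker that "utf-8-sig" adds can be inspected directly. The following is only a sketch, using a throwaway temporary file rather than the asker's usernames.csv.

```python
import codecs
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "bom_demo.txt")  # throwaway file

with open(path, "w", encoding="utf-8-sig") as f:
    f.write("\u30b9\n")

# Look at the raw bytes: the file starts with the UTF-8 byte order mark.
with open(path, "rb") as f:
    raw = f.read()
print(raw[:3] == codecs.BOM_UTF8)

# Decoding with utf-8-sig strips the marker again.
with open(path, encoding="utf-8-sig") as f:
    text = f.read()

# Decoding with plain utf-8 leaves it in place as a leading U+FEFF.
with open(path, encoding="utf-8") as f:
    text_plain = f.read()

print(repr(text), repr(text_plain))
```

This is why a file written with "utf-8-sig" is best read with "utf-8-sig" as well: plain "utf-8" decodes it without error but leaves a stray U+FEFF at the start of the first field.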
Source: https://stackoverflow.com/questions/52658773/writing-out-results-from-python-to-csv-file-unicodeencodeerror-charmap-codec