Writing out results from python to csv file [UnicodeEncodeError: 'charmap' codec can't encode character

Submitted by 六眼飞鱼酱① on 2021-02-07 08:12:31

Question


I've been trying to write a script that would potentially scrape the list of usernames off the comments section on a defined YouTube video and paste those usernames onto a .csv file.

Here's the script :

from selenium import webdriver
import time
import csv
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup as soup
driver=webdriver.Chrome()
driver.get('https://www.youtube.com/watch?v=VIDEOURL')
time.sleep(5)
driver.execute_script("window.scrollTo(0, 500)")
time.sleep(3)
html = driver.find_element_by_tag_name('html')
html.send_keys(Keys.END)
time.sleep(5)
scroll_time = 40
for num in range(0, scroll_time):
    html.send_keys(Keys.PAGE_DOWN)
for elem in driver.find_elements_by_xpath('//span[@class="style-scope ytd-comment-renderer"]'):
    print(elem.text)
    with open('usernames.csv', 'w') as f:
        p = csv.writer(f)
        p.writerows(str(elem.text));

It keeps throwing this error for line 19:

return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\u30b9' in position 0: character maps to <undefined>

I'd read on here that this may have something to do with how the Windows console deals with Unicode, and saw a potential solution about downloading and installing a Unicode library package, but that didn't help either.

Could anyone help me figure out what I'm doing wrong?

PS. I'm using the latest version of Python (3.7).

Much appreciated, Sergej.


Answer 1:


Python 3 str values need to be encoded as bytes when written to disk. If no encoding is specified for the file, Python will use the platform default. In this case, the default encoding is unable to encode '\u30b9', and so raises a UnicodeEncodeError.
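
To see the failure in isolation, here is a minimal sketch. It assumes the platform default is cp1252, which is common on Western-European Windows installs and consistent with the 'charmap' wording in the traceback, though the question doesn't state the code page. The helper name can_encode is made up for illustration.

```python
import locale


def can_encode(text, encoding):
    """Return True if `text` can be encoded with `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# The encoding open() falls back to when none is given (machine-dependent).
print(locale.getpreferredencoding(False))

ch = '\u30b9'  # the character from the traceback

print(can_encode(ch, 'cp1252'))  # assumed Windows default -> False
print(can_encode(ch, 'utf-8'))   # -> True
```

The first print shows what your own machine would use, so it's a quick way to check whether the cp1252 assumption applies to you.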

The solution is to specify the encoding as UTF-8 when opening the file:

with open('usernames.csv', 'w', encoding='utf-8') as f:
    p = csv.writer(f)
    ...
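
Filled out with a write loop, the fix might look something like the sketch below. The function name write_usernames and the sample list are hypothetical stand-ins for the scraped elements. Apart from the encoding, it also opens the file once rather than inside the loop (mode 'w' truncates the file on every open) and passes writerow a list, since writerows(str(...)) as in the question would write one row per character. Those two changes are separate from the encoding fix.

```python
import csv
import os
import tempfile


def write_usernames(path, usernames):
    """Write one username per row to a UTF-8 encoded CSV file."""
    # newline='' is what the csv module documentation recommends for csv files.
    with open(path, 'w', encoding='utf-8', newline='') as f:
        writer = csv.writer(f)
        for name in usernames:
            # writerow expects a sequence of fields, so wrap the string
            # in a list to get a single field per row.
            writer.writerow([name])


# Made-up sample data standing in for elem.text values.
sample = ['Sergej', '\u30b9\u30b7']
path = os.path.join(tempfile.mkdtemp(), 'usernames.csv')
write_usernames(path, sample)
```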

Since UTF-8 isn't your platform's default encoding, you'll also need to specify the encoding whenever the file is read back, whether in Python code or in applications like Excel.
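
As a rough illustration of why the reading side matters, the sketch below writes one name as UTF-8 and reads it back twice: once as UTF-8 and once as cp1252 (again an assumed default, not something stated in the question). The temporary path is only there to keep the example self-contained.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'usernames.csv')

with open(path, 'w', encoding='utf-8', newline='') as f:
    csv.writer(f).writerow(['\u30b9'])

# Reading with the matching encoding round-trips the text.
with open(path, encoding='utf-8', newline='') as f:
    good = next(csv.reader(f))[0]

# Reading with a different 8-bit encoding does not fail here,
# but decodes the same bytes into different characters.
with open(path, encoding='cp1252', newline='') as f:
    bad = next(csv.reader(f))[0]

print(good == '\u30b9')  # True
print(bad == '\u30b9')   # False
```

The second read succeeding silently is the awkward part: nothing raises, the names just come out garbled.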

Windows supports a modified version of UTF-8, named "utf-8-sig" in Python. This encoding writes a three-byte signature (a byte order mark, or BOM) at the start of the file, which identifies the file's encoding to Windows applications that might otherwise attempt to decode it using an 8-bit encoding. If the file will be used exclusively on Windows machines then it may be worth using this encoding instead.

with open('usernames.csv', 'w', encoding='utf-8-sig') as f:
    p = csv.writer(f)
    ...
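
A small sketch of what "utf-8-sig" adds, using only the standard library. The variable names are illustrative.

```python
import codecs

text = 'abc'
plain = text.encode('utf-8')
signed = text.encode('utf-8-sig')

# The only difference is the BOM prepended to the output.
print(signed == codecs.BOM_UTF8 + plain)  # True
print(len(signed) - len(plain))           # 3

# Decoding with 'utf-8-sig' drops the BOM; plain 'utf-8' keeps it
# as a leading U+FEFF character.
print(signed.decode('utf-8-sig') == text)         # True
print(signed.decode('utf-8') == '\ufeff' + text)  # True
```

So if you write with 'utf-8-sig', it's worth reading with 'utf-8-sig' as well, otherwise the first field of the first row will start with a stray '\ufeff'.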


Source: https://stackoverflow.com/questions/52658773/writing-out-results-from-python-to-csv-file-unicodeencodeerror-charmap-codec
