UnicodeEncodeError: 'charmap' codec can't encode characters

感情败类 2020-11-22 11:55

I'm trying to scrape a website, but it gives me an error.

I'm using the following code:

import urllib.request
from bs4 import BeautifulSoup

get =          


        
8 Answers
  • 2020-11-22 12:23

    I faced the same encoding issue; it can occur when you print the text, read or write it, or open a file containing it. As others have mentioned, calling .encode("utf-8") helps if you are trying to print it.

    soup.encode("utf-8")

    If you are trying to write the scraped data to a file, pass encoding="utf-8" when you open the file:

    with open(filename_csv, 'w', newline='', encoding="utf-8") as csv_file:
        ...  # write the rows here
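
The file-writing case above can be sketched end to end. This is a minimal illustration, not the asker's code: the rows and the temporary file name are hypothetical, and csv.writer is used only because the snippet above names a CSV file.

```python
import csv
import os
import tempfile

# Hypothetical scraped rows; the check mark and CJK text have no cp1252 mapping.
rows = [["name", "note"], ["Zoë", "✓ 你好"]]

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical output path inside a throwaway directory.
    filename_csv = os.path.join(tmp_dir, "scraped_example.csv")

    # An explicit encoding means the platform default is never consulted.
    with open(filename_csv, "w", newline="", encoding="utf-8") as csv_file:
        csv.writer(csv_file).writerows(rows)

    # Read the file back with the same encoding to confirm the round trip.
    with open(filename_csv, newline="", encoding="utf-8") as csv_file:
        read_back = list(csv.reader(csv_file))
```

Whatever encoding is used for writing should also be passed when the file is read later.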

  • 2020-11-22 12:27
    set PYTHONIOENCODING=utf-8
    set PYTHONLEGACYWINDOWSSTDIO=utf-8
    

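    You may or may not need to set the second environment variable, PYTHONLEGACYWINDOWSSTDIO. Python treats it as a flag: any non-empty value switches the Windows console back to the legacy stdio behavior.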

    Alternatively, this can be done in code (although it seems that doing it through env vars is recommended):

    import sys

    # reconfigure() is available on text streams from Python 3.7 onward
    sys.stdin.reconfigure(encoding='utf-8')
    sys.stdout.reconfigure(encoding='utf-8')
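
To see what reconfigure() changes without touching the real console, here is a small sketch that uses an in-memory stream as a stand-in for stdout. The cp1252 wrapper is an assumption chosen to mimic a Western European Windows console; it is not taken from the question.

```python
import io

# Stand-in for a console stream: a text wrapper that encodes as cp1252.
buffer = io.BytesIO()
stream = io.TextIOWrapper(buffer, encoding="cp1252")

try:
    stream.write("\u2713")  # CHECK MARK has no cp1252 mapping
    stream.flush()
    failed = False
except UnicodeEncodeError:
    failed = True

# Switch the same stream to UTF-8, as the answer does for sys.stdout.
stream.reconfigure(encoding="utf-8")
stream.write("\u2713")
stream.flush()
written = buffer.getvalue()
```

The first write fails and the second succeeds, which is the same effect the answer relies on for sys.stdout.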
    

    Additionally: Reproducing this error was a bit of a pain, so leaving this here too in case you need to reproduce it on your machine:

    set PYTHONIOENCODING=windows-1252
    set PYTHONLEGACYWINDOWSSTDIO=windows-1252
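
If changing environment variables is inconvenient, the same error can also be triggered directly in code. This sketch assumes cp1252 is the codec involved, which matches the 'charmap' wording in the error message; the check mark is just an example of a character outside that code page.

```python
# Encoding a character that cp1252 cannot represent raises the same error
# that print() or file.write() raises on an affected machine.
try:
    "\u2713".encode("cp1252")
    message = None
except UnicodeEncodeError as exc:
    message = str(exc)

print(message)
```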
    
  • 2020-11-22 12:43

    I was getting the same UnicodeEncodeError when saving scraped web content to a file. To fix it I replaced this code:

    with open(fname, "w") as f:
        f.write(html)
    

    with this:

    import io
    with io.open(fname, "w", encoding="utf-8") as f:
        f.write(html)
    

    Using io gives you backward compatibility with Python 2.

    If you only need to support Python 3 you can use the builtin open function instead:

    with open(fname, "w", encoding="utf-8") as f:
        f.write(html)
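
The reason the original open(fname, "w") call can fail is that, without an encoding argument, Python picks a platform-dependent default. The hypothetical helper below (not part of the answer) reports which encoding a bare open() would use on the current machine.

```python
import os
import tempfile


def default_open_encoding() -> str:
    """Return the encoding open() selects when none is passed."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    os.close(fd)
    try:
        # No encoding argument: the file object reports the default it chose.
        with open(path, "w") as f:
            return f.encoding
    finally:
        os.remove(path)


print(default_open_encoding())
```

On many Windows setups this prints cp1252, which is why passing encoding="utf-8" explicitly avoids the error.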
    
  • 2020-11-22 12:43

    I hit the same error on Python 3.7 on Windows 10 while saving the response of a GET request. The response from the URL was encoded as UTF-8. It is worth checking the response encoding and passing a matching encoding when you write the file; this kind of trivial issue can waste a lot of time in production.

    import requests
    resp = requests.get('https://en.wikipedia.org/wiki/NIFTY_50')
    print(resp.encoding)
    with open('NiftyList.txt', 'w') as f:
        f.write(resp.text)
    

    When I added encoding="utf-8" to the open() call, the file was saved with the correct content:

    with open('NiftyList.txt', 'w', encoding="utf-8") as f:
        f.write(resp.text)
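
The check-then-write idea can be sketched without a network call. Here save_response_text is a hypothetical helper, and the raw bytes and declared encoding stand in for what resp.content and resp.encoding would hold; falling back to UTF-8 when no encoding is declared is an assumption of this sketch.

```python
import os
import tempfile
from typing import Optional


def save_response_text(raw: bytes, declared_encoding: Optional[str], path: str) -> str:
    """Decode raw bytes with the declared encoding, then save them as UTF-8."""
    # Fall back to UTF-8 if the server declared nothing (an assumption).
    text = raw.decode(declared_encoding or "utf-8", errors="replace")
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return text


# Hypothetical payload standing in for a downloaded page body.
raw_body = "NIFTY 50 ₹ ✓".encode("utf-8")

with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "NiftyList.txt")
    decoded = save_response_text(raw_body, "utf-8", out_path)
    with open(out_path, encoding="utf-8") as f:
        saved = f.read()
```

Writing as UTF-8 regardless of the source encoding keeps the output file readable on any platform.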
    
  • 2020-11-22 12:47

    I fixed it by adding .encode("utf-8") to soup.

    That means that print(soup) becomes print(soup.encode("utf-8")). Note that in Python 3 this prints a bytes literal (b'...') rather than the decoded text.

  • 2020-11-22 12:48

    For those still getting this error, adding .encode("utf-8") to soup will also fix it.

    soup = BeautifulSoup(html_doc, 'html.parser').encode("utf-8")
    print(soup)
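
Because printing encoded bytes shows a b'...' literal in Python 3, a different technique is sometimes preferred: replace only the characters the console cannot show and print the result as text. The sketch below is not from the answers; console_safe is a hypothetical helper, and a plain string stands in for str(soup).

```python
import sys


def console_safe(text: str, encoding: str) -> str:
    """Replace characters the target encoding cannot represent with '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)


# sys.stdout.encoding can be None when stdout is replaced; UTF-8 is an assumed fallback.
console_encoding = sys.stdout.encoding or "utf-8"
print(console_safe("✓ done", console_encoding))
```

This loses the unsupported characters, so it suits quick debugging output rather than saved data.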
    