'ascii' codec can't encode character : ordinal not in range (128)

你的背包 2021-01-14 11:10

I'm scraping some webpages using selenium and beautifulsoup. I'm iterating through a bunch of links, grabbing info, and then dumping it into a JSON file:

for e         


        
2 answers
  • 2021-01-14 11:28

    You might need to set PYTHONIOENCODING before running your Python script in the shell. For example, I got the same error while redirecting the script's output into a log file:

    $ your_python_script > output.log
    'ascii' codec can't encode characters in position xxxxx-xxxxx: ordinal not in range(128)
    

    After setting PYTHONIOENCODING to utf8 in the shell, the script ran without the ASCII codec error:

    $ export PYTHONIOENCODING=utf8
    
    $ your_python_script > output.log
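
The effect of PYTHONIOENCODING on redirected output can be checked from Python itself. The following is a minimal sketch, assuming Python 3 and using a hypothetical helper named run_with_encoding; it launches a child interpreter whose stdout is a pipe (similar to redirecting into a log file) and compares the two settings:

```python
import os
import subprocess
import sys


def run_with_encoding(code, encoding):
    """Run `code` in a child interpreter with PYTHONIOENCODING set (hypothetical helper)."""
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    return subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        stdout=subprocess.PIPE,  # stdout is a pipe, not a terminal
        stderr=subprocess.PIPE,
    )


# The child prints a string containing one non-ASCII character.
child_code = "print(u'na\\xefve')"

utf8_result = run_with_encoding(child_code, "utf8")
ascii_result = run_with_encoding(child_code, "ascii")

print(utf8_result.returncode, utf8_result.stdout)
print(ascii_result.returncode)
```

With utf8 the child writes the UTF-8 bytes of the text; with ascii the child exits with a UnicodeEncodeError on stderr, which mirrors the error in the question.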
    
  • 2021-01-14 11:36

    Your problem is that, in Python 2, a file object (as returned by open()) can only write str objects, not unicode objects. Passing ensure_ascii=False to json.dump() makes it attempt to write Unicode strings to the file directly as unicode objects, which will fail.

    json.dump(item, writeJSON, ensure_ascii=False).encode('utf-8')
    

    This attempted fix doesn't work because json.dump() doesn't return anything; instead, it writes content directly to the file. (If there weren't any Unicode text in item, this would crash after json.dump() completed -- json.dump() returns None, which can't have .encode() called on it.)
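
The point about the return value is easy to confirm. This is a small sketch, assuming Python 3 and an in-memory buffer in place of the real file; the item dictionary is a made-up sample, not the asker's data:

```python
import io
import json

item = {"name": "example"}  # hypothetical sample record

buffer = io.StringIO()
# json.dump writes into the file-like object and returns None.
result = json.dump(item, buffer, ensure_ascii=False)

try:
    result.encode("utf-8")  # same mistake as chaining .encode() onto json.dump()
    error_name = None
except AttributeError as exc:
    error_name = type(exc).__name__

print(result, buffer.getvalue(), error_name)
```

The JSON text ends up in the buffer, while the chained .encode() call fails because there is nothing to call it on.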

    There are three ways to fix this:

    1. Use Python 3. The unification of str and unicode in Python 3 makes your existing code work as-is; no code changes are necessary.

    2. Remove ensure_ascii=False from your call to json.dump. Non-ASCII characters will be written to the file in escaped form -- for instance, ï will be written as \u00ef. This is a perfectly valid way of representing Unicode characters, and most JSON libraries will handle it just fine.

    3. Wrap the file object in a UTF-8 StreamWriter:

      import codecs
      import json

      # Wrap the byte-oriented file so unicode text is encoded as UTF-8 on write.
      with codecs.getwriter("utf8")(open("testScrape.json", "w")) as writeJSON:
          json.dump(item, writeJSON, ensure_ascii=False)
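
For comparison, here is a rough sketch of options 2 and 3 side by side, assuming Python 3 (where io.open with an explicit encoding plays the role of the StreamWriter). The item value and the temporary directory are illustrative assumptions, not part of the original script:

```python
import io
import json
import os
import tempfile

item = {"word": u"na\u00efve"}  # hypothetical record containing a non-ASCII character

# Option 2: keep the default ensure_ascii=True, so the output is pure ASCII
# and the non-ASCII character is written as a \uXXXX escape.
escaped = json.dumps(item)

# Option 3 (Python 3 analogue): open the file with an explicit UTF-8 encoding
# and let json.dump write the character itself.
path = os.path.join(tempfile.mkdtemp(), "testScrape.json")
with io.open(path, "w", encoding="utf-8") as write_json:
    json.dump(item, write_json, ensure_ascii=False)

# Both representations decode back to the same data.
with io.open(path, "r", encoding="utf-8") as read_json:
    restored = json.load(read_json)

print(escaped)
print(restored == json.loads(escaped))
```

Either form is valid JSON; the escaped form is safer for ASCII-only pipelines, while the UTF-8 form is easier to read in a text editor.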
      