UnicodeDecodeError when reading CSV file in Pandas with Python

野趣味 2020-11-22 04:27

I'm running a program which is processing 30,000 similar files. A random number of them are stopping and producing this error...

File "C:\\Importer\\src         


        
21 Answers
  • 2020-11-22 04:45
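    with open('filename.csv') as f:
        print(f)  # the printed repr includes an encoding='...' field
    

    Running this prints the encoding that Python used to open 'filename.csv'. Note that this is the default chosen by open() for your platform, not an encoding detected from the file's contents, so treat it as a first guess. Then pass that value to read_csv:

    import pandas as pd
    data = pd.read_csv('filename.csv', encoding="encoding as you found earlier")
    

    There you go.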
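
    Since open() only reports the default encoding it picked, one possible follow-up is to test a few candidate encodings against the file's raw bytes. The sketch below is an illustration, not a reliable detector: the function name guess_encoding and the candidate list are assumptions, and a successful decode does not prove the guess is right (single-byte encodings such as cp1252 accept almost any input, which is why they go last).

```python
from pathlib import Path

# Hypothetical candidate list. Order matters: strict encodings first,
# permissive single-byte encodings last.
CANDIDATES = ("utf-8-sig", "gbk", "cp1252")


def guess_encoding(path, candidates=CANDIDATES):
    """Return the first candidate that decodes the whole file, else None."""
    raw = Path(path).read_bytes()
    for name in candidates:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue  # this candidate cannot represent the bytes; try the next
        return name
    return None
```

    The returned name can then be passed as the encoding argument of pd.read_csv. A None result means none of the candidates fit and the list needs extending.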

  • 2020-11-22 04:45

    Sometimes the problem is with the .csv file itself. The file may be corrupted. When faced with this issue, 'Save As' the file as CSV again.

    0. Open the xls/csv file
    1. Go to -> File
    2. Click -> Save As
    3. Write the file name
    4. Choose 'file type' as -> CSV [very important]
    5. Click -> OK
    
  • 2020-11-22 04:46

    I had trouble opening a CSV file in Simplified Chinese downloaded from an online bank. I tried latin1, iso-8859-1, and cp1252, all to no avail.

    But pd.read_csv("", encoding='gbk') simply does the job.
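
    A small illustration of why the choice of encoding matters here, using only the standard library. The sample string is made up for the demo; the point is that latin1 maps every byte to some character, so it never raises but yields unreadable text for GBK-encoded data, while utf-8 rejects the same bytes outright.

```python
text = "中文"              # sample Simplified Chinese text (illustrative only)
raw = text.encode("gbk")   # the bytes as a GBK-encoded file would store them

# latin1 cannot fail, but each byte becomes its own (wrong) character.
as_latin1 = raw.decode("latin1")

# utf-8 is strict, so the same bytes raise UnicodeDecodeError.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Only the encoding the file was written in round-trips correctly.
round_trip_ok = raw.decode("gbk") == text

print(as_latin1 == text, utf8_ok, round_trip_ok)
```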

  • 2020-11-22 04:48

    I am posting an update to this old thread. I found one solution that worked, but it requires opening each file. I opened my csv file in LibreOffice and chose Save As > Edit filter settings. In the drop-down menu I chose UTF-8 encoding. Then I added encoding="utf-8-sig" to the read_csv call: data = pd.read_csv(r'C:\fullpathtofile\filename.csv', sep=',', encoding="utf-8-sig").

    Hope this helps someone.
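
    If the original encoding of the files is already known, the re-save step could in principle be scripted instead of done by hand in an office suite. The following is a minimal sketch under that assumption; convert_to_utf8 and its parameters are hypothetical names, and it does not detect the source encoding for you.

```python
def convert_to_utf8(src, dst, source_encoding):
    """Rewrite the text file at src as UTF-8 (no BOM) at dst.

    source_encoding must be the encoding src was actually written in.
    """
    # newline="" disables newline translation, so the CSV's original
    # line endings are copied through unchanged.
    with open(src, "r", encoding=source_encoding, newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        for line in fin:
            fout.write(line)
```

    The converted file can then be read with encoding="utf-8". Writing to a separate dst keeps the original file intact in case the assumed source encoding turns out to be wrong.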

  • 2020-11-22 04:52

    Simplest of all Solutions:

    import pandas as pd
    df = pd.read_csv('file_name.csv', engine='python')
    

    Alternate Solution:

    • Open the csv file in Sublime text editor or VS Code.
    • Save the file in utf-8 format.

    In Sublime Text, click File -> Save with Encoding -> UTF-8

    Then, you can read your file as usual:

    import pandas as pd
    data = pd.read_csv('file_name.csv', encoding='utf-8')
    

    Other encodings you can try:

    encoding = "cp1252"
    encoding = "ISO-8859-1"
    
  • 2020-11-22 04:52

    Try changing the encoding. In my case, encoding = "utf-16" worked.

    df = pd.read_csv("file.csv", encoding='utf-16')
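
    Files saved as UTF-16 usually begin with a byte order mark (BOM), so one way to decide when to try utf-16 is to look at the first few bytes. This is a sketch using the BOM constants from the standard codecs module; sniff_bom is a hypothetical helper, and a file without a BOM returns None and still needs another approach.

```python
import codecs

# UTF-32 marks are checked before UTF-16 because BOM_UTF32_LE begins
# with the same two bytes as BOM_UTF16_LE.
BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def sniff_bom(path):
    """Return an encoding name suggested by the file's BOM, or None."""
    with open(path, "rb") as f:
        head = f.read(4)  # the longest BOM is 4 bytes
    for bom, name in BOMS:
        if head.startswith(bom):
            return name
    return None
```

    The returned name can be passed directly as the encoding argument of pd.read_csv.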
