UnicodeDecodeError, invalid continuation byte

忘掉有多难 2020-11-22 08:25

Why is the below item failing? Why does it succeed with "latin-1" codec?

o = "a test of \\xe9 char" #I want this to remain a string as thi         


        
10 Answers
  • 2020-11-22 09:03

    I had the same error when I tried to open a CSV file with the pandas.read_csv method.

    The solution was to change the encoding to latin-1:

    pd.read_csv('ml-100k/u.item', sep='|', names=m_cols , encoding='latin-1')
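
    As for why latin-1 succeeds: it maps each of the 256 possible byte values to a character, so decoding with it never raises, whether or not the data really is Latin-1. Below is a minimal sketch of a fallback decoder, assuming the data is either UTF-8 or Latin-1. The helper name decode_with_fallback is hypothetical and is not part of pandas or the standard library.

```python
def decode_with_fallback(raw, encodings=("utf-8", "latin-1")):
    """Return (text, encoding_used), trying each candidate encoding in order.

    Hypothetical helper for illustration. Because latin-1 accepts every byte,
    it should come last: a successful latin-1 decode does not prove the data
    was actually written as Latin-1.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# The byte string from the question: 0xE9 is followed by a space, not by
# continuation bytes, so UTF-8 is rejected and latin-1 is used instead.
text, used = decode_with_fallback(b"a test of \xe9 char")
print(used, text)
```

    The same idea applies to read_csv: if the file is not UTF-8, the encoding argument has to name the encoding the file was actually saved in.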
    
  • 2020-11-22 09:06

    In binary, 0xE9 looks like 1110 1001. If you read about UTF-8 on Wikipedia, you’ll see that such a byte must be followed by two of the form 10xx xxxx. So, for example:

    >>> b'\xe9\x80\x80'.decode('utf-8')
    u'\u9000'
    

    But that’s just the mechanical cause of the exception. In this case, you have a string that is almost certainly encoded in latin 1. You can see how UTF-8 and latin 1 look different:

    >>> u'\xe9'.encode('utf-8')
    b'\xc3\xa9'
    >>> u'\xe9'.encode('latin-1')
    b'\xe9'
    

    (Note, I'm using a mix of Python 2 and 3 representation here. The input is valid in any version of Python, but your Python interpreter is unlikely to actually show both unicode and byte strings in this way.)
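
    The bit patterns described above can be checked directly. This is a simplified sketch that only looks at the lead-byte prefix; it deliberately ignores the extra rules a real UTF-8 decoder enforces (overlong forms, surrogates, the upper code point limit). The function names are hypothetical.

```python
def expected_continuation_bytes(lead):
    """Number of 10xxxxxx bytes that must follow a UTF-8 lead byte."""
    if lead & 0b1000_0000 == 0:            # 0xxxxxxx: ASCII, stands alone
        return 0
    if lead & 0b1110_0000 == 0b1100_0000:  # 110xxxxx: starts a 2-byte sequence
        return 1
    if lead & 0b1111_0000 == 0b1110_0000:  # 1110xxxx: starts a 3-byte sequence
        return 2
    if lead & 0b1111_1000 == 0b1111_0000:  # 11110xxx: starts a 4-byte sequence
        return 3
    raise ValueError("not a UTF-8 lead byte")


def is_continuation(byte):
    """True if the byte has the form 10xxxxxx."""
    return byte & 0b1100_0000 == 0b1000_0000


data = b"a test of \xe9 char"
i = data.index(0xE9)                       # position of the offending byte
needed = expected_continuation_bytes(data[i])
following = data[i + 1 : i + 1 + needed]   # the bytes that should be 10xxxxxx

print(format(data[i], "08b"), needed)
print([format(b, "08b") for b in following])
print(all(is_continuation(b) for b in following))
```

    Here 0xE9 announces a three-byte sequence, but the next two bytes are a space and the letter c, neither of which has the 10xxxxxx form, which is what "invalid continuation byte" refers to.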

  • 2020-11-22 09:12

    This error typically appears when you read a file into pandas without specifying its encoding, for example:

    data = pd.read_csv('/kaggle/input/fertilizers-by-product-fao/FertilizersProduct.csv')
    

    The error then looks like this: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf4 in position 1: invalid continuation byte

    It can be avoided by passing an encoding argument:

    data = pd.read_csv('/kaggle/input/fertilizers-by-product-fao/FertilizersProduct.csv', encoding='ISO-8859-1')
    
  • 2020-11-22 09:16

    If this error arises when manipulating a file that was just opened, check to see if you opened it in 'rb' mode.
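
    One reading of this advice is that the open mode decides where decoding happens. The sketch below is not taken from the answer; it writes a small temporary file (made-up sample data) and compares binary mode with text mode.

```python
import os
import tempfile

# Write a small Latin-1 encoded file to a temporary location (sample data).
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write("caf\xe9\n".encode("latin-1"))
    path = tmp.name

try:
    # Binary mode: no decoding happens, so reading cannot raise UnicodeDecodeError.
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode with the wrong encoding: the decode error surfaces on read.
    try:
        with open(path, "r", encoding="utf-8") as f:
            f.read()
        utf8_failed = False
    except UnicodeDecodeError:
        utf8_failed = True

    # Text mode with the matching encoding decodes cleanly.
    with open(path, "r", encoding="latin-1") as f:
        text = f.read()
finally:
    os.remove(path)

print(raw, utf8_failed, repr(text))
```

    In binary mode you get bytes and choose the encoding yourself when calling decode; in text mode the encoding passed to open (or the platform default, if none is given) is applied as the file is read.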
