Python: UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

梦如初夏 2021-01-15 04:11

I am fetching data from a catalog, and it returns the data in bytes format.

Bytes data:

b'\x80\x00\x00\x00\n\x00\x00%\x83\xa0\x08\x01\x00\


        
3 Answers
  •  再見小時候
    2021-01-15 04:39

    The UTF-8 encoding has some built-in redundancy that serves at least two purposes:

    1) locating code point boundaries when reading forwards or backwards

    Start bytes match one of these 4 patterns (shown in binary; the dots are the bits that carry actual data)

    0.......
    110.....
    1110....
    11110...
    

    whereas continuation bytes (0 to 3 of them follow each start byte) always have this form

    10......
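
The bit patterns above can be checked with a few bit masks. This is a minimal sketch, not part of the original answer: the helper name `classify_utf8_byte` is hypothetical, and it looks only at the leading bits, so it does not reject the few byte values (such as 0xC0 or 0xF5) that match a start pattern but are still disallowed in UTF-8.

```python
def classify_utf8_byte(byte: int) -> str:
    """Classify a single byte value (0-255) by its leading bits only."""
    if (byte & 0b1000_0000) == 0b0000_0000:
        return "start (1-byte sequence)"   # 0.......
    if (byte & 0b1100_0000) == 0b1000_0000:
        return "continuation"              # 10......
    if (byte & 0b1110_0000) == 0b1100_0000:
        return "start (2-byte sequence)"   # 110.....
    if (byte & 0b1111_0000) == 0b1110_0000:
        return "start (3-byte sequence)"   # 1110....
    if (byte & 0b1111_1000) == 0b1111_0000:
        return "start (4-byte sequence)"   # 11110...
    return "invalid"                       # 11111... never occurs in UTF-8


# Print the bit pattern and classification of a few sample bytes.
for value in (0x25, 0x80, 0x83, 0xC3, 0xE2, 0xF0, 0xFF):
    print(f"0x{value:02X} {value:08b} -> {classify_utf8_byte(value)}")
```

Running it on 0x80 reports a continuation byte, which is why the decoder refuses to accept it as the first byte of a character.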
    

    2) checking for validity

    If a byte sequence does not follow these patterns, it is safe to say that it is not valid UTF-8 data, for example because it was corrupted during a transfer.
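
In Python, the simplest way to run this validity check is to attempt a strict decode and catch the exception. A minimal sketch; the function name `is_valid_utf8` is made up for illustration:

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8, False otherwise."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8("h\u00e9llo".encode("utf-8")))  # properly encoded text
print(is_valid_utf8(b"\x80\x00\x00\x00"))           # starts with a continuation byte
```

The first call prints True and the second prints False.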

    Conclusion

    Why is it possible to say that b'\x80' cannot be UTF-8? The encoding is violated already at the first byte: 0x80 (binary 10000000) has the form of a continuation byte, so it cannot start a character. This is exactly what your error message says:

    UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

    And even if you skip this one, you hit another problem a few bytes later at b'%\x83': '%' is a complete one-byte character, so 0x83 is again a continuation byte with no start byte before it. Most likely you are either decoding the wrong data or assuming the wrong encoding.
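
To see every offending position instead of only the first one, you can resume decoding after each failure. This is a rough sketch: `invalid_utf8_positions` is a hypothetical helper, and `sample` contains only the visible prefix of the truncated bytes shown in the question, so the real data may contain more errors.

```python
def invalid_utf8_positions(data: bytes) -> list:
    """Return the absolute offsets at which strict UTF-8 decoding fails."""
    positions = []
    offset = 0
    while offset < len(data):
        try:
            data[offset:].decode("utf-8")
            break  # the rest of the data decodes cleanly
        except UnicodeDecodeError as exc:
            # exc.start and exc.end are relative to the slice being decoded
            positions.append(offset + exc.start)
            offset += exc.end  # resume right after the bad bytes
    return positions


# Visible prefix of the bytes from the question (the original is truncated).
sample = b"\x80\x00\x00\x00\n\x00\x00%\x83\xa0\x08\x01\x00"
print(invalid_utf8_positions(sample))  # [0, 8, 9]
```

Position 0 is the 0x80 from the error message, and position 8 is the 0x83 that follows the '%'. With this many violations in so few bytes, the data is probably binary rather than text in some other encoding, but that remains a guess without knowing what the catalog actually returns.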
