I am fetching data from a catalog and it's returning the data as bytes.
Bytes data:
b'\x80\x00\x00\x00\n\x00\x00%\x83\xa0\x08\x01\x00\
The UTF-8 encoding has some built-in redundancy that serves at least two purposes:
Start bytes match one of these four patterns (shown in binary; the dots stand for bits carrying actual data):
0.......
110.....
1110....
11110...
whereas continuation bytes (0 to 3 of them follow each start byte) always have this form:
10......
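The patterns above can be checked with simple bit masks. The following is a minimal sketch; the function name classify_utf8_byte is made up for illustration, and it only tests the leading bits, so it is not a full UTF-8 validator (a few byte values that match a start pattern, such as 0xC0, are still not allowed in valid UTF-8).

```python
def classify_utf8_byte(byte: int) -> str:
    """Classify one byte by its leading bits only (hypothetical helper)."""
    if byte & 0b1000_0000 == 0b0000_0000:   # 0.......
        return "start (1-byte character)"
    if byte & 0b1100_0000 == 0b1000_0000:   # 10......
        return "continuation"
    if byte & 0b1110_0000 == 0b1100_0000:   # 110.....
        return "start (2-byte character)"
    if byte & 0b1111_0000 == 0b1110_0000:   # 1110....
        return "start (3-byte character)"
    if byte & 0b1111_1000 == 0b1111_0000:   # 11110...
        return "start (4-byte character)"
    return "invalid"                        # 11111... matches no pattern


for value in (0x25, 0x80, 0x83, 0xC3):
    print(f"{value:#04x} = {value:08b} -> {classify_utf8_byte(value)}")
```

Applied to the bytes in question, 0x80 and 0x83 both come out as continuation bytes, while 0x25 ('%') is a complete one-byte character.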
If the data does not follow these patterns, it is safe to say that it is not UTF-8, e.g. because it was corrupted during transfer.
Why is it possible to say that your data, which starts with b'\x80', cannot be UTF-8?
The encoding is violated already at the very first byte: 0x80 is 10000000 in binary, which matches the continuation pattern, so it cannot begin a character. This is exactly what your error message says:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
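To see where the decoder gives up, you can catch the exception and inspect its attributes. This sketch assumes the data consists only of the complete bytes visible in the question (the literal there is cut off, so the real value is longer).

```python
# Assumption: only the complete bytes visible in the question; the real data is longer.
data = b"\x80\x00\x00\x00\n\x00\x00%\x83\xa0\x08\x01\x00"

error = None
try:
    data.decode("utf-8")
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; `exc` is cleared when the except block ends

if error is not None:
    print(error)                   # the full error message
    print(error.start)             # index of the offending byte
    print(hex(data[error.start]))  # value of the offending byte
    print(error.reason)            # short description of the problem
```

The start attribute gives the position of the first byte the codec could not interpret, which is handy when the data is too long to inspect by eye.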
And even if you skip this byte, you hit another problem a few bytes later at b'%\x83': 0x83 is also a continuation byte, but it follows '%', which is already a complete one-byte character. So most likely you are either decoding the wrong data or assuming the wrong encoding.
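If you want to check how many places break the encoding, one rough approach is to decode repeatedly, skipping one byte past each failure. This is only a sketch: the helper name find_invalid_utf8_positions is invented here, skipping exactly one byte is a simplification, and the data is again just the visible prefix from the question.

```python
def find_invalid_utf8_positions(data: bytes) -> list:
    """Return the offsets at which UTF-8 decoding fails (hypothetical helper)."""
    positions = []
    offset = 0
    while offset < len(data):
        try:
            data[offset:].decode("utf-8")
            break  # the rest decodes cleanly
        except UnicodeDecodeError as exc:
            positions.append(offset + exc.start)
            offset += exc.start + 1  # skip the offending byte and retry
    return positions


# Assumption: only the complete bytes visible in the question; the real data is longer.
data = b"\x80\x00\x00\x00\n\x00\x00%\x83\xa0\x08\x01\x00"
print(find_invalid_utf8_positions(data))
```

Several failures within a handful of bytes, as here, suggest binary data or a different encoding rather than a single corrupted character.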