I have a problem when trying to convert bytes to String in Java, with code like:
byte[] bytes = {1, 2, -3};
byte[] transferred = new String(bytes, Charsets.
Not all sequences of bytes are valid in UTF-8.
UTF-8 is a variable-length scheme: each code point takes one or more bytes, and the form of the first byte indicates how many continuation bytes follow for the same code point.
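As a rough illustration of that rule, the sketch below inspects the leading bits of a single byte and reports how long a sequence it announces. The class and method names (Utf8LeadByte, announcedLength) are made up for this example, and the 5- and 6-byte forms are included only because they existed in the original UTF-8 design; the current standard rejects them.

```java
public class Utf8LeadByte {
    /**
     * Returns the sequence length a byte announces under the original UTF-8
     * design (1 to 6), 0 for a continuation byte (10xxxxxx), or -1 for
     * 0xFE/0xFF, which never appear in UTF-8.
     * Hypothetical helper for illustration only.
     */
    static int announcedLength(byte b) {
        int u = b & 0xFF;                  // Java bytes are signed; mask to 0..255
        if ((u & 0x80) == 0x00) return 1;  // 0xxxxxxx: stands alone
        if ((u & 0xC0) == 0x80) return 0;  // 10xxxxxx: continuation byte
        if ((u & 0xE0) == 0xC0) return 2;  // 110xxxxx
        if ((u & 0xF0) == 0xE0) return 3;  // 1110xxxx
        if ((u & 0xF8) == 0xF0) return 4;  // 11110xxx
        if ((u & 0xFC) == 0xF8) return 5;  // 111110xx: illegal in current UTF-8
        if ((u & 0xFE) == 0xFC) return 6;  // 1111110x: illegal in current UTF-8
        return -1;                         // 0xFE or 0xFF
    }

    public static void main(String[] args) {
        byte[] bytes = {1, 2, -3};
        for (byte b : bytes) {
            System.out.println(b + " announces length " + announcedLength(b));
        }
    }
}
```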
Refer to this table:
Now let's see how it applies to your {1, 2, -3}:
Bytes 1 (hex 0x01, binary 00000001) and 2 (hex 0x02, binary 00000010) stand alone, no problem.
Byte -3 (hex 0xFD, binary 11111101) is the start byte of a 6-byte sequence (which is actually illegal in the current UTF-8 standard), but your byte array does not have such a sequence.
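If you want to check the hex and binary forms of a signed Java byte yourself, something like the following should do. The class and method names (SignedBytes, toHex, toBinary) are invented for this sketch; the key step is masking with 0xFF so that -3 is treated as 253 rather than sign-extended.

```java
public class SignedBytes {
    // Hex form of the unsigned value, e.g. -3 -> "0xFD".
    static String toHex(byte b) {
        return String.format("0x%02X", b & 0xFF);
    }

    // 8-digit binary form of the unsigned value, left-padded with zeros.
    static String toBinary(byte b) {
        String bits = Integer.toBinaryString(b & 0xFF);
        return String.format("%8s", bits).replace(' ', '0');
    }

    public static void main(String[] args) {
        byte[] bytes = {1, 2, -3};
        for (byte b : bytes) {
            System.out.println(b + " = " + toHex(b) + " = " + toBinary(b));
        }
    }
}
```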
Your UTF-8 is invalid. The Java UTF-8 decoder replaces the invalid byte -3 with the Unicode code point U+FFFD REPLACEMENT CHARACTER (also see this). In UTF-8, code point U+FFFD is hex 0xEF 0xBF 0xBD (binary 11101111 10111111 10111101), represented in Java as -17, -65, -67.
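Here is a minimal sketch that reproduces the effect and shows one way to detect it up front. It assumes java.nio.charset.StandardCharsets.UTF_8 in place of the charset constant in your snippet (which is cut off), and the names Utf8RoundTrip, roundTrip and isValidUtf8 are mine. A CharsetDecoder configured with CodingErrorAction.REPORT throws on malformed input instead of silently substituting U+FFFD.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8RoundTrip {
    // Decode as UTF-8 and encode again; malformed bytes become U+FFFD on the way.
    static byte[] roundTrip(byte[] input) {
        String decoded = new String(input, StandardCharsets.UTF_8);
        return decoded.getBytes(StandardCharsets.UTF_8);
    }

    // Strict check: report malformed input instead of replacing it.
    static boolean isValidUtf8(byte[] input) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(input));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] bytes = {1, 2, -3};
        System.out.println(Arrays.toString(roundTrip(bytes))); // [1, 2, -17, -65, -67]
        System.out.println(isValidUtf8(bytes));                // false
    }
}
```

If the bytes are arbitrary binary data rather than text, a String is the wrong container for them; keep them as byte[] or use an encoding such as Base64 that round-trips every byte value.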