How can I check whether a byte array contains a Unicode string in Java?

再見小時候 2021-02-19 04:14

Given a byte array that is either a UTF-8 encoded string or arbitrary binary data, what approaches can be used in Java to determine which it is?

The arr

7 answers
  •  遥遥无期
    2021-02-19 04:39

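    If the byte array begins with a byte order mark (BOM), it is easy to tell which encoding was used. Be aware that the standard Java classes for processing text streams do not handle every case automatically: decoding with the "UTF-16" charset consumes the BOM to pick the byte order, but decoding with "UTF-8" leaves the BOM in the output as U+FEFF, so you may need to check for it and strip it yourself.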
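
    As a rough illustration of the BOM check, here is a minimal sketch. The class name BomSniffer and the method detectBom are hypothetical names chosen for this example, and it only covers the UTF-8 and UTF-16 marks (a UTF-32LE BOM also starts with FF FE and would be reported as UTF-16LE here).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper, not part of the JDK: inspects the first bytes for a BOM.
final class BomSniffer {

    // Returns the charset implied by a leading BOM, or empty if none is found.
    // Assumption: only UTF-8 and UTF-16 BOMs are of interest; UTF-32 is ignored.
    static Optional<Charset> detectBom(byte[] data) {
        if (startsWith(data, 0xEF, 0xBB, 0xBF)) {
            return Optional.of(StandardCharsets.UTF_8);
        }
        if (startsWith(data, 0xFE, 0xFF)) {
            return Optional.of(StandardCharsets.UTF_16BE);
        }
        if (startsWith(data, 0xFF, 0xFE)) {
            return Optional.of(StandardCharsets.UTF_16LE);
        }
        return Optional.empty();
    }

    // Compares the leading bytes (as unsigned values) against the given prefix.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] sample = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};
        System.out.println(detectBom(sample).map(Charset::name).orElse("no BOM found"));
    }
}
```

    An empty result says nothing either way: most UTF-8 data in practice carries no BOM, so this check can confirm an encoding but cannot rule one out.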

    If there is no BOM in your byte data, this becomes substantially more difficult. The .NET classes can perform statistical analysis to try to work out the encoding, but I think that relies on the assumption that you already know you are dealing with text data and just don't know which encoding was used.
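
    Without a BOM, one heuristic available in plain Java (not something this answer originally proposed) is to attempt a strict UTF-8 decode and see whether it fails. The sketch below uses the JDK's CharsetDecoder with CodingErrorAction.REPORT; the class name Utf8Check and the method isValidUtf8 are hypothetical names for this example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDK: strict UTF-8 well-formedness check.
final class Utf8Check {

    // Returns true if the whole array decodes as UTF-8 without any error.
    // This shows the bytes are well-formed UTF-8, not that they are meaningful text.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // fail instead of substituting U+FFFD
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] text = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        byte[] binary = {(byte) 0xC3, (byte) 0x28}; // lead byte followed by an invalid continuation
        System.out.println(isValidUtf8(text));
        System.out.println(isValidUtf8(binary));
    }
}
```

    Treat the result as evidence rather than proof. Arbitrary binary data rarely happens to be well-formed UTF-8 once it is more than a few bytes long, but any array containing only bytes below 0x80 passes, since ASCII is a subset of UTF-8.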

    If you have any control over the format of your input data, your best choice is to ensure that it contains a byte order mark.
