How can I check whether a byte array contains a Unicode string in Java?

再見小時候 2021-02-19 04:14

Given a byte array that is either a UTF-8 encoded string or arbitrary binary data, what approaches can be used in Java to determine which it is?

The arr

7 Answers
  • 2021-02-19 05:00

    Regarding the original question, "How can I check whether a byte array contains a Unicode string in Java?": I found that what people call "Java Unicode" essentially means UTF-16 code units. I ran into this problem myself and wrote some code that may help anyone with the same question find answers.

    I wrote two main methods: one displays UTF-8 code units and the other displays UTF-16 code units. UTF-16 code units are what you encounter in Java and JavaScript, commonly written as escapes such as "\ud83d".
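
    To make the difference concrete, here is a minimal sketch (the class and method names are made up for illustration and are not part of the code further down) that formats the same string once as UTF-8 code units and once as UTF-16 code units:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class CodeUnitsDemo {
    // Formats each UTF-8 code unit (one byte) as two hex digits, separated by spaces.
    static String utf8CodeUnits(String text) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Formats each UTF-16 code unit (one Java char) as four hex digits, separated by spaces.
    static String utf16CodeUnits(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%04x", (int) c));
        }
        return sb.toString();
    }
}
```

    For the single code point U+1F600, written in Java source as "\ud83d\ude00", utf8CodeUnits returns "f0 9f 98 80" (four bytes) and utf16CodeUnits returns "d83d de00" (a surrogate pair of two chars).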

    For more help with code units and conversion, try this website:

    https://r12a.github.io/apps/conversion/

    Here is the code:

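        // `text` is the input being inspected (anything with a toString()).
        // UTF-8 is named explicitly because getBytes() without an argument
        // uses the platform default charset, which may not be UTF-8.
        byte[] array_bytes = text.toString().getBytes(java.nio.charset.StandardCharsets.UTF_8);
        char[] array_chars = text.toString().toCharArray();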
        System.out.println();
        byteArrayToUtf8CodeUnits(array_bytes);
        System.out.println();
        charArrayToUtf16CodeUnits(array_chars);
    
    
    // Prints every byte (UTF-8 code unit) of the array as two hex digits.
    public static void byteArrayToUtf8CodeUnits(byte[] byte_array)
    {
        System.out.println("array.length = " + byte_array.length);
        for (int k = 0; k < byte_array.length; k++)
        {
            System.out.println("array byte: [" + k + "] converted to hex = " + byteToHex(byte_array[k]));
        }
    }
    // Prints every char (UTF-16 code unit) of the array as four hex digits.
    // A Java char holds exactly one UTF-16 code unit.
    public static void charArrayToUtf16CodeUnits(char[] char_array)
    {
        System.out.println("array.length = " + char_array.length);
        for (int i = 0; i < char_array.length; i++)
        {
            System.out.println("array char: [" + i + "] converted to hex = " + charToHex(char_array[i]));
        }
    }
    // Returns the two-digit lowercase hex representation of byte b.
    public static String byteToHex(byte b)
    {
        char[] hexDigit =
                {
                        '0', '1', '2', '3', '4', '5', '6', '7',
                        '8', '9', 'a', 'b', 'c', 'd', 'e', 'f'
                };
        // The masks drop the sign-extension bits, so negative bytes format correctly.
        char[] array = { hexDigit[(b >> 4) & 0x0f], hexDigit[b & 0x0f] };
        return new String(array);
    }
    // Returns the four-digit lowercase hex representation of char c.
    public static String charToHex(char c)
    {
        byte hi = (byte) (c >>> 8);   // high-order 8 bits
        byte lo = (byte) (c & 0xff);  // low-order 8 bits

        return byteToHex(hi) + byteToHex(lo);
    }
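
    The methods above only display code units; they do not decide whether a byte array is valid UTF-8. One way to approach the question as asked is a strict decode with the JDK's CharsetDecoder. This is a sketch rather than a definitive test, and the class and method names are hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class Utf8Check {
    // Returns true if the whole array decodes as well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw on bad input instead of
        // silently substituting U+FFFD, which is the default for new String(...).
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

    Note that this only answers "is it well-formed UTF-8?". Binary data can be well-formed UTF-8 by coincidence (any array of bytes below 0x80 is), so a true result is a hint that the array may be text, not proof.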
    