How do I get the decimal value of a Unicode character in Java?

耶瑟儿~ 2021-01-18 04:28

I need a programmatic way to get the decimal value of each character in a String, so that I can encode them as HTML entities, for example:

UTF-8:

2 Answers
  • 2021-01-18 04:49

    I suspect you're just interested in a conversion from char to int, which is implicit:

    for (int i = 0; i < text.length(); i++)
    {
        char c = text.charAt(i);
        int value = c;
        System.out.println(value);
    }
    

    EDIT: If you want to handle surrogate pairs, you can use something like:

    for (int i = 0; i < text.length(); i++)
    {
        int codePoint = text.codePointAt(i);
        // Skip over the second char in a surrogate pair
        if (codePoint > 0xffff)
        {
            i++;
        }
        System.out.println(codePoint);
    }
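
    To tie this back to the HTML entity goal in the question, here is a minimal sketch built on the same code point loop. The class and method names (HtmlEntityEncoder, toDecimalEntities) are made up for illustration, and the choice to leave ASCII untouched (including &, < and >) is an assumption about what you want; adjust it if you also need to escape markup characters.

    ```java
    class HtmlEntityEncoder {
        // Hypothetical helper: writes every non-ASCII code point as a decimal
        // numeric character reference and copies ASCII through unchanged.
        static String toDecimalEntities(String text) {
            StringBuilder out = new StringBuilder();
            for (int i = 0; i < text.length(); ) {
                int codePoint = text.codePointAt(i);
                if (codePoint < 128) {
                    out.append((char) codePoint);
                } else {
                    out.append("&#").append(codePoint).append(';');
                }
                // Advance by 1 for BMP characters, by 2 for surrogate pairs
                i += Character.charCount(codePoint);
            }
            return out.toString();
        }

        public static void main(String[] args) {
            // 'a', the euro sign (U+20AC) and an emoji (U+1F600, a surrogate pair)
            System.out.println(toDecimalEntities("a\u20AC\uD83D\uDE00"));
            // prints a&#8364;&#128512;
        }
    }
    ```

    Character.charCount does the same job as the `codePoint > 0xffff` check above, just without the magic number.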
    
  • 2021-01-18 05:00

    OK, after reading Jon's post and still musing about surrogates in Java, I decided to be a bit less lazy and look it up. There is actually support for surrogates in the Character class; it's just a bit roundabout.

    So here's the code that'll work correctly, assuming valid input:

        for (int i = 0; i < str.length(); i++) {
            char ch = str.charAt(i);
            if (Character.isHighSurrogate(ch)) {
                // Combine the high surrogate with the low surrogate that follows it
                System.out.println("Codepoint: " +
                       Character.toCodePoint(ch, str.charAt(i + 1)));
                i++; // skip the low surrogate
            } else {
                System.out.println("Codepoint: " + (int) ch);
            }
        }
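
    For anyone wondering what Character.toCodePoint actually does with the two chars, the arithmetic follows from the UTF-16 definition. Below is a small sketch that does it by hand and compares the result with the library call; SurrogateMath and combine are names I invented for the example, and it assumes the inputs really are a valid high/low pair.

    ```java
    class SurrogateMath {
        // Hypothetical helper: rebuilds a supplementary code point from a
        // surrogate pair. Each surrogate carries 10 bits of the value that
        // remains after subtracting 0x10000 from the code point.
        static int combine(char high, char low) {
            return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00);
        }

        public static void main(String[] args) {
            char high = '\uD83D';
            char low = '\uDE00';
            System.out.println(combine(high, low));               // prints 128512
            System.out.println(Character.toCodePoint(high, low)); // prints 128512
        }
    }
    ```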
    