I need a programmatic way to get the decimal value of each character in a String, so that I can encode them as HTML entities, for example:
UTF-8:
OK, after reading Jon's post and still musing about surrogates in Java, I decided to be a bit less lazy and look it up. There is actually support for surrogates in the Character class; it's just a bit roundabout.
So here's the code that'll work correctly, assuming valid input:
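for (int i = 0; i < str.length(); i++) {
    char ch = str.charAt(i);
    if (Character.isHighSurrogate(ch)) {
        // Combine the surrogate pair into a single code point,
        // then skip the low surrogate that was just consumed.
        System.out.println("Codepoint: " +
                Character.toCodePoint(ch, str.charAt(i + 1)));
        i++;
    } else {
        // Not part of a surrogate pair: the char value is the code point.
        System.out.println("Codepoint: " + (int) ch);
    }
}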
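To tie this back to the original goal of producing HTML entities, here is a minimal sketch that uses String.codePointAt and Character.charCount instead of checking for surrogates by hand. The class name HtmlEntityEncoder and the method toHtmlEntities are hypothetical names chosen for illustration, and leaving ASCII characters untouched is an assumption about what the output should look like.

```java
public class HtmlEntityEncoder {

    // Hypothetical helper: replaces every non-ASCII code point with a
    // decimal numeric entity such as &#233; and copies ASCII through as-is.
    public static String toHtmlEntities(String str) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < str.length(); ) {
            // codePointAt combines a valid surrogate pair into one code point.
            int cp = str.codePointAt(i);
            if (cp < 128) {
                out.append((char) cp);
            } else {
                out.append("&#").append(cp).append(';');
            }
            // Advance by 1 char for BMP code points, 2 for supplementary ones.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute) and U+1F600 (an emoji outside the BMP)
        System.out.println(toHtmlEntities("caf\u00e9 \uD83D\uDE00"));
        // prints: caf&#233; &#128512;
    }
}
```

Note that this sketch does not escape &, < or >, so it is only a starting point if the input may itself contain markup.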