Processing: How to convert a char datatype into its utf-8 int representation?

爱⌒轻易说出口 submitted on 2019-12-11 13:44:06

Question


How can I convert a char datatype into its utf-8 int representation in Processing?

So if I had an array ['a', 'b', 'c'] I'd like to obtain another array [61, 62, 63].


Answer 1:


After posting my original answer I found a much easier and more direct way to get the numbers you wanted. You want 61 for 'a' rather than 97, and 61 is simply the hexadecimal representation of decimal 97. So all you need to do is pass your char to Integer.toHexString, like so:

Integer.toHexString((int)'a');

If you have an array of chars like so:

char[] c = {'a', 'b', 'c', 'd'};

Then you can use the above thusly:

Integer.toHexString((int)c[0]);

and so on and so forth.
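
To turn the whole array into hex strings in one go, a loop over Integer.toHexString is enough. The following is a minimal plain-Java sketch; the class name CharHexDemo and the helper toHexStrings are made up for illustration, and inside a Processing sketch only the helper method would be needed.

```java
import java.util.Arrays;

public class CharHexDemo {
    // Hypothetical helper: hex string of each char's numeric value.
    static String[] toHexStrings(char[] chars) {
        String[] out = new String[chars.length];
        for (int i = 0; i < chars.length; i++) {
            // The char is widened to int before being formatted as hex.
            out[i] = Integer.toHexString(chars[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        char[] c = {'a', 'b', 'c', 'd'};
        System.out.println(Arrays.toString(toHexStrings(c))); // [61, 62, 63, 64]
    }
}
```

Note that the results are Strings, not ints: "61" here is text that happens to look like a decimal number.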

EDIT

As per v.k.'s example in the comments below, you can do the following in Processing:

char c = 'a';
String hexString = hex(c);

The above will give you a hex representation of the character as a String.

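// hex() returns a String, so to store it as an int you need to parse it.
// Note that this reads the hex digits as a decimal number (61 for 'a'),
// so it only works when the hex string contains no letters A-F.
int hexNum = PApplet.parseInt(hex(c));

// OR, to get the plain numeric value of the char (97 for 'a'):

int num = int(c);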

For the benefit of the OP and the commenter below: you will get 97 for 'a' even with the suggestion from my previous answer, because 97 is the decimal representation of hexadecimal 61. Since UTF-8 matches ASCII value for value across the first 128 code points (0 to 127), I don't see why one would expect anything different anyway. As for the UnsupportedEncodingException, a simple fix would be to wrap the statements in a try/catch block. However, that is not necessary, since the above answers the question directly and much more simply.
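
To check that the UTF-8 byte of an ASCII letter really equals its decimal code, you can encode a string and print each byte in both bases. This is a plain-Java sketch that assumes Java 7 or later: it uses StandardCharsets.UTF_8, which also sidesteps UnsupportedEncodingException because no charset name has to be looked up. The class Utf8BytesDemo and the helper utf8Bytes are names chosen for illustration only.

```java
import java.nio.charset.StandardCharsets;

public class Utf8BytesDemo {
    // Hypothetical helper: UTF-8 bytes of a string as unsigned values (0..255).
    static int[] utf8Bytes(String s) {
        byte[] raw = s.getBytes(StandardCharsets.UTF_8);
        int[] out = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = raw[i] & 0xFF; // Java bytes are signed; mask to get 0..255
        }
        return out;
    }

    public static void main(String[] args) {
        for (int b : utf8Bytes("abc")) {
            // Prints "97 decimal = 61 hex", then 98/62, then 99/63.
            System.out.println(b + " decimal = " + Integer.toHexString(b) + " hex");
        }
    }
}
```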




Answer 2:


What do you mean by "utf-8 int"? UTF-8 is a multi-byte encoding scheme for characters (technically, code points) identified by Unicode numbers. Your example uses trivial letters from the ASCII set, which has very little to do with a real Unicode/UTF-8 question.

For simple letters, you can literally just int cast:

print((int)'a') -> 97
print((int)'A') -> 65

But you can't do that with characters outside the 16-bit char range. print((int)'二') works (giving 20108, or 4E8C in hex), but print((int)'𠄢') will give a compile error because the code point for 𠄢 does not fit in 16 bits: it is 131362, or 20122 in hex, which UTF-8 encodes as the four-byte sequence 240, 160, 132, 162 (F0 A0 84 A2).

So for Unicode characters with a code higher than 0xFFFF you can't use int casting, and you'll actually have to think hard about what you're decoding. If you want true Unicode code point values, you'll have to decode the bytes yourself, but the Processing IDE doesn't actually let you do that; it will tell you that "𠄢".length() is 1, when in real Java it is 2 (a surrogate pair of two UTF-16 code units). As far as I can tell, current Processing gives you no way to get the Unicode value for a character with a code higher than 0xFFFF.
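
For comparison, this is how plain Java (outside the Processing editor) reports the same character. The sketch builds the string from its code point with Character.toChars, so it does not depend on how the source file is encoded. The class CodePointDemo and its helpers are illustrative names, and whether a given Processing version behaves the same way is an assumption you would need to verify.

```java
import java.nio.charset.StandardCharsets;

public class CodePointDemo {
    // Build a String from a single Unicode code point (hypothetical helper).
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    // Code point at the start of the string, combining a surrogate pair if present.
    static int firstCodePoint(String s) {
        return s.codePointAt(0);
    }

    // Number of bytes the string occupies when encoded as UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String s = fromCodePoint(0x20122);                    // U+20122
        System.out.println(s.length());                       // 2 (UTF-16 code units)
        System.out.println(s.codePointCount(0, s.length()));  // 1 (actual character)
        System.out.println(firstCodePoint(s));                // 131362
        System.out.println(Integer.toHexString(firstCodePoint(s))); // 20122
        System.out.println(utf8Length(s));                    // 4
    }
}
```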

update

Someone mentioned you actually wanted hex strings. If so, use the built in hex function.

println(hex((int)'a')) -> 00000061

and if you only want 2, 4, or 6 characters, just use substring:

println(hex((int)'a').substring(4)) -> 0061
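
If you are working in plain Java rather than with Processing's hex(), zero-padded hex of a chosen width can be sketched with String.format. The class HexPadDemo and the helper hexPadded are hypothetical names; unlike substring, this pads short values but never truncates long ones.

```java
public class HexPadDemo {
    // Hypothetical helper: uppercase hex, left-padded with zeros to at least `digits` characters.
    static String hexPadded(int value, int digits) {
        return String.format("%0" + digits + "X", value);
    }

    public static void main(String[] args) {
        System.out.println(hexPadded('a', 2)); // 61
        System.out.println(hexPadded('a', 4)); // 0061
        System.out.println(hexPadded('a', 8)); // 00000061
    }
}
```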


Source: https://stackoverflow.com/questions/16682000/processing-how-to-convert-a-char-datatype-into-its-utf-8-int-representation
