Question
I have seen Questions and Answers about obtaining the code point number of a Unicode character in Java. For example, the Question "How can I get a Unicode character's code?".
But I want the opposite: given an integer number, how do I get text of that character assigned to that code point number?
The char primitive data type is of no use, being limited to only the Basic Multilingual Plane of the Unicode character set. That plane represents approximately the first 64,000 characters defined in Unicode. But Unicode has grown to nearly double that, over 113,000 characters defined now. The numbers assigned to characters range over a million. Being based on 16 bits, a char is limited to a range of 64K, not nearly enough.
Both Character and String classes offer the method codePointAt to examine a character and return an int representing the code point assigned in Unicode. I am looking for the opposite.
➥ Given an int, how to get an object of Character, String, or some implementation of CharSequence that I can then join to other text?
When writing string literals, we can use a Unicode escape sequence with the backslash-with-u. But I am interested in working with integer variables, soft-coding rather than hardcoding the Unicode characters.
Answer 1:
tl;dr
String s = Character.toString( 128_567 ) ;
😷
Details
You asked for an object of Character, String, or some implementation of CharSequence.
Character
The Character class is effectively legacy, a mere object wrapper around the primitive char type. The char type is legacy too, being defined internally as a 16-bit number limited to the first 64K of Unicode code points. Unicode now has more than twice that number of code points assigned to characters, so char fails to represent most characters.
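To make that limitation concrete, here is a minimal sketch. The class name CharLimitDemo and the helper fitsInChar are made up for illustration; the Character methods it calls are standard JDK methods. It shows that a code point beyond the Basic Multilingual Plane does not fit in one char and is stored instead as a pair of char values.

```java
final class CharLimitDemo {

    // Hypothetical helper: true if the code point fits in a single 16-bit char.
    static boolean fitsInChar(int codePoint) {
        return Character.isBmpCodePoint(codePoint);
    }

    public static void main(String[] args) {
        int codePoint = 0x1F637; // FACE WITH MEDICAL MASK, decimal 128,567.

        System.out.println("fits in char : " + fitsInChar(codePoint));
        System.out.println("chars needed : " + Character.charCount(codePoint));

        // A narrowing cast silently keeps only the low 16 bits, yielding a different character.
        char truncated = (char) codePoint;
        System.out.println("after cast   : U+" + Integer.toHexString(truncated).toUpperCase());

        // Character.toChars returns the UTF-16 encoding: two surrogates for this code point.
        char[] units = Character.toChars(codePoint);
        System.out.println("UTF-16 units : " + units.length);
    }
}
```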
So we cannot instantiate a Character object for a character outside the Basic Multilingual Plane. As a workaround, Character.toString( int ) produces a String containing a single character. String can handle any and all Unicode characters, while Character cannot.
String 🡄 Character.toString( int )
To get a String object containing a single character determined by an int, pass the int to Character.toString().
As an example, we use FACE WITH MEDICAL MASK, an emoji character at U+1F637 (decimal: 128,567).
// -----| input |----------------
String input = "😷" ; // FACE WITH MEDICAL MASK at code point U+1F637 (decimal: 128,567).
int codePoint = input.codePointAt( 0 ) ; // Returns 128,567.
System.out.println( "codePoint : " + codePoint ) ;
codePoint : 128567
Convert that int primitive variable to a String.
// -----| String |----------------
String output = Character.toString( codePoint ) ; // Pass an `int` primitive integer number.
System.out.println( "output : " + output ) ;
output : 😷
Or use a literal integer number.
String output2 = Character.toString( 128_567 ) ; // Pass an integer literal.
System.out.println( "output2 : " + output2 ) ;
output2 : 😷
See this code run live at IdeOne.com.
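Note that the Character.toString( int ) overload was added in Java 11. For earlier versions, the following sketch shows two alternatives that rely only on older JDK methods. The class name CodePointToString and the helper fromCodePoint are hypothetical names chosen for this example.

```java
final class CodePointToString {

    // Hypothetical helper, not part of the JDK: works on Java versions before 11.
    static String fromCodePoint(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("Not a valid Unicode code point: " + codePoint);
        }
        // Character.toChars yields one char for BMP code points, two (a surrogate pair) otherwise.
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String viaToChars = fromCodePoint(128_567);
        // String also has a constructor taking an array of code points, an offset, and a count.
        String viaIntArray = new String(new int[] { 128_567 }, 0, 1);

        System.out.println("viaToChars  : " + viaToChars);
        System.out.println("viaIntArray : " + viaIntArray);
        System.out.println("equal       : " + viaToChars.equals(viaIntArray));
    }
}
```

Both routes produce the same String. Which one reads better is largely a matter of taste.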
CharSequence
The code above works, as String is an implementation of CharSequence.
CharSequence cs = Character.toString( 128_567 ) ; // Returns a `String` which is a `CharSequence`.
The StringBuilder and StringBuffer classes, both of which implement CharSequence, let you add a character by its code point number with their appendCodePoint( int ) method.
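A minimal sketch of appendCodePoint( int ), a method that both StringBuilder and StringBuffer provide for adding a character by its code point number. The class name AppendCodePointDemo and the helper join are hypothetical, chosen only for this example.

```java
final class AppendCodePointDemo {

    // Hypothetical helper: concatenates the characters for the given code points.
    static CharSequence join(int... codePoints) {
        StringBuilder sb = new StringBuilder();
        for (int codePoint : codePoints) {
            // Throws IllegalArgumentException if the number is not a valid code point.
            sb.appendCodePoint(codePoint);
        }
        return sb;
    }

    public static void main(String[] args) {
        CharSequence cs = join('H', 'i', ' ', 128_567);
        System.out.println("text        : " + cs);
        System.out.println("char units  : " + cs.length());
        System.out.println("code points : " + cs.toString().codePointCount(0, cs.length()));
    }
}
```

The count of char units and the count of code points differ whenever the text contains characters outside the Basic Multilingual Plane, so length() should not be read as a character count.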
Source: https://stackoverflow.com/questions/60347814/given-the-number-of-a-unicode-code-point-how-can-i-obtain-a-string-or-charseque