Creating Unicode character from its number

挽巷 2020-11-28 21:38

I want to display a Unicode character in Java. If I do this, it works just fine:

String symbol = "\u2202";

symbol is equal to "∂". That'

13 answers
  • 2020-11-28 22:05

    The code below writes the 4 Unicode characters (given as decimal numbers) for the word "be" in Japanese. Yes, the verb "be" in Japanese has 4 chars! The character values are in decimal and have been read into a String[], using split for instance. If you have octal or hex values, Integer.parseInt takes a radix as well.

    // Steps:
    // 1. init the String[] containing the 4 code points in decimal :: intsInStrs
    // 2. allocate room for up to two chars per code point :: c2s
    // 3. use Integer.parseInt (with or without a radix) to get the int value
    // 4. write it at the next free position in the char array
    // 5. convert the filled part of c2s[] to a String
    // 6. print
    
    String[] intsInStrs = {"12354", "12426", "12414", "12377"}; // 1.
    char[] c2s = new char[intsInStrs.length * 2];  // 2. at most two chars per code point
    
    int len = 0; // number of chars written so far
    for (String intString : intsInStrs) {
        // 3 + 4. toChars returns how many chars it wrote:
        // 1 for a BMP code point, 2 for a supplementary one
        len += Character.toChars(Integer.parseInt(intString), c2s, len);
    }
    
    String symbols = new String(c2s, 0, len);  // 5. only the filled part
    System.out.println("\nLooooonger code point: " + symbols); // 6.
    // I tested it in Eclipse and Java 7 and it works.  Enjoy
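
    The radix remark above can be sketched as follows. This is a minimal illustration rather than part of the original answer: the class name RadixCodePoints and the helper fromNumbers are hypothetical, and it uses StringBuilder.appendCodePoint in place of a char array so that no buffer has to be sized by hand.

```java
final class RadixCodePoints {

    // Hypothetical helper: parses each number in the given radix and
    // appends the matching code point to the result.
    static String fromNumbers(String[] numbers, int radix) {
        StringBuilder sb = new StringBuilder();
        for (String number : numbers) {
            // parseInt throws NumberFormatException on malformed input;
            // appendCodePoint throws IllegalArgumentException on an invalid code point.
            sb.appendCodePoint(Integer.parseInt(number, radix));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The same four code points as in the answer, written in hexadecimal.
        String[] hex = {"3042", "308A", "307E", "3059"};
        System.out.println(fromNumbers(hex, 16));
    }
}
```

    Passing 10 as the radix reproduces the decimal case, so one helper covers decimal, octal and hex input.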
    
  • 2020-11-28 22:06

    The other answers here either support Unicode only up to U+FFFF (those dealing with a single char) or don't show how to get to the actual symbol (those stopping at Character.toChars() or using an incorrect method after that), so I'm adding my answer here, too.

    To support supplementary code points also, this is what needs to be done:

    // this character:
    // http://www.isthisthingon.org/unicode/index.php?page=1F&subpage=4&glyph=1F495
    // using code points here, not U+n notation
    // for equivalence with U+n, below would be 0xnnnn
    int codePoint = 128149;
    // converting to char[] pair
    char[] charPair = Character.toChars(codePoint);
    // and to String, containing the character we want
    String symbol = new String(charPair);
    
    // we now have symbol with the desired character as the first item
    // confirm that we indeed have character with code point 128149
    System.out.println("First code point: " + symbol.codePointAt(0));
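
    One consequence worth keeping in mind: a supplementary code point occupies two char values, so String.length() and the number of characters a reader sees can differ. Here is a minimal sketch, assuming the same code point 128149; the class name SupplementaryDemo and the helper fromCodePoint are made up for illustration.

```java
final class SupplementaryDemo {

    // Hypothetical helper wrapping the conversion used in this answer.
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String symbol = fromCodePoint(128149);

        // length() counts UTF-16 code units, so a surrogate pair counts as 2.
        System.out.println("length: " + symbol.length());
        // codePointCount counts whole code points.
        System.out.println("code points: " + symbol.codePointCount(0, symbol.length()));
        // Character.charCount tells how many chars a code point needs.
        System.out.println("chars needed: " + Character.charCount(128149));
    }
}
```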
    

    I also did a quick test of which conversion methods work and which don't:

    int codePoint = 128149;
    char[] charPair = Character.toChars(codePoint);
    
    String str = new String(charPair, 0, 2);
    System.out.println("First code point: " + str.codePointAt(0));    // 128149, worked
    String str2 = charPair.toString();
    System.out.println("Second code point: " + str2.codePointAt(0));  // 91, didn't work
    String str3 = new String(charPair);
    System.out.println("Third code point: " + str3.codePointAt(0));   // 128149, worked
    String str4 = String.valueOf(codePoint);
    System.out.println("Fourth code point: " + str4.codePointAt(0));  // 49, didn't work
    String str5 = new String(new int[] {codePoint}, 0, 1);
    System.out.println("Fifth code point: " + str5.codePointAt(0));   // 128149, worked
    
  • 2020-11-28 22:06

    Although this is an old question, there is a very easy way to do this in Java 11, which was released today: you can use a new overload of Character.toString():

    public static String toString(int codePoint)
    
    Returns a String object representing the specified character (Unicode code point). The result is a string of length 1 or 2, consisting solely of the specified codePoint.
    
    Parameters:
    codePoint - the codePoint to be converted
    
    Returns:
    the string representation of the specified codePoint
    
    Throws:
    IllegalArgumentException - if the specified codePoint is not a valid Unicode code point.
    
    Since:
    11
    

    Since this method supports any Unicode code point, the length of the returned String is not necessarily 1.

    The code needed for the example given in the question is simply:

        int codePoint = '\u2202';
        String s = Character.toString(codePoint); // <<< Requires JDK 11 !!!
        System.out.println(s); // Prints ∂
    

    This approach offers several advantages:

    • It works for any Unicode code point rather than just those that can be handled using a char.
    • It's concise, and it's easy to understand what the code is doing.
    • It returns the value as a string rather than a char[], which is often what you want. The answer posted by McDowell is appropriate if you want the code point returned as char[].
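
    For projects that cannot move to JDK 11 yet, a comparable helper can be sketched with Character.toChars, which has been available since Java 5 and also throws IllegalArgumentException for an invalid code point. The class name CodePointStrings and the method codePointToString are hypothetical names, not part of the JDK.

```java
final class CodePointStrings {

    // Hypothetical stand-in for Character.toString(int) on older JDKs.
    static String codePointToString(int codePoint) {
        // toChars rejects values outside the range 0 to 0x10FFFF
        // with an IllegalArgumentException.
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        // The symbol from the question.
        System.out.println(codePointToString(0x2202));
        try {
            // One past the last valid code point.
            codePointToString(0x110000);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected invalid code point");
        }
    }
}
```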
  • 2020-11-28 22:09

    Just cast your int to a char (c below is the int holding the character's number). You can convert that to a String using Character.toString():

    String s = Character.toString((char)c);
    

    EDIT:

    Just remember that the escape sequences in Java source code (the \u bits) are in HEX, so if you're trying to reproduce an escape sequence, you'll need something like int c = 0x2202.
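
    A short sketch of both points: the hex literal 0x2202 and the decimal 8706 are the same number, and the cast is only safe for code points up to U+FFFF. The class name CastDemo and the helper castToString are illustrative names; the supplementary value 0x1F495 is an assumption chosen to show the truncation.

```java
final class CastDemo {

    // Mirrors the cast-based conversion; correct only for BMP code points.
    static String castToString(int c) {
        return Character.toString((char) c);
    }

    public static void main(String[] args) {
        // Hex and decimal spellings of the same value.
        System.out.println(0x2202 == 8706);
        System.out.println(castToString(0x2202));

        // A value above 0xFFFF loses its upper bits in the cast,
        // so the result is a different character.
        int supplementary = 0x1F495;
        System.out.println(Integer.toHexString((char) supplementary));
        System.out.println(Character.isBmpCodePoint(supplementary));
    }
}
```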

  • 2020-11-28 22:11

    Here is a block to print out the Unicode chars from \u00c0 to \u00ff:

    char[] ca = {'\u00c0'};
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 16; j++) {
            String sc = new String(ca);
            System.out.print(sc + " ");
            ca[0]++;
        }
        System.out.println();
    }
    
  • 2020-11-28 22:16

    If you want to get a UTF-16 encoded code unit as a char, you can parse the integer and cast it to char, as others have suggested.

    If you want to support all code points, use Character.toChars(int). This will handle cases where code points cannot fit in a single char value.

    Doc says:

    Converts the specified character (Unicode code point) to its UTF-16 representation stored in a char array. If the specified code point is a BMP (Basic Multilingual Plane or Plane 0) value, the resulting char array has the same value as codePoint. If the specified code point is a supplementary code point, the resulting char array has the corresponding surrogate pair.
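
    The two cases in the quoted documentation can be observed directly. The following is a minimal sketch; the class name ToCharsDemo and the helper describe are hypothetical, and U+1F495 is used only as an example of a supplementary code point.

```java
final class ToCharsDemo {

    // Hypothetical helper: formats the UTF-16 code units of a code point as hex.
    static String describe(int codePoint) {
        StringBuilder sb = new StringBuilder();
        for (char unit : Character.toChars(codePoint)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04X", (int) unit));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // BMP code point: one char with the same value as the code point.
        System.out.println(describe(0x2202));
        // Supplementary code point: a high surrogate followed by a low surrogate.
        System.out.println(describe(0x1F495));
    }
}
```

    Under these assumptions the first call yields a single code unit and the second yields two, which is why code that stores the result in one char cannot represent supplementary characters.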
