Identify if a Unicode code point represents a character from a certain script such as the Latin script?

Question

Unicode categorizes characters as belonging to a script, such as the Latin script.

How do I test whether a particular character (code point) is in a particular script?

Answer 1:

Java represents the various Unicode scripts in the Character.UnicodeScript enum, including for example Character.UnicodeScript.LATIN. These match the Unicode Script Properties.

You can test a character by passing its code point (an int) to the static of method on that enum, which returns the script constant for that code point.

int codePoint = "a".codePointAt( 0 ) ; 
Character.UnicodeScript script = Character.UnicodeScript.of( codePoint ) ;
if( Character.UnicodeScript.LATIN.equals( script ) ) { … }

Alternatively:

boolean isLatinScript = 
        Character.UnicodeScript.LATIN
        .equals( 
            Character.UnicodeScript.of( codePoint ) 
        )
;
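The same test can be wrapped in a small reusable helper. The sketch below is a minimal, self-contained program; the names ScriptCheck and isInScript are hypothetical, chosen only for illustration. It compares enum constants with ==, which is safe because enum constants are singletons, and it also uses Character.UnicodeScript.forName, which looks up a script by its Unicode name or four-letter alias (such as "Latn"), ignoring case.

```java
// Minimal sketch: a reusable wrapper around Character.UnicodeScript.of.
// ScriptCheck and isInScript are hypothetical names used for illustration.
public class ScriptCheck {

    // Returns true if the code point is assigned to the given Unicode script.
    // Enum constants are singletons, so == is a safe comparison.
    public static boolean isInScript(int codePoint, Character.UnicodeScript script) {
        return Character.UnicodeScript.of(codePoint) == script;
    }

    public static void main(String[] args) {
        // forName accepts a Unicode script name or alias, case-insensitively.
        Character.UnicodeScript latin = Character.UnicodeScript.forName("Latn");

        // Escapes keep the example independent of the source file's encoding.
        System.out.println(isInScript("a".codePointAt(0), latin));       // true
        System.out.println(isInScript("\u00E9".codePointAt(0), latin));  // true: LATIN SMALL LETTER E WITH ACUTE
        System.out.println(isInScript("\u0436".codePointAt(0), latin));  // false: CYRILLIC SMALL LETTER ZHE
    }
}
```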

Example usage.

System.out.println(
        Character.UnicodeScript.LATIN      // Constant defined on the enum.
        .equals(                           // `java.lang.Enum.equals()` comparing two constants defined on the enum.
            Character.UnicodeScript.of(    // Determine which Unicode script for this character.
                "😷".codePointAt( 0 )      // Get the code point integer number of the first (and only) character in this string.
            )                              // Returns a `Character.UnicodeScript` enum object. 
        )                                  // Returns `boolean`. 
);

See this code run at IdeOne.com.

false
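The result is false because the emoji is not assigned to any specific script: Unicode puts it, along with digits, spaces, and most punctuation, in the COMMON script, while combining marks such as U+0301 are INHERITED. The sketch below prints the script of a few code points. Its isLatinText helper (a hypothetical name) applies one possible policy, an assumption rather than anything from the answer above: a string passes if every code point is LATIN, COMMON, or INHERITED. Under that policy a string of only digits, such as "123", also passes.

```java
// Sketch: many shared characters belong to COMMON or INHERITED
// rather than to a specific script such as LATIN.
// ScriptSurvey and isLatinText are hypothetical names used for illustration.
public class ScriptSurvey {

    // Hypothetical policy: treat COMMON (digits, punctuation, emoji, ...) and
    // INHERITED (combining marks) as neutral; anything else must be LATIN.
    public static boolean isLatinText(String text) {
        return text.codePoints().allMatch(cp -> {
            Character.UnicodeScript script = Character.UnicodeScript.of(cp);
            return script == Character.UnicodeScript.LATIN
                || script == Character.UnicodeScript.COMMON
                || script == Character.UnicodeScript.INHERITED;
        });
    }

    public static void main(String[] args) {
        // LATIN SMALL LETTER A, DIGIT ONE, SPACE, COMBINING ACUTE ACCENT, FACE WITH MEDICAL MASK
        int[] codePoints = { 'a', '1', ' ', 0x0301, 0x1F637 };
        for (int cp : codePoints) {
            System.out.printf("U+%04X %s%n", cp, Character.UnicodeScript.of(cp));
        }
        System.out.println(isLatinText("Caf\u00E9 1, 2!"));  // true
        System.out.println(isLatinText("\u0436"));           // false: CYRILLIC
    }
}
```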

FYI, the Character class also lets you ask whether a code point is a digit, a letter, a letter or digit, a lowercase letter, and more, through methods such as isDigit, isLetter, isLetterOrDigit, and isLowerCase.
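These predicates have int overloads that accept a full code point, so they also work for characters outside the Basic Multilingual Plane, which a single Java char cannot hold. A minimal sketch follows; CodePointTraits and describe are hypothetical names used for illustration, and the three sample code points were chosen arbitrarily.

```java
// Sketch: the int (code point) overloads of Character's classification methods.
// CodePointTraits and describe are hypothetical names used for illustration.
public class CodePointTraits {

    // Summarizes a few Character predicates for one code point.
    public static String describe(int cp) {
        return String.format("U+%04X digit=%b letter=%b lower=%b",
                cp,
                Character.isDigit(cp),
                Character.isLetter(cp),
                Character.isLowerCase(cp));
    }

    public static void main(String[] args) {
        // 'a', ARABIC-INDIC DIGIT THREE (U+0663), and
        // MATHEMATICAL BOLD CAPITAL A (U+1D400, a surrogate pair in UTF-16).
        String text = "a\u0663" + new String(Character.toChars(0x1D400));

        // codePoints() yields whole code points, not UTF-16 code units.
        text.codePoints().forEach(cp -> System.out.println(describe(cp)));
    }
}
```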

Source: https://stackoverflow.com/questions/62109781/identify-if-a-unicode-code-point-represents-a-character-from-a-certain-script-su

Tags

java

unicode

character

codepoint
