How to classify Japanese characters as either kanji or kana?

既然无缘 2021-02-01 22:57

Given the text below, how can I classify each character as kana or kanji?

誰か確認上記これらのフ

To get something like this:

誰 - kanji
か - kana
確 - kanji
認 - kanji

5 answers
  •  南笙 (original poster)
    2021-02-01 23:50

    This functionality is built into the Character.UnicodeBlock class. Some examples of the Unicode blocks related to the Japanese language:

    Character.UnicodeBlock.of('誰') == CJK_UNIFIED_IDEOGRAPHS
    Character.UnicodeBlock.of('か') == HIRAGANA
    Character.UnicodeBlock.of('フ') == KATAKANA
    Character.UnicodeBlock.of('ﾌ') == HALFWIDTH_AND_FULLWIDTH_FORMS   // halfwidth katakana, U+FF8C
    Character.UnicodeBlock.of('！') == HALFWIDTH_AND_FULLWIDTH_FORMS  // fullwidth exclamation mark, U+FF01
    Character.UnicodeBlock.of('。') == CJK_SYMBOLS_AND_PUNCTUATION
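
The block lookups above can be turned into a per-character classifier for the text in the question. The following is only a minimal sketch: the class name `KanaKanjiClassifier` and the method `classify` are made up for illustration, and it assumes that "kanji" means the main CJK Unified Ideographs block only. The extension blocks, compatibility ideographs, and characters such as the iteration mark 々 (which sits in CJK Symbols and Punctuation) would come out as "other" unless you add them.

```java
import java.lang.Character.UnicodeBlock;

class KanaKanjiClassifier {

    // Hypothetical helper: maps one code point to "kanji", "kana", or "other".
    // Assumption: only the main CJK Unified Ideographs block counts as kanji.
    static String classify(int codePoint) {
        UnicodeBlock block = UnicodeBlock.of(codePoint);
        if (block == UnicodeBlock.HIRAGANA || block == UnicodeBlock.KATAKANA) {
            return "kana";
        }
        if (block == UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS) {
            return "kanji";
        }
        return "other";
    }

    public static void main(String[] args) {
        String text = "誰か確認上記これらのフ";
        // Iterate by code point rather than by char, so characters outside
        // the Basic Multilingual Plane are not split into surrogate halves.
        text.codePoints().forEach(cp ->
            System.out.println(new String(Character.toChars(cp)) + " - " + classify(cp)));
    }
}
```

Run on the sample text, this prints one line per character in the format asked for in the question, starting with `誰 - kanji` and `か - kana`.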
    

    But, as always, the devil is in the details:

    Character.UnicodeBlock.of('Ａ') == HALFWIDTH_AND_FULLWIDTH_FORMS
    

    Here 'Ａ' is the full-width Latin letter A (U+FF21), so it falls in the same block as the halfwidth katakana above. Note that the full-width 'Ａ' is different from the normal (half-width) A:

    Character.UnicodeBlock.of('A') == BASIC_LATIN
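
One possible way to cope with the width variants is to normalize the text before looking up the block. The sketch below uses `java.text.Normalizer` with the NFKC form, which folds halfwidth katakana and fullwidth Latin letters to their ordinary counterparts. The class name `WidthNormalizer` and the method `foldWidth` are hypothetical. NFKC also rewrites other compatibility characters (ligatures, circled digits, and so on), so whether it is appropriate depends on your data.

```java
import java.lang.Character.UnicodeBlock;
import java.text.Normalizer;

class WidthNormalizer {

    // Hypothetical helper: folds halfwidth/fullwidth compatibility forms
    // to their ordinary counterparts using Unicode NFKC normalization.
    static String foldWidth(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFKC);
    }

    public static void main(String[] args) {
        String halfwidthFu = "\uFF8C"; // halfwidth katakana fu
        String fullwidthA = "\uFF21";  // fullwidth Latin capital A

        // After folding, the characters land in their ordinary blocks.
        System.out.println(UnicodeBlock.of(foldWidth(halfwidthFu).charAt(0)));
        System.out.println(UnicodeBlock.of(foldWidth(fullwidthA).charAt(0)));
    }
}
```

After folding, the halfwidth katakana resolves to the KATAKANA block and the fullwidth letter to BASIC_LATIN, so a block-based classifier no longer needs a special case for HALFWIDTH_AND_FULLWIDTH_FORMS.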
    
