Regular expressions (regex) in Japanese

你的背包 2020-12-29 08:28

I am learning about Regular expressions (regex) for English and although some of the concepts seem like they would apply to other languages such as Japanese, I feel as if ma

4 Answers
  •  一生所求
    2020-12-29 09:01

    In Unicode there are two ways to classify characters from different writing systems:

    • Unicode Script: all characters used by a script, regardless of their code points (they may come from different blocks)
    • Unicode Block: a contiguous code point range allocated for a specific purpose or script (a block may contain characters from several scripts, and a script may span several blocks)
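    To see how the two classifications can disagree on Japanese text, here is a minimal sketch (assuming Java 7 or later; the class `ScriptVsBlock` and its helpers `scriptOf` and `blockOf` are hypothetical names) that asks the JDK's `Character.UnicodeScript.of` and `Character.UnicodeBlock.of` for the script and block of three characters.

    ```java
    import java.lang.Character.UnicodeBlock;
    import java.lang.Character.UnicodeScript;

    public class ScriptVsBlock {
        // Unicode script of a code point as an enum constant name, e.g. "HIRAGANA" or "COMMON"
        static String scriptOf(int codePoint) {
            return UnicodeScript.of(codePoint).name();
        }

        // Unicode block of a code point, e.g. "KATAKANA" ("NONE" if it lies in no block)
        static String blockOf(int codePoint) {
            UnicodeBlock block = UnicodeBlock.of(codePoint);
            return block == null ? "NONE" : block.toString();
        }

        public static void main(String[] args) {
            // U+3042 あ, U+30AB カ, U+30FC ー (prolonged sound mark)
            int[] samples = {0x3042, 0x30AB, 0x30FC};
            for (int cp : samples) {
                System.out.printf("%s U+%04X script=%s block=%s%n",
                        new String(Character.toChars(cp)), cp, scriptOf(cp), blockOf(cp));
            }
        }
    }
    ```

    あ and カ report the same name for script and block (HIRAGANA and KATAKANA respectively), but the prolonged sound mark ー reports script COMMON and block KATAKANA: it sits in the Katakana block, yet Unicode assigns it to the Common script because it is used with both hiragana and katakana.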

    The differences between these are explained rather more clearly on this web page from the official Unicode website.

    Since Java 7, Java regular expressions can match characters by either classification.

    This is the syntax, as indicated in this tutorial from the Oracle website:

    Script:

    either \p{IsHiragana} or \p{script=Hiragana}

    Block:

    either \p{InHiragana} or \p{block=Hiragana}

    Note the prefixes: "Is" for a script, "In" for a block.
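    The prefix matters because the two classifications do not always agree. A minimal sketch (assuming Java 7 or later; the class `IsVersusIn` and its `matches` helper are hypothetical) runs the script-based and block-based forms against a katakana letter and the prolonged sound mark ー. I've switched from Hiragana to Katakana here only because ー shows the difference clearly.

    ```java
    import java.util.regex.Pattern;

    public class IsVersusIn {
        // True if the single code point cp fully matches the given regex
        static boolean matches(String regex, int cp) {
            return Pattern.compile(regex).matcher(new String(Character.toChars(cp))).matches();
        }

        public static void main(String[] args) {
            String[] regexes = {
                "\\p{IsKatakana}", "\\p{script=Katakana}", // script-based
                "\\p{InKatakana}", "\\p{block=Katakana}"   // block-based
            };
            // U+30AB is KATAKANA LETTER KA; U+30FC is the prolonged sound mark ー
            int[] samples = {0x30AB, 0x30FC};
            for (int cp : samples) {
                StringBuilder line = new StringBuilder(String.format("U+%04X", cp));
                for (String regex : regexes) {
                    line.append(String.format("  %s=%b", regex, matches(regex, cp)));
                }
                System.out.println(line);
            }
        }
    }
    ```

    カ matches all four forms, while ー matches only the block-based ones. In practice, a script-based pattern for katakana words therefore needs ー added explicitly, e.g. "[\\p{IsKatakana}\\u30FC]+" to match a word such as コーヒー.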

    The syntax \p{Hiragana} given in the accepted answer does not appear to be valid in Java. I tried it just in case and can confirm that it did not work for me.
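    For anyone who wants to reproduce this check, here is a minimal sketch (assuming Java 7 or later; `BarePropertyName` and `compiles` are hypothetical names) that tries to compile the bare form next to the prefixed ones. Java does not treat a bare name like Hiragana as a script or block, so `Pattern.compile` rejects it with a `PatternSyntaxException`. Engines such as PCRE do accept bare script names like \p{Hiragana}, so the form is not invalid everywhere, only in Java.

    ```java
    import java.util.regex.Pattern;
    import java.util.regex.PatternSyntaxException;

    public class BarePropertyName {
        // Tries to compile a regex; prints the error description and returns false if Java rejects it
        static boolean compiles(String regex) {
            try {
                Pattern.compile(regex);
                return true;
            } catch (PatternSyntaxException e) {
                System.out.println(regex + " -> " + e.getDescription());
                return false;
            }
        }

        public static void main(String[] args) {
            String[] candidates = {"\\p{Hiragana}", "\\p{IsHiragana}", "\\p{InHiragana}"};
            for (String regex : candidates) {
                System.out.println(regex + " compiles: " + compiles(regex));
            }
        }
    }
    ```

    Only \p{Hiragana} should report "compiles: false"; the prefixed forms compile normally.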
