I am learning about regular expressions (regex) for English, and although some of the concepts seem like they would apply to other languages such as Japanese, I feel as if ma
In Unicode there are two ways to classify characters from different writing systems: by script and by block.
The differences between these are explained rather more clearly on this web page from the official Unicode website.
When matching characters with regular expressions in Java, you can use either classification mechanism as of Java 7.
This is the syntax, as indicated in this tutorial from the Oracle website:
Script:
either \p{IsHiragana}
or \p{script=Hiragana}
Block:
either \p{InHiragana}
or \p{block=Hiragana}
Note that in one case it's "Is", in the other it's "In".
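As a quick check, here is a minimal sketch that tries all four forms listed above against a hiragana string and a katakana string. The class name HiraganaSyntaxDemo and the helper matchesAll are made up for this illustration; only java.util.regex.Pattern comes from the JDK. The test strings are written with \u escapes so the result does not depend on the source file encoding.

```java
import java.util.regex.Pattern;

public class HiraganaSyntaxDemo {
    // The four spellings from the list above, as Java string literals
    // (each backslash is doubled inside a Java string).
    static final String[] FORMS = {
        "\\p{IsHiragana}",      // script, short form
        "\\p{script=Hiragana}", // script, keyword form
        "\\p{InHiragana}",      // block, short form
        "\\p{block=Hiragana}"   // block, keyword form
    };

    // Hypothetical helper: true if every character of the input
    // is matched by the given property expression.
    static boolean matchesAll(String property, String input) {
        return Pattern.compile(property + "+").matcher(input).matches();
    }

    public static void main(String[] args) {
        String hiragana = "\u3072\u3089\u304C\u306A"; // the word "hiragana" in hiragana
        String katakana = "\u30AB\u30BF\u30AB\u30CA"; // the word "katakana" in katakana
        for (String form : FORMS) {
            System.out.println(form
                    + " hiragana=" + matchesAll(form, hiragana)
                    + " katakana=" + matchesAll(form, katakana));
        }
    }
}
```

On Java 7 or later, each of the four forms should report true for the hiragana string and false for the katakana string.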
The syntax \p{Hiragana} indicated in the accepted answer does not seem to be a valid option. I tried it just in case and can confirm that it did not work for me.
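If you want to reproduce this yourself, a small sketch along these lines should do. The class name PropertyNameCheck and the helper compiles are my own names for illustration; the sketch only relies on Pattern.compile throwing PatternSyntaxException for a property name it does not recognise.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

public class PropertyNameCheck {
    // Hypothetical helper: true if the regex compiles,
    // false if Pattern rejects it with a PatternSyntaxException.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(compiles("\\p{Hiragana}"));   // bare name, no Is/In prefix
        System.out.println(compiles("\\p{IsHiragana}")); // script form
        System.out.println(compiles("\\p{InHiragana}")); // block form
    }
}
```

In my understanding the bare name is rejected because Java only resolves script names behind the "Is" prefix (or script=) and block names behind the "In" prefix (or block=).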