Regular Expression for Japanese characters

Question

I am doing internationalization in Struts. I want to write JavaScript validation for Japanese and English users. I know how to write a regular expression for English, but not for Japanese. Is it possible to write one regular expression for both sets of users that validates on the basis of Unicode?

Please help me.

Answer 1:

This thread may be old, but I thought I would add my two cents. Here is a regular expression that matches kanji, hiragana, katakana (including the long-vowel dash ー), alphanumerics in both half-width (hankaku, i.e. plain English) and full-width (zenkaku) forms, and the marks 々〆〤:

/[一-龠]+|[ぁ-ゔ]+|[ァ-ヴー]+|[a-zA-Z0-9]+|[ａ-ｚＡ-Ｚ０-９]+|[々〆〤]+/u

You can edit it to fit your needs, but note the "u" flag at the end.
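For validation rather than searching, the pattern may need one more change: as written it is unanchored, so test() returns true whenever any part of the input matches, and "abc<script>" would pass. A minimal sketch, assuming you want the whole field restricted to those same character classes (the name isJapaneseOrEnglish is hypothetical), merges the alternatives into a single class and anchors it with ^ and $:

```javascript
// Hypothetical helper: accepts the input only if EVERY character belongs to the
// classes from the answer's regex: kanji (一-龠), hiragana (ぁ-ゔ), katakana plus
// the long-vowel mark (ァ-ヴー), half-width and full-width alphanumerics, and 々〆〤.
// The ^...$ anchors are an addition for validation purposes.
const JA_EN_TEXT = /^[一-龠ぁ-ゔァ-ヴーa-zA-Z0-9ａ-ｚＡ-Ｚ０-９々〆〤]+$/u;

function isJapaneseOrEnglish(input) {
  return JA_EN_TEXT.test(input);
}

console.log(isJapaneseOrEnglish('東京タワー'));  // true
console.log(isJapaneseOrEnglish('ひらがな123')); // true
console.log(isJapaneseOrEnglish('Ｔｏｋｙｏ'));  // true (full-width letters)
console.log(isJapaneseOrEnglish('abc<script>')); // false
console.log(isJapaneseOrEnglish('hello world')); // false (space is not in the class)
```

Spaces, punctuation such as 。 and 、, and half-width katakana fall outside this class and are rejected; add them explicitly if your form should accept them.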

I hope this helps!

Answer 2:

Provided your text editor and programming language support Unicode, you should be able to enter Japanese characters as literal strings. Things like [A-X] ranges will probably not translate very well in general.

What kind of text are you trying to validate?

What language are the regular expressions in? Perl-compatible, POSIX, or something else?
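On the point about [A-X]-style ranges: in current JavaScript engines (ES2018 and later) one alternative is Unicode property escapes, which name scripts instead of code-point ranges and only work with the u flag. A sketch, assuming a modern browser or Node.js and that kanji, kana, and ASCII letters and digits are what the form should accept (the constant name JA_OR_EN is hypothetical):

```javascript
// Hypothetical "Japanese or English" pattern built from script names rather than
// hand-picked ranges. \p{...} escapes require the u flag (ES2018+ engines).
// \u30FC is the long-vowel mark; it is listed explicitly because its Script
// property is Common rather than Katakana.
const JA_OR_EN = /^[\p{Script=Han}\p{Script=Hiragana}\p{Script=Katakana}\u30FCA-Za-z0-9]+$/u;

console.log(JA_OR_EN.test('東京タワー'));  // true
console.log(JA_OR_EN.test('Osaka大阪'));   // true
console.log(JA_OR_EN.test('안녕하세요'));  // false (Hangul is a different script)
console.log(JA_OR_EN.test('hello world')); // false (the space is not in the class)
```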

Answer 3:

As long as you save your scripts in the same character set as your page (e.g. both HTML and JavaScript are UTF-8 or both HTML and JavaScript are Shift_JIS), you should be able to treat your regular expressions exactly the same as you would with English.

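// Returns true if the string contains the name of a Kansai prefecture,
// written in kanji or in romaji (the i flag makes the romaji case-insensitive).
// hyo{1,2}go accepts both "hyogo" and "hyoogo".
function isKansai(city) {
    var rxKansai = /(大阪|兵庫|京都|滋賀|奈良|和歌山|osaka|hyo{1,2}go|kyoto|shiga|nara|wakayama)/i;
    return rxKansai.test(city);
}
isKansai('東京'); // false
isKansai('大阪'); // true
isKansai('Tokyo'); // false
isKansai('Osaka'); // true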
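If keeping the HTML and the script in the same charset is hard to guarantee, one option is to write the non-ASCII characters as \uXXXX escapes, which JavaScript regular expressions understand. The .js file then stays plain ASCII and should read the same whether the page is served as UTF-8 or Shift_JIS. A sketch of the same check in that style (isKansaiEscaped is a hypothetical name; the romaji alternatives are unchanged):

```javascript
// Hypothetical ASCII-only variant of isKansai: each kanji is a \uXXXX escape.
// \u5927\u962A = Osaka, \u5175\u5EAB = Hyogo, \u4EAC\u90FD = Kyoto,
// \u6ECB\u8CC0 = Shiga, \u5948\u826F = Nara, \u548C\u6B4C\u5C71 = Wakayama
function isKansaiEscaped(city) {
  const rxKansai = /(\u5927\u962A|\u5175\u5EAB|\u4EAC\u90FD|\u6ECB\u8CC0|\u5948\u826F|\u548C\u6B4C\u5C71|osaka|hyo{1,2}go|kyoto|shiga|nara|wakayama)/i;
  return rxKansai.test(city);
}

console.log(isKansaiEscaped('\u5927\u962A')); // true (Osaka in kanji)
console.log(isKansaiEscaped('\u6771\u4EAC')); // false (Tokyo in kanji)
console.log(isKansaiEscaped('Wakayama'));     // true
```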

Source: https://stackoverflow.com/questions/6787716/regular-expression-for-japanese-characters

Tags

javascript

regex

unicode

internationalization

cjk