Validating Kana Input

為{幸葍}努か submitted on 2019-12-05 12:36:11

It sounds like you basically need to just check whether each Unicode character is within a particular range. The Unicode code charts should be a good starting point.

If you're using .NET, my MiscUtil library has some Unicode range support - it's primitive, but it should do the job. I don't have the source to hand right now, but will update this post with an example later if it would be helpful.

Not sure of a perfect answer, but Wikipedia lists the Unicode ranges for hiragana and katakana. (I would expect them to be available from unicode.org as well.)

  • Hiragana: U+3040–U+309F
  • Katakana: U+30A0–U+30FF

Checking each character of the input against those ranges should work as validation for hiragana or katakana in a language-agnostic manner.
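
As a minimal sketch of that range check, here it is in Python (the language choice and the names `in_range` and `is_kana` are just for illustration). It treats the whole block as valid, which is an assumption: the blocks also contain a few unassigned code points and punctuation such as the katakana middle dot (U+30FB), and halfwidth katakana live in a different block entirely.

```python
from typing import Tuple

# Inclusive code point ranges of the two Unicode blocks listed above.
HIRAGANA = (0x3040, 0x309F)
KATAKANA = (0x30A0, 0x30FF)


def in_range(ch: str, block: Tuple[int, int]) -> bool:
    """Return True if the single character ch falls inside the block."""
    low, high = block
    return low <= ord(ch) <= high


def is_kana(text: str) -> bool:
    """Return True if text is non-empty and every character is kana."""
    return bool(text) and all(
        in_range(ch, HIRAGANA) or in_range(ch, KATAKANA) for ch in text
    )
```

For example, `is_kana("ひらがな")` and `is_kana("コーヒー")` are both true, while `is_kana("漢字")` and `is_kana("kana")` are false. If you need to be stricter, narrow the ranges to the assigned letters only.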

For kanji it is a little more complicated. The Chinese characters used in Chinese and Japanese are included in the same range: Unicode unifies them in the CJK Unified Ideographs block (U+4E00–U+9FFF), which also holds both Simplified and Traditional Chinese forms. A range check can therefore tell you that a character is a Han ideograph, but not whether it is one used in Japanese.
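
To test that expectation, here is a small Python sketch against the main CJK Unified Ideographs block, U+4E00–U+9FFF. Using only this one block is an assumption (rarer ideographs sit in separate extension blocks), and the function name `contains_only_han` is made up for the example.

```python
# Main CJK Unified Ideographs block. Extension blocks are deliberately
# ignored here, which is an assumption that suits common text only.
CJK_UNIFIED_START = 0x4E00
CJK_UNIFIED_END = 0x9FFF


def contains_only_han(text: str) -> bool:
    """Return True if text is non-empty and every character is in the block."""
    return bool(text) and all(
        CJK_UNIFIED_START <= ord(ch) <= CJK_UNIFIED_END for ch in text
    )


# Japanese, Simplified Chinese and Traditional Chinese samples all pass,
# so the range alone cannot tell the languages apart.
samples = ["日本語", "汉字", "漢字", "ひらがな"]
results = {s: contains_only_han(s) for s in samples}
```

Here `results` maps the first three samples to `True` and `"ひらがな"` to `False`. If the goal is "kanji that are actually used in Japanese", a range check will not get you there on its own; you would need a character list such as the jōyō kanji.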

Oh oh! I had this one once... I had a regex with the hiragana, then the katakana, and then the kanji. I forget the exact codes; I'll go have a look.

Regex is great because you double the problems. And I did it in PHP, my choice for extra-strong automatic problem generation.

--edit--

$pattern = '/[^\wぁ-ゔァ-ヺー\x{4E00}-\x{9FAF}_\-]+/u';

I found this here, but it's not great... I'll keep looking
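
For comparison, here is a rough Python transliteration of that pattern. It is a sketch, not the snippet I was looking for. The `re.ASCII` flag is my own assumption: it limits `\w` to `[A-Za-z0-9_]` so that the explicit kana and kanji ranges do the work, since without it Python's `\w` already matches letters in most scripts. The name `is_valid` is hypothetical.

```python
import re

# Matches any run of characters that are NOT allowed. Allowed characters:
# ASCII word characters, hiragana (ぁ-ゔ), katakana (ァ-ヺ), the long-vowel
# mark ー, ideographs U+4E00-U+9FAF (the range from the pattern above),
# and the hyphen.
DISALLOWED = re.compile(r"[^\wぁ-ゔァ-ヺー\u4E00-\u9FAF\-]+", re.ASCII)


def is_valid(text: str) -> bool:
    """Return True if text is non-empty and has no disallowed characters."""
    return bool(text) and DISALLOWED.search(text) is None
```

With this, `is_valid("ひらがなカタカナ漢字abc_123-")` is true, while `is_valid("hello world")` (space) and `is_valid("こんにちは！")` (fullwidth exclamation mark) are false. Whitelisting the allowed characters and looking for anything outside the set keeps the pattern short, but punctuation and whitespace have to be added by hand if you want them.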

--edit-- I looked through my portable hard drive.... I thought I had kept that particular snippet from the last company... sorry.
