How can I match Korean characters in a Ruby regular expression?

Question

I have some basic username validations using regular expressions, something like [\w-_]+, and I want to add support for the Korean alphabet while keeping the rest of the validation the same.

I don't want to allow special characters such as {}[]!@#$%^&*(). I just want to replace \w with something that matches a given alphabet in addition to [a-zA-Z0-9].

That means a username like 안녕 should be valid, but 안녕[] should not.

I need to do this in Ruby 1.9.

Answer 1:

You can test for invalid characters like this:

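# encoding: utf-8
# Returns true when name contains no character outside ASCII letters,
# ASCII digits, and the Hangul script.
def valid_name?(name)
  !name.match(/[^a-zA-Z0-9\p{Hangul}]/)
end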

ar = %w(안녕 name 안녕[].)
ar.each{|name| puts "#{name} is #{valid_name?(name) ? "valid" : "invalid"}."}
# 안녕 is valid.
# name is valid.
# 안녕[]. is invalid.

Answer 2:

Try this:

[가-힣]+

This matches every character from U+AC00 to U+D7A3, the precomposed Hangul syllables, which is probably enough for your purposes. (I don't think you'll need old Hangul characters and the like.)
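As a rough sketch of how that range might slot into the username check from the question, here it is combined with the ASCII characters and anchored to the whole string. The names SYLLABLE_NAME and syllable_name? are made up for illustration, and allowing "_" and "-" is an assumption taken from the question's original pattern.

```ruby
# encoding: utf-8

# Hypothetical pattern: ASCII letters, digits, "_" and "-", plus the
# precomposed Hangul syllables U+AC00..U+D7A3 (가 through 힣).
SYLLABLE_NAME = /\A[a-zA-Z0-9_\-가-힣]+\z/

# Hypothetical helper: true only if the whole string matches.
def syllable_name?(name)
  !!(name =~ SYLLABLE_NAME)
end

puts syllable_name?("안녕")    # only precomposed syllables
puts syllable_name?("안녕[]")  # contains brackets
puts syllable_name?("ㅋㅋ")    # standalone jamo, outside the syllable range
```

Standalone jamo such as ㅋ are not in the U+AC00 to U+D7A3 range, so this pattern rejects them. If they should be accepted, \p{Hangul} covers them as well.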

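Answer 3:

I think you can replace \w with the POSIX bracket class [:word:], which is Unicode-aware in Ruby 1.9 (it matches letters from any script, not only Korean).

/\A[[:word:]\-_]+\z/ should work. Use \A and \z rather than ^ and $, because in Ruby ^ and $ anchor individual lines, not the whole string.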
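A minimal sketch of this approach, with the pattern anchored to the whole string. The helper name word_name? is hypothetical. The last call shows why whole-string anchors matter for validation.

```ruby
# encoding: utf-8

# Hypothetical helper: \A and \z anchor the whole string, whereas ^ and $
# only anchor individual lines in Ruby.
def word_name?(name)
  !!(name =~ /\A[[:word:]\-_]+\z/)
end

puts word_name?("안녕")         # Hangul letters count as word characters
puts word_name?("user_name-1")  # ASCII letters, digit, "_" and "-"
puts word_name?("안녕[]")       # brackets are not word characters
puts word_name?("ok\n[]")       # rejected here; a ^...$ pattern would accept it
```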

Answer 4:

Matching for invalid characters is your best option, because there are far too many valid Korean characters to enumerate. Hangul is technically an alphabet, but it is encoded as one character per syllable, and there are also thousands of Chinese loan characters (Hanja) that should be valid as well.
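One way this could look, as a sketch only: a negated character class that also admits Hanja through the \p{Han} script property. The names INVALID_CHAR and valid_korean_name? are hypothetical, and permitting "_" and "-" is an assumption carried over from the question's original pattern.

```ruby
# encoding: utf-8

# Hypothetical pattern: any character that is NOT an ASCII letter or digit,
# "_" or "-", a Hangul character, or a Han (Hanja) character.
INVALID_CHAR = /[^a-zA-Z0-9_\-\p{Hangul}\p{Han}]/

# Hypothetical helper: valid when non-empty and free of invalid characters.
def valid_korean_name?(name)
  !name.empty? && name !~ INVALID_CHAR
end

%w(안녕 韓國 name_1 안녕[]).each do |name|
  puts "#{name} is #{valid_korean_name?(name) ? 'valid' : 'invalid'}."
end
```

Note that \p{Han} matches Han characters in general, so Chinese and Japanese kanji usernames pass too. Whether that is acceptable depends on the application.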

Source: https://stackoverflow.com/questions/10139996/how-can-i-match-korean-characters-in-a-ruby-regular-expression

Tags

ruby

regex

unicode

ruby-1.9

cjk
