How to determine if a character is a Chinese character

悲&欢浪女 2020-12-03 01:48

How can I determine whether a character is a Chinese character in Ruby?

2 answers
  • 2020-12-03 02:43

    An interesting article on encodings in Ruby: http://blog.grayproductions.net/articles/bytes_and_characters_in_ruby_18 (it's part of a series, so also check the table of contents at the start of the article).

    I haven't worked with Chinese characters before, but this appears to be the list supported by Unicode: http://en.wikipedia.org/wiki/List_of_CJK_Unified_Ideographs . Note that it is a unified set that also covers Japanese and Korean usage (many characters are shared between them), so I'm not sure you can tell which ones are Chinese only.

    I think you can check whether the character at index n of a string str is a CJK character like this:

    def check_char(str, n)
      # Decode the UTF-8 string into an array of Unicode code points
      list_of_chars = str.unpack("U*")
      char = list_of_chars[n]
      # No character at index n
      return false if char.nil?
      # CJK Unified Ideographs (main block)
      return true if char >= 0x4E00 && char <= 0x9FFF
      # CJK Unified Ideographs Extension A
      return true if char >= 0x3400 && char <= 0x4DBF
      # CJK Unified Ideographs Extension B
      return true if char >= 0x20000 && char <= 0x2A6DF
      # CJK Unified Ideographs Extension C
      return true if char >= 0x2A700 && char <= 0x2B73F
      false
    end
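
The same range test can also be sketched for a single character, using String#ord and a table of ranges. The names CJK_RANGES and cjk_ideograph? are illustrative and not from the answer above. The table covers only the four blocks listed there, so treating exactly these ranges as "Chinese" is an assumption; later extension blocks are not included.

```ruby
# Hypothetical constant: the four code point ranges from the answer above.
CJK_RANGES = [
  0x4E00..0x9FFF,    # CJK Unified Ideographs (main block)
  0x3400..0x4DBF,    # Extension A
  0x20000..0x2A6DF,  # Extension B
  0x2A700..0x2B73F   # Extension C
].freeze

# Hypothetical helper: true if the first character of `char`
# falls inside one of the ranges above.
def cjk_ideograph?(char)
  return false if char.nil? || char.empty?
  code_point = char.ord
  CJK_RANGES.any? { |range| range.cover?(code_point) }
end

# Usage: keep only the CJK ideographs of a mixed string.
ideographs = "中a文".each_char.select { |c| cjk_ideograph?(c) }
# ideographs is ["中", "文"]
```

Passing a one-character string instead of a string plus index keeps the helper usable with each_char, select, and similar iterators.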
    
  • 2020-12-03 02:50

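    In Ruby 1.9 and later, regular expressions support Unicode script properties, so you can match Han characters directly. The magic comment is only needed on 1.9; from Ruby 2.0 on, UTF-8 is the default source encoding.

    # encoding: utf-8
    "漢" =~ /\p{Han}/   # => 0 (index of the match), or nil if there is no Han character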
    
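
As a minimal sketch of how the \p{Han} match might be wrapped for everyday use, the two helpers below test whether a string contains any Han character or consists only of Han characters. The method names contains_han? and all_han? are hypothetical. Note that \p{Han} is a script property, so it also matches the kanji used in Japanese text, which is the same limitation mentioned in the other answer.

```ruby
# Hypothetical helper: true if the string contains at least one Han character.
def contains_han?(str)
  !!(str =~ /\p{Han}/)
end

# Hypothetical helper: true if the string is non-empty and every
# character in it is a Han character.
def all_han?(str)
  !!(str =~ /\A\p{Han}+\z/)
end

contains_han?("abc漢")  # => true
all_han?("abc漢")       # => false
all_han?("漢字")        # => true
```

The double negation turns the match index or nil returned by =~ into a plain true or false, and it works on Ruby 1.9 as well as later versions.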