How to match Unicode words with Ruby 1.9?

一生所求 2021-02-03 11:25

I'm using Ruby 1.9 and trying to find out which regex I need to make this true:

Encoding.default_internal = Encoding.default_external = 'utf-8'
"föö".match(

3 answers
  • 2021-02-03 12:02

    http://www.ruby-forum.com/topic/208777

    and

    http://www.ruby-forum.com/topic/210770

    might have clues for you.

    You can also use the (documented) \p{L} property, which matches any Unicode letter. For example:

    $ ruby -ve "p '℉üüü' =~ /\p{L}/"
    ruby 1.9.2p0 (2010-08-18 revision 29036) [x86_64-linux]
    1
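
The one-liner above prints 1 because `=~` returns the index of the first match, and ℉ is a symbol rather than a letter. As a minimal sketch of the same property used to pull out whole words (the `text` string and variable names are made up for illustration):

```ruby
# encoding: utf-8
# The magic comment matters on Ruby 1.9; Ruby 2.0 and later assume UTF-8 source.

text = "℉ üüü föö bar"

# =~ returns the character index of the first letter, skipping the symbol and the space.
first_letter_index = text =~ /\p{L}/

# scan collects every run of consecutive Unicode letters.
words = text.scan(/\p{L}+/)

p first_letter_index  # 2
p words               # ["üüü", "föö", "bar"]
```

Note that \p{L} covers letters only, so digits and underscores end a match.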
    
  • 2021-02-03 12:17

    You can manually turn on Unicode matching with the inline (?u) option:

    "föö".match(/(?u)(\w+)/)[1] == "föö"
    # => true
    

    However, using the Unicode property syntax (steenslag's answer) or the POSIX bracket syntax is better style, since both match Unicode word characters without any extra option:

    "föö".match(/(\p{word}+)/)[1] == "föö"
    # => true
    
    "föö".match(/([[:word:]]+)/)[1] == "föö"
    # => true
    

    See this blog post for more info about matching Unicode characters in Ruby regexes.

  • 2021-02-03 12:20
    # encoding=utf-8 
    p "föö".match(/\p{Word}+/)[0] == "föö"
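
To compare the patterns suggested in the answers side by side, here is a small sketch. The `sample` string and variable names are made up for illustration; the digit and underscore are there to show where the letter-only and word-character classes differ.

```ruby
# encoding: utf-8
# The magic comment matters on Ruby 1.9; Ruby 2.0 and later assume UTF-8 source.

sample = "föö_1 bar"

# Plain \w is ASCII-only here, so the match stops at the first "ö".
ascii_word = sample[/\w+/]

# \p{L} matches Unicode letters, but not "_" or digits.
letters_only = sample[/\p{L}+/]

# \p{Word} and [[:word:]] also accept digits and connector punctuation such as "_".
unicode_word = sample[/\p{Word}+/]
posix_word   = sample[/[[:word:]]+/]

p ascii_word    # "f"
p letters_only  # "föö"
p unicode_word  # "föö_1"
p posix_word    # "föö_1"
```

Which class fits depends on whether identifiers like "föö_1" should count as one word or whether only letters are wanted.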
    