How do you specify a regex character range that will work in European languages other than English?

南旧 2021-01-12 20:00

I'm working with Ruby's regex engine. I need to write a regex that does this:

WIKI_WORD = /\b([a-z][\w_]+\.)?[A-Z][a-z]+[A-Z]\w*\b/

b

2 Answers
  •  星月不相逢
    2021-01-12 20:31

    WIKI_WORD = /\b(\p{Ll}\w+\.)?\p{Lu}\p{Ll}+\p{Lu}\w*\b/u
    

    This should work in Ruby 1.9. \p{Lu} and \p{Ll} are shorthands for uppercase and lowercase Unicode letters, respectively. (\w already includes the underscore, so the [\w_] in the original pattern is redundant.)

    See also this answer - you might need to run Ruby in UTF-8 mode for this to work, and possibly your script must be encoded in UTF-8, too.
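
As a quick check of the idea above, here is a minimal sketch comparing the two patterns on a word that contains an accented lowercase letter. The constant names, the `wiki_word?` helper, and the sample string are hypothetical and only for illustration. The non-ASCII character is written as a `\u` escape so the snippet does not depend on how the script file itself is encoded.

```ruby
# Unicode-aware pattern from the answer above.
WIKI_WORD_UNICODE = /\b(\p{Ll}\w+\.)?\p{Lu}\p{Ll}+\p{Lu}\w*\b/u

# ASCII-only pattern from the question, kept for comparison.
WIKI_WORD_ASCII = /\b([a-z][\w_]+\.)?[A-Z][a-z]+[A-Z]\w*\b/

# Hypothetical helper: true if the text contains a match for the pattern.
def wiki_word?(text, pattern)
  !(text =~ pattern).nil?
end

sample = "Caf\u00e9Wiki" # "CaféWiki", with the accented letter as an escape

puts wiki_word?(sample, WIKI_WORD_UNICODE)   # \p{Ll}+ accepts the accented letter
puts wiki_word?(sample, WIKI_WORD_ASCII)     # [a-z]+ stops at the accented letter
puts wiki_word?("WikiWord", WIKI_WORD_ASCII) # plain ASCII wiki word
```

The ASCII pattern fails on the sample because `[a-z]+` cannot consume the accented letter before the second capital, while `\p{Ll}+` can.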
