Matching Unicode letter characters in PCRE/PHP

别跟我提以往 2020-11-22 01:13

I'm trying to write a reasonably permissive validator for names in PHP, and my first attempt consists of the following pattern:

// unicode letters, apostrop         


        
5 Answers
  •  既然无缘
    2020-11-22 01:33

    First of all, your life would be a lot easier if you wrote these patterns in single-quoted strings instead of double-quoted ones: you need only one backslash. Second, combining marks (\pM) should also be included. If you find a character that is not matched, find out its Unicode code point and then use http://www.fileformat.info/info/unicode/ to figure out which category it belongs to. I found http://hsivonen.iki.fi/php-utf8/ an invaluable tool when debugging UTF-8 properties (don't forget to convert to hex before looking a character up: array_map('dechex', utf8ToUnicode($text))).

    For example, Ă turns out to be http://www.fileformat.info/info/unicode/char/0102/index.htm, which is in category Lu, so \pL should match it, and it does match for me. The other character is http://www.fileformat.info/info/unicode/char/5f20/index.htm, which is also a letter (isLetter), and it matches for me as well. Do you have the Unicode character tables compiled in?
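
The points above can be put together in a short sketch. The original pattern is cut off in the question, so the character class here (letters, combining marks, apostrophe, hyphen, space) is only an assumption about what it allowed, and the helper names isPlausibleName and codePoints are hypothetical. The code-point lookup uses PHP's iconv extension rather than the utf8ToUnicode helper from the library linked above.

```php
<?php
// Single-quoted string: each regex escape needs only one backslash.
// \pL = any letter, \pM = any combining mark; the u modifier makes PCRE
// treat both pattern and subject as UTF-8.
// ASSUMPTION: apostrophe, hyphen and space are a guess at what the
// truncated original pattern permitted.
$pattern = '/^[\pL\pM\' -]+$/u';

// Hypothetical helper: true if every character of $name is in the class.
function isPlausibleName(string $name, string $pattern): bool
{
    return preg_match($pattern, $name) === 1;
}

// Hypothetical helper: hex code points of $text, for looking characters up
// in a Unicode table. Converts to UTF-32BE so each character occupies
// exactly four bytes, then unpacks those as 32-bit big-endian integers.
function codePoints(string $text): array
{
    $utf32 = iconv('UTF-8', 'UTF-32BE', $text);
    return array_map('dechex', array_values(unpack('N*', $utf32)));
}

// If this prints bool(false), the PCRE build probably lacks Unicode
// property support (the "tables compiled in" question above).
var_dump(@preg_match('/^\pL$/u', 'Ă') === 1);

print_r(codePoints('Ă張')); // hex values to look up on fileformat.info
```

A name written with a base letter followed by a combining mark (for example "A" followed by U+0306) only passes because \pM is in the class, which is why the answer recommends including it.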
