Matching Unicode letter characters in PCRE/PHP

别跟我提以往 2020-11-22 01:13

I'm trying to write a reasonably permissive validator for names in PHP, and my first attempt consists of the following pattern:

// unicode letters, apostrop         


        
5 answers
  • 2020-11-22 01:28

    For anyone else who lands here and cannot get this to work: note that the u modifier does not produce consistent results with Unicode scripts across different PHP versions.

    See example: https://3v4l.org/4hB9e

    Related: Incosistent regex result for Thai characters across different PHP version

  • 2020-11-22 01:33

    First of all, your life would be a lot easier if you used single quotes instead of double quotes when writing these patterns: you then need only one backslash. Second, combining marks (\pM) should also be included. If you find a character that is not matched, find out its Unicode code point, then use http://www.fileformat.info/info/unicode/ to figure out which category it is in. I found http://hsivonen.iki.fi/php-utf8/ an invaluable tool when debugging UTF-8 properties (don't forget to convert to hex before looking it up: array_map('dechex', utf8ToUnicode($text))).

    For example, Ă turns out to be http://www.fileformat.info/info/unicode/char/0102/index.htm, which is in Lu, so L should match it, and it does match for me. The other character is http://www.fileformat.info/info/unicode/char/5f20/index.htm, which is also a letter and indeed matches for me. Do you have the Unicode character tables compiled in?
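
    The lookup described above can also be sketched with built-in functions rather than the linked helper library. This is only an illustration: it assumes PHP 7.2 or later with the mbstring extension (for mb_ord()), and the function name describeCharacters is hypothetical.

    ```php
    <?php
    // Hypothetical helper: list each character with its code point and whether
    // PCRE classifies it as a letter (\p{L}) or a combining mark (\p{M}).
    function describeCharacters(string $text): array
    {
        // Split into whole UTF-8 characters; the u modifier is required here too.
        // preg_split() returns false on invalid UTF-8, so fall back to an empty list.
        $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY) ?: [];
        $rows = [];
        foreach ($chars as $char) {
            $rows[] = [
                'char'      => $char,
                'codepoint' => sprintf('U+%04X', mb_ord($char, 'UTF-8')),
                'letter'    => preg_match('/^\p{L}$/u', $char) === 1,
                'mark'      => preg_match('/^\p{M}$/u', $char) === 1,
            ];
        }
        return $rows;
    }

    // U+0102 and U+5F20 are the two characters discussed above; the last two
    // code points are "e" followed by a combining acute accent (U+0301).
    foreach (describeCharacters("\u{0102}\u{5F20}e\u{0301}") as $row) {
        printf(
            "%s letter=%s mark=%s\n",
            $row['codepoint'],
            var_export($row['letter'], true),
            var_export($row['mark'], true)
        );
    }
    ```

    A character that reports letter=false and mark=true is one that \p{L} alone will reject, which is the case the \pM suggestion is meant to cover.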

  • 2020-11-22 01:37

    I think the problem is much simpler than that: You forgot to specify the u modifier. The Unicode character properties are only available in UTF-8 mode.

    Your regex should be:

    // unicode letters, apostrophe, hyphen, space
    $namePattern = '/^[-\' \p{L}]+$/u';
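
    As a rough check of that pattern, here is a small sketch that runs it against a few inputs. The sample names are hypothetical and chosen only to exercise the character class; the "\u{...}" escapes assume PHP 7.0 or later.

    ```php
    <?php
    // Pattern from the answer above: Unicode letters, apostrophe, hyphen, space.
    $namePattern = '/^[-\' \p{L}]+$/u';

    // Hypothetical sample inputs. The last one contains digits, which are not in the class.
    $samples = ["O'Brien", 'Jean-Luc', "\u{0102}", "\u{5F20}", 'R2-D2'];

    foreach ($samples as $name) {
        // preg_match() returns 1 on a match, 0 on no match, false on error.
        $ok = preg_match($namePattern, $name) === 1;
        echo $name, ' => ', $ok ? 'valid' : 'invalid', "\n";
    }
    ```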
    
  • 2020-11-22 01:43

    If you want to replace an old pattern with a new one in Unicode text, you should write:

    $text = preg_replace('/\bold pattern\b/u', 'new pattern', $text);
    

    So the key here is the u modifier.

    Note: your server's PHP version should be at least PHP 4.3.5,

    as mentioned at php.net | Pattern Modifiers:

    u (PCRE_UTF8) This modifier turns on additional functionality of PCRE that is incompatible with Perl. Pattern strings are treated as UTF-8. This modifier is available from PHP 4.1.0 or greater on Unix and from PHP 4.2.3 on win32. UTF-8 validity of the pattern is checked since PHP 4.3.5.

    Thanks to AgreeOrNot, who gave me that key in: preg_replace match whole word in arabic

    I tried it and it worked on localhost, but on the remote server it didn't. I then found the PHP 4.3.5 note on php.net quoted above, upgraded the PHP version, and it worked.
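
    One way to compare a local machine with a remote server is to probe the PCRE build directly. The sketch below is only an assumption about how such a check could look; the function name pcreSupportsUnicodeProperties is hypothetical.

    ```php
    <?php
    // Hypothetical helper: true if this PHP build's PCRE accepts the u modifier
    // together with Unicode property escapes such as \p{L}.
    function pcreSupportsUnicodeProperties(): bool
    {
        // "\xC3\xA9" is the UTF-8 encoding of U+00E9 (e with acute accent).
        // @ silences the warning emitted if the pattern fails to compile;
        // preg_match() then returns false instead of 1.
        return @preg_match('/^\p{L}$/u', "\xC3\xA9") === 1;
    }

    echo 'PHP ', PHP_VERSION, ': ',
        pcreSupportsUnicodeProperties() ? 'supported' : 'not supported', "\n";
    ```

    Running the same script on both machines shows whether the difference lies in the PCRE build or somewhere else, such as the encoding of the input.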

    It's important to know that this method is very helpful for Arabic users (عربي) because, as I believe, Unicode is the best encoding for the Arabic language, and the replacement will not work if you don't use the u modifier. The next example should work for you:

    $text = preg_replace('/\bمرحبا بك\b/u', 'NEW', $text);
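
    A self-contained version of that replacement, with a hypothetical sample sentence so it can be run as is. It assumes the source file itself is saved as UTF-8 and a current PHP version, where the u modifier also makes \b treat Arabic letters as word characters.

    ```php
    <?php
    // Hypothetical sample sentence; only the first two words should be replaced.
    $text = 'مرحبا بك يا صديقي';

    // With the u modifier, pattern and subject are both read as UTF-8, so \b
    // finds the boundaries around the Arabic phrase.
    $replaced = preg_replace('/\bمرحبا بك\b/u', 'NEW', $text);

    // preg_replace() returns null on error, for example on invalid UTF-8 input.
    if ($replaced === null) {
        echo 'preg error code: ', preg_last_error(), "\n";
    } else {
        echo $replaced, "\n";
    }
    ```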

  • 2020-11-22 01:48
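    <?php // With /u, the class matches ı, ğ, ü, ş, ö and ç as whole characters rather than single bytes.
    $matched = preg_match('/[a-zığüşöç]/u', $title); ?>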
    