regex to also match accented characters

Posted by 只愿长相守 on 2019-12-22 08:11:04

Question


I have the following PHP code:

$search = "foo bar que";
$search_string = str_replace(" ", "|", $search);

$text = "This is my foo text with qué and other accented characters.";
$text = preg_replace("/$search_string/i", "<b>$0</b>", $text);

echo $text;

Obviously, "que" does not match "qué". How can I change that? Is there a way to make preg_replace ignore all accents?

The characters that have to match (Spanish):

á,Á,é,É,í,Í,ó,Ó,ú,Ú,ñ,Ñ

I don't want to replace all accented characters before applying the regex, because the characters in the text should stay the same:

"This is my foo text with qué and other accented characters."

and not

"This is my foo text with que and other accented characters."


Answer 1:


$search = str_replace(
   ['a','e','i','o','u','ñ'],
   ['[aá]','[eé]','[ií]','[oó]','[uú]','[nñ]'],
   $search);

This, plus the same mapping for upper case, will satisfy your request. A side note: the ñ replacement seems wrong to me, as 'niño' is totally different from 'nino'.
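As a rough sketch of what "the same for upper case" could look like, the map below lists both cases and is applied with strtr() instead of str_replace(). The variable names $classes, $pattern and $result are illustrative, and the code assumes the source file and the text are UTF-8 encoded.

```php
<?php
// Assumes this file and the text are UTF-8 encoded.
// Base letters mapped to classes, lower and upper case listed separately.
$classes = [
    'a' => '[aá]', 'e' => '[eé]', 'i' => '[ií]',
    'o' => '[oó]', 'u' => '[uú]', 'ñ' => '[nñ]',
    'A' => '[AÁ]', 'E' => '[EÉ]', 'I' => '[IÍ]',
    'O' => '[OÓ]', 'U' => '[UÚ]', 'Ñ' => '[NÑ]',
];

$search = "foo bar que";

// strtr() with an array replaces each key in one pass and does not
// re-scan text it has already inserted.
$pattern = str_replace(" ", "|", strtr($search, $classes));

$text = "This is my foo text with qué and other accented characters.";

// The u modifier makes PCRE treat [eé] as two characters, not three bytes.
$result = preg_replace("/$pattern/u", "<b>$0</b>", $text);

echo $result, "\n";
```

Without the u modifier, a class such as [eé] is read byte by byte and can match half of a multi-byte character, so the flag matters here.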




Answer 2:


If you want the matched text, accents included, to appear in the replacement string, you have to use character classes in your $search variable (you set it manually anyway):

$search = "foo bar qu[eé]";

And so on.




Answer 3:


The solution I finally used:

$search_for_preg = str_ireplace(["e","a","o","i","u","n"],
                                ["[eé]","[aá]","[oó]","[ií]","[uú]","[nñ]"],
                                $search_string);

$text = preg_replace("/$search_for_preg/iu", "<b>$0</b>", $text)."\n";
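The solution above interpolates the search words straight into the pattern, so a word such as "c++" would be read as regex syntax. A minimal sketch of one way to guard against that follows; build_accent_pattern() is a hypothetical helper name, not part of the original answer, and the code assumes UTF-8 input.

```php
<?php
// Hypothetical helper: turns a space-separated search string into an
// accent-tolerant, case-insensitive pattern. Assumes UTF-8 input.
function build_accent_pattern(string $search): string
{
    // Upper-case keys map to the same classes because the i modifier
    // already handles case.
    $classes = [
        'a' => '[aá]', 'e' => '[eé]', 'i' => '[ií]',
        'o' => '[oó]', 'u' => '[uú]', 'n' => '[nñ]',
        'A' => '[aá]', 'E' => '[eé]', 'I' => '[ií]',
        'O' => '[oó]', 'U' => '[uú]', 'N' => '[nñ]',
    ];

    $alternatives = [];
    foreach (preg_split('/\s+/', $search, -1, PREG_SPLIT_NO_EMPTY) as $word) {
        // Escape regex metacharacters first, then widen the plain letters.
        $alternatives[] = strtr(preg_quote($word, '/'), $classes);
    }

    return '/' . implode('|', $alternatives) . '/iu';
}

$text = "This is my foo text with qué and other accented characters.";
$highlighted = preg_replace(build_accent_pattern("foo bar que"), "<b>$0</b>", $text);

echo $highlighted, "\n";
```

Escaping happens before the letters are widened; doing it the other way round would escape the brackets of the character classes as well.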



Answer 4:


You could try defining an array like this:

$vowel_replacements = array(
    "e" => "eé",
    // Other letters mapped to their other versions
);

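Then, before your preg_replace call, do something like this:

foreach ($vowel_replacements as $vowel => $replacements) {
    // str_replace() takes (search, replace, subject) and returns the new string
    $search_string = str_replace($vowel, "[" . $replacements . "]", $search_string);
}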

If I'm remembering my PHP right, that should replace each vowel in the search string with a character class containing its accented forms, so the text itself is left untouched. It also lets you change the search string far more easily: you don't have to remember to replace the vowels with their character classes. All you have to remember is to use the non-accented form in your search string.

(If there's some special syntax I'm forgetting that does this without a foreach, please comment and let me know.)
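One possible way to do this without a foreach, offered as a sketch rather than as the answerer's method: strtr() accepts an array of replacement pairs and applies them all in one call. The map entries beyond "e" are filled in here for illustration, $classes and $search_for_preg are illustrative names, and the source is assumed to be UTF-8.

```php
<?php
// Assumes UTF-8 source. Entries other than "e" are filled in for illustration.
$vowel_replacements = [
    "a" => "aá", "e" => "eé", "i" => "ií", "o" => "oó", "u" => "uú",
];

// Wrap every value in brackets; array_map() keeps string keys
// when it is given a single array.
$classes = array_map(function ($chars) {
    return "[" . $chars . "]";
}, $vowel_replacements);

$search_string = "foo|bar|que";

// One call replaces all letters; inserted classes are not scanned again.
$search_for_preg = strtr($search_string, $classes);

echo $search_for_preg, "\n";
```

The result can then be used as "/$search_for_preg/iu" in the preg_replace call.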



Source: https://stackoverflow.com/questions/30259360/regex-to-also-match-accented-characters
