PHP Multi Byte str_replace?

遥遥无期 2020-12-03 17:36

I'm trying to do accented character replacement in PHP but get funky results, my guess being because I'm using a UTF-8 string and str_replace can't properly handle multi-

4 answers
  • 2020-12-03 18:10

    It's possible to remove diacritics using Unicode normalization form D (NFD) and Unicode character properties.

    NFD converts something like the "ü" umlaut from "LATIN SMALL LETTER U WITH DIAERESIS" (which is a letter) to "LATIN SMALL LETTER U" (letter) and "COMBINING DIAERESIS" (not a letter).

    header('Content-Type: text/plain; charset=utf-8');
    
    $test = implode('', array('á','à','â','ã','ª','ä','å','Á','À','Â','Ã','Ä','é','è',
    'ê','ë','É','È','Ê','Ë','í','ì','î','ï','Í','Ì','Î','Ï','œ','ò','ó','ô','õ','º','ø',
    'Ø','Ó','Ò','Ô','Õ','ú','ù','û','Ú','Ù','Û','ç','Ç','Ñ','ñ'));
    
    $test = Normalizer::normalize($test, Normalizer::FORM_D);
    
    // Remove everything that's not a "letter" or a space (e.g. diacritics)
    // (see http://de2.php.net/manual/en/regexp.reference.unicode.php)
    $pattern = '/[^\pL ]/u';
    
    echo preg_replace($pattern, '', $test);
    

    Output:

    aaaaªaaAAAAAeeeeEEEEiiiiIIIIœooooºøØOOOOuuuUUUcCNn
    

    The Normalizer class is part of the intl extension (bundled with PHP since 5.3, and available from PECL for older versions). (The algorithm itself isn't very complicated but needs to load a lot of character mappings afaik. I wrote a PHP implementation a while ago.)

    (I'm adding this two months late because I think it's a nice technique that's not known widely enough.)
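
    One limitation of the pattern above is that it also drops digits and punctuation, since those aren't letters either. A minimal sketch of a narrower variant, assuming the intl extension is loaded and the source file is saved as UTF-8: it removes only the combining marks (\p{Mn}) that NFD splits off. The name strip_diacritics is hypothetical, not part of PHP or of the answer above.

    ```php
    <?php
    // Hypothetical helper; needs the intl extension for the Normalizer class.
    function strip_diacritics(string $text): string
    {
        // Decompose "ü" into "u" + COMBINING DIAERESIS, and so on.
        $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);
        if ($decomposed === false) {
            return $text; // normalization failed (e.g. invalid UTF-8): leave input untouched
        }
        // \p{Mn} matches nonspacing marks only, so digits, spaces and
        // punctuation are kept.
        return preg_replace('/\p{Mn}+/u', '', $decomposed);
    }

    echo strip_diacritics("Ünïcödé 123, ça va?"), "\n"; // Unicode 123, ca va?
    ```

    As in the output above, characters with no canonical decomposition (ø, œ, ª, º) pass through unchanged and would still need an explicit mapping.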

  • 2020-12-03 18:15

    According to the PHP documentation, the str_replace function is binary-safe, which means it can handle UTF-8 encoded text without any data loss, provided the search strings and the subject use the same encoding.
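
    A small sketch of that claim, assuming PHP 7 or later for the "\u{...}" escapes (used here so the example doesn't depend on how the file itself is saved). The sample sentence is made up for illustration.

    ```php
    <?php
    // "\u{...}" escapes always produce UTF-8 bytes, whatever the file encoding.
    $subject = "Caf\u{00E9} \u{00E0} la cr\u{00E8}me"; // "Café à la crème"
    $search  = ["\u{00E9}", "\u{00E0}", "\u{00E8}"];    // é, à, è
    $replace = ['e', 'a', 'e'];

    // str_replace compares raw bytes; each multi-byte sequence matches as a unit.
    $result = str_replace($search, $replace, $subject);

    echo $result, "\n"; // Cafe a la creme
    ```

    Because UTF-8 is self-synchronizing, a valid UTF-8 search string can't match starting in the middle of another character, so byte-wise matching is safe here.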

  • 2020-12-03 18:20

    Try this function definition:

    if (!function_exists('mb_str_replace')) {
        function mb_str_replace($search, $replace, $subject) {
            // Recurse into array subjects, as str_replace does.
            if (is_array($subject)) {
                foreach ($subject as $key => $val) {
                    $subject[$key] = mb_str_replace($search, $replace, $val);
                }
                return $subject;
            }
            // Quote each whole search string (the original quoted only its first byte).
            $quoted = array_map(function ($s) { return preg_quote($s, '/'); }, (array)$search);
            $pattern = '/(?:' . implode('|', $quoted) . ')/u';
            if (is_array($search) && is_array($replace)) {
                $len = min(count($search), count($replace));
                $table = array_combine(array_slice($search, 0, $len), array_slice($replace, 0, $len));
                // Closures replace create_function(), which was removed in PHP 8.0.
                return preg_replace_callback($pattern, function ($match) use ($table) {
                    return array_key_exists($match[0], $table) ? $table[$match[0]] : $match[0];
                }, $subject);
            }
            // A callback inserts $replace literally, so "$1" or "\1" isn't treated as a backreference.
            return preg_replace_callback($pattern, function () use ($replace) {
                return (string)$replace;
            }, $subject);
        }
    }
    
  • 2020-12-03 18:27

    It looks like the string was not replaced because your input encoding and the encoding of your source file don't match.
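
    A rough sketch of how such a mismatch plays out and one way to work around it, assuming the mbstring extension is available and that the input really is ISO-8859-1 (that part is a guess made for the sake of the example; the real input encoding has to be known or checked).

    ```php
    <?php
    // Assumed input: "café" encoded as ISO-8859-1, where "é" is the single byte 0xE9.
    $input  = "caf\xE9";
    // Search string as UTF-8, where "é" is the two bytes 0xC3 0xA9.
    $needle = "\u{00E9}";

    // The byte sequences differ, so nothing is replaced.
    $before = str_replace($needle, 'e', $input);

    // Convert the input to UTF-8 first if it isn't valid UTF-8 already.
    if (!mb_check_encoding($input, 'UTF-8')) {
        $input = mb_convert_encoding($input, 'UTF-8', 'ISO-8859-1');
    }
    $output = str_replace($needle, 'e', $input);

    echo $output, "\n"; // cafe
    ```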
