Replace diacritic characters with “equivalent” ASCII in PHP?

情深已故 2021-01-31 20:42

Related questions:

  1. How to replace characters in a java String?
  2. How to replace special characters with their equivalent (such as " á " for &
4 answers
  • 2021-01-31 20:58

    Try this:

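    function normal_chars($string)
    {
        // Turn accented letters into named HTML entities, e.g. 'á' => '&aacute;'
        $string = htmlentities($string, ENT_QUOTES, 'UTF-8');
        // Keep only the base letter(s) of each accent entity: '&aacute;' => 'a'
        $string = preg_replace('~&([a-z]{1,2})(acute|caron|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i', '$1', $string);
        // Decode the remaining entities so '&amp;' etc. don't leak through as 'amp'
        $string = html_entity_decode($string, ENT_QUOTES, 'UTF-8');
        // Collapse every run of non-alphanumeric characters into a single space
        $string = preg_replace('~[^0-9a-z]+~i', ' ', $string);
        return trim($string);
    }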
    
    Examples:
    
    echo normal_chars('Álix----_Ãxel!?!?'); // Alix Axel
    echo normal_chars('áéíóúÁÉÍÓÚ'); // aeiouAEIOU
    echo normal_chars('üÿÄËÏÖÜŸåÅ'); // uyAEIOUYaA
    

    Based on the accepted answer in the thread "URL Friendly Username in PHP?".
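    To see why the regex in normal_chars() works, here is a small walk-through (illustrative only; the sample characters are my own choice). htmlentities() maps each accented Latin-1 letter to a named entity such as &Atilde;, and the pattern keeps only the one or two letters in front of the accent name. Characters with no named HTML 4.01 entity, such as 'ł', pass through untouched, so the later [^0-9a-z] step turns them into spaces rather than ASCII letters.

```php
<?php
// Illustrative walk-through of the entity trick used by normal_chars().
// Same pattern as in the function above (sample characters are arbitrary).
$pattern = '~&([a-z]{1,2})(acute|cedil|circ|grave|lig|orn|ring|slash|th|tilde|uml);~i';

$steps = [];
foreach (['Ã', 'Ø', 'æ', 'ç', 'ł'] as $char) {
    // Step 1: accented letter -> named HTML entity (if one exists)
    $entity = htmlentities($char, ENT_QUOTES, 'UTF-8');
    // Step 2: entity -> base letter(s) captured by ([a-z]{1,2})
    $steps[$char] = [$entity, preg_replace($pattern, '$1', $entity)];
}

foreach ($steps as $char => [$entity, $base]) {
    echo "$char -> $entity -> $base", PHP_EOL;
}
```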

  • 2021-01-31 21:11

    The iconv extension can do this; more specifically, the iconv() function:

    $str = iconv('Windows-1252', 'ASCII//TRANSLIT//IGNORE', "Gracišce");
    echo $str;
    //outputs "Gracisce"
    

    The main hassle with iconv is that you have to watch your encodings, but it's definitely the right tool for the job. (I used 'Windows-1252' in the example because of limitations of the text editor I was working with.) The feature you definitely want is the //TRANSLIT suffix, which tells iconv to transliterate characters that have no exact ASCII match into the closest approximation.
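    Since most PHP input today is UTF-8 rather than Windows-1252, here is a rough sketch of a wrapper for the UTF-8 case. The helper name to_ascii_translit() and the locale names are my own assumptions, not part of PHP. On glibc-based systems, what //TRANSLIT produces depends on the LC_CTYPE locale (in the plain "C" locale you may just get "?"), so the sketch switches to a UTF-8 locale for the call and restores the previous one afterwards. Other iconv implementations behave differently (the next answer shows one such output).

```php
<?php
// Sketch, not a library API: to_ascii_translit() is a hypothetical helper.
// Assumes the input really is UTF-8 and the iconv extension is loaded.
function to_ascii_translit(string $utf8): ?string
{
    // On glibc, //TRANSLIT output depends on LC_CTYPE, so switch to a UTF-8
    // locale for the call. These locale names are assumptions and may be missing.
    $previous = setlocale(LC_CTYPE, '0'); // '0' only queries the current locale
    setlocale(LC_CTYPE, 'C.UTF-8', 'en_US.UTF-8');

    // iconv() returns false on failure; silence its notice and check instead.
    $ascii = @iconv('UTF-8', 'ASCII//TRANSLIT', $utf8);

    // Restore the caller's locale so the helper has no lasting side effects.
    if ($previous !== false) {
        setlocale(LC_CTYPE, $previous);
    }

    return $ascii === false ? null : $ascii;
}

var_dump(to_ascii_translit('Gračišće'));
```

    The result may still contain stray marks such as ' or ? depending on the platform, which is why the other answers strip non-alphanumeric characters afterwards.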

  • 2021-01-31 21:11

    I found another solution, based on @zombat's answer.

    The problem with his answer was that I got the following notice:

    Notice: iconv() [function.iconv]: Wrong charset, conversion from `UTF-8' to `ASCII//TRANSLIT//IGNORE' is not allowed in D:\www\phpcommand.php(11) : eval()'d code on line 3
    

    After removing //IGNORE from the target charset, I got:

    Gr'a'e~a~o^O"ucisce
    

    So the š character was transliterated correctly, but each of the other accented characters came out as an accent mark followed by its base letter.

    The solution that worked for me combines preg_replace (to remove everything except letters, digits and dots, including spaces) with @zombat's iconv call:

    preg_replace('/[^a-zA-Z0-9.]/','',iconv('UTF-8', 'ASCII//TRANSLIT', "GráéãõÔücišce"));
    

    Output:

    GraeaoOucisce
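    The filtering half of this can be checked on its own, without iconv, by feeding it the transliteration quoted above. A minimal sketch; the helper name and the $keepSpaces option are hypothetical additions of mine, for cases where you want to keep word boundaries instead of stripping spaces too:

```php
<?php
// Hypothetical helper: removes the accent marks (', ~, ^, ", ...) that some
// iconv builds emit with //TRANSLIT, using the same character class as above.
function strip_translit_marks(string $translit, bool $keepSpaces = false): string
{
    // Optionally add a space to the allowed set so words stay separated.
    $pattern = $keepSpaces ? '/[^a-zA-Z0-9. ]/' : '/[^a-zA-Z0-9.]/';
    return preg_replace($pattern, '', $translit);
}

// First input copied from the iconv output quoted above.
echo strip_translit_marks('Gr\'a\'e~a~o^O"ucisce'), PHP_EOL; // GraeaoOucisce
echo strip_translit_marks('S~ao Paulo', true), PHP_EOL;      // Sao Paulo
```

    The trade-off: this filter also drops legitimate apostrophes and quotes from the original text, since it cannot tell them apart from the marks iconv added.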
    
  • 2021-01-31 21:13

    My solution is to create two strings: the first with the unwanted letters, the second with the letters that replace them, in the same order.

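    $from = 'čšć';
    $to   = 'csc';
    $text = 'Gračišće';
    
    // str_split() splits by byte, which breaks multibyte UTF-8 characters;
    // mb_str_split() (PHP 7.4+, mbstring) splits by character instead.
    $result = str_replace(mb_str_split($from), mb_str_split($to), $text);
    echo $result; // Gracisce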
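    A variant of the same idea that avoids splitting strings at all is strtr() with an array: it matches whole keys (so multibyte UTF-8 characters are safe), tries the longest keys first, and never re-scans text it has already substituted. The map below is a small illustrative sample for Croatian letters (the 'đ' => 'dj' choice is one common convention, not the only one); extend it for your own data.

```php
<?php
// Hypothetical sample map; extend it with the letters your data contains.
const DIACRITIC_MAP = [
    'č' => 'c', 'ć' => 'c', 'š' => 's', 'ž' => 'z', 'đ' => 'dj',
    'Č' => 'C', 'Ć' => 'C', 'Š' => 'S', 'Ž' => 'Z', 'Đ' => 'Dj',
];

function replace_diacritics(string $text): string
{
    // Array form of strtr(): each key is replaced by its value as a whole string.
    return strtr($text, DIACRITIC_MAP);
}

echo replace_diacritics('Gračišće'), PHP_EOL; // Gracisce
```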
    