Convert 2 similarly-looking German characters of different kinds to same ASCII string in PHP

梦如初夏 2021-01-07 00:24

I have these two strings:

$str1 = 'Ö';
$str2 = 'Ö';
$e1 = mb_detect_encoding($str1);
$e2 = mb_detect_encoding($str2);
var_dump($str1);
var_dump($str2);
         


        
3 Answers
  • 2021-01-07 01:11

    Extending Andreas's answer: these characters are a base letter followed by a combining diaeresis (U+0308). I was able to search for them and replace them with the standard precomposed umlauts, which can then be replaced with whatever is needed. This is the function I used:

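    function convertToUmlauts($str) {
        // Keys: base letter + combining diaeresis (U+0308). Values: precomposed umlauts.
        $srp_array = [
            "O\u{0308}" => "\u{00D6}", "A\u{0308}" => "\u{00C4}", "U\u{0308}" => "\u{00DC}",
            "a\u{0308}" => "\u{00E4}", "o\u{0308}" => "\u{00F6}", "u\u{0308}" => "\u{00FC}",
        ];
        return strtr($str, $srp_array);
    }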
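
The "replace with whatever is needed" step could look like the sketch below. The helper name umlautsToAscii and the choice of "Oe"-style digraphs are assumptions made for illustration, not part of the answer, and the \u{...} escapes need PHP 7.0 or later.

```php
<?php
// Hypothetical helper: the name and the "Oe"-style digraphs are assumptions.
function umlautsToAscii(string $str): string {
    // Step 1: base letter + combining diaeresis (U+0308) -> precomposed umlaut.
    $compose = [
        "A\u{0308}" => "\u{00C4}", "O\u{0308}" => "\u{00D6}", "U\u{0308}" => "\u{00DC}",
        "a\u{0308}" => "\u{00E4}", "o\u{0308}" => "\u{00F6}", "u\u{0308}" => "\u{00FC}",
    ];
    // Step 2: precomposed umlaut -> ASCII digraph.
    $ascii = [
        "\u{00C4}" => 'Ae', "\u{00D6}" => 'Oe', "\u{00DC}" => 'Ue',
        "\u{00E4}" => 'ae', "\u{00F6}" => 'oe', "\u{00FC}" => 'ue',
    ];
    // Two passes, because strtr() does not rescan text it has already replaced.
    return strtr(strtr($str, $compose), $ascii);
}

$precomposed = "\u{00D6}l";  // "Öl" with U+00D6
$decomposed  = "O\u{0308}l"; // "Öl" with O + U+0308

var_dump($precomposed === $decomposed);                                 // bool(false)
var_dump(umlautsToAscii($precomposed) === umlautsToAscii($decomposed)); // bool(true)
echo umlautsToAscii($decomposed), "\n";                                 // Oel
```

A fixed table like this only covers the six letters listed, so other accented characters pass through unchanged.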
    
  • 2021-01-07 01:16

    You could first convert your input to UTF-8 using iconv and then apply your conversion to ASCII. To detect the current encoding, you can use mb_detect_encoding.

    $aUTF8 = iconv(mb_detect_encoding($a, 'UTF-8, ISO-8859-1', true), 'UTF-8', $a);
    $bUTF8 = iconv(mb_detect_encoding($b, 'UTF-8, ISO-8859-1', true), 'UTF-8', $b);
    
    $aASCII = iconv("utf-8", "ascii//TRANSLIT", $aUTF8);
    $bASCII = iconv("utf-8", "ascii//TRANSLIT", $bUTF8);
    

    Please note that you might have to add additional encodings to the encoding list of mb_detect_encoding.
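
One way to keep that encoding list adjustable is to pass it in as a parameter. This is only a sketch: the helper name toUtf8 and its default candidate list are assumptions, and it needs the mbstring and iconv extensions.

```php
<?php
// Hypothetical helper; the name and the default candidate list are assumptions.
function toUtf8(string $input, array $candidates = ['UTF-8', 'ISO-8859-1']) {
    // Strict mode only accepts encodings in which the input is valid.
    $from = mb_detect_encoding($input, $candidates, true);
    if ($from === false) {
        return false; // none of the candidates matched
    }
    return iconv($from, 'UTF-8', $input);
}

$latin1 = "\xD6l"; // "Öl" in ISO-8859-1; this byte sequence is not valid UTF-8
echo bin2hex(toUtf8($latin1)), "\n"; // c3966c
```

Any extra names added to the list must be understood by both mb_detect_encoding and iconv on the target system.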

  • 2021-01-07 01:17

    These are two different ways of expressing the same letter in Unicode: one is the letter O followed by a combining diaeresis (U+0308), the other is the single precomposed letter Ö (U+00D6). Unicode allows either variant to express "Ö".
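
The difference is invisible on screen but shows up in the bytes. A minimal sketch, assuming UTF-8 strings and PHP 7.0 or later for the \u{...} escapes (the variable names are illustrative):

```php
<?php
$precomposed = "\u{00D6}";  // single code point U+00D6
$decomposed  = "O\u{0308}"; // U+004F followed by combining U+0308

echo bin2hex($precomposed), "\n";       // c396
echo bin2hex($decomposed), "\n";        // 4fcc88
var_dump($precomposed === $decomposed); // bool(false)
```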

    To normalize that into your preferred variant, use Normalizer::normalize (part of the intl extension):

    $str = Normalizer::normalize('Ö', Normalizer::FORM_C);
    

    You most likely want Form C, which converges on "Ö" (the single-letter form). If you prefer "O" + combining diaeresis, use Form D instead.
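
A short sketch of both forms applied to the two variants. The variable names are illustrative, and the last step (stripping combining marks after Form D to get plain ASCII) is an assumption about what the asker wants rather than part of this answer.

```php
<?php
// Normalizer ships with PHP's intl extension; stop early if it is not loaded.
if (!class_exists('Normalizer')) {
    exit("The intl extension is required for this sketch.\n");
}

$precomposed = "\u{00D6}";  // U+00D6
$decomposed  = "O\u{0308}"; // U+004F U+0308

// Form C: both variants converge on the precomposed letter.
$nfcA = Normalizer::normalize($precomposed, Normalizer::FORM_C);
$nfcB = Normalizer::normalize($decomposed, Normalizer::FORM_C);
var_dump($nfcA === $nfcB); // bool(true)
var_dump(strlen($nfcB));   // int(2)

// Form D: the precomposed letter is split into base letter + combining mark.
$nfd = Normalizer::normalize($precomposed, Normalizer::FORM_D);
var_dump($nfd === $decomposed); // bool(true)

// Assumed follow-up: drop combining marks (\p{Mn}) to keep only the base letter.
$ascii = preg_replace('/\p{Mn}+/u', '', $nfd);
var_dump($ascii); // string(1) "O"
```

Stripping marks yields "O" rather than the German convention "Oe", so which result counts as the right ASCII string depends on the use case.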
