How to amend sub strings?

北海茫月 2021-01-17 02:25

Using the collation xxx_german2_ci, which treats ü and ue as identical, is it possible to have all occurrences of München be hi

2 answers
  • 2021-01-17 02:43

    In the end I decided to do it all in PHP, which is why I asked which characters utf8_general_ci treats as equal.

    Below is what I came up with. A label is built from a text $description: every occurrence of the substring $term is highlighted, and HTML special characters are escaped. The accent substitution table is not complete, but it is probably sufficient for the actual use case.

    mb_internal_encoding("UTF-8");
    
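    function withoutAccents($s) {
        // Work in ISO-8859-1 so that every character is exactly one byte and
        // the byte-wise strtr() maps whole characters, not UTF-8 fragments.
        // (utf8_decode() did the same conversion but is deprecated as of PHP 8.2.)
        return strtr(mb_convert_encoding($s, 'ISO-8859-1', 'UTF-8'),
                     mb_convert_encoding('àáâãäçèéêëìíîïñòóôõöùúûüýÿß', 'ISO-8859-1', 'UTF-8'),
                     'aaaaaceeeeiiiinooooouuuuyys');
    }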
    
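    function simplified($s) {
        // mb_strtolower() also lowercases accented letters such as Ü;
        // strtolower() only handles ASCII reliably.
        return withoutAccents(mb_strtolower($s));
    }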
    
    function encodedSubstr($s, $start, $length) {
        return htmlspecialchars(mb_substr($s, $start, $length));
    }
    
    function labelFromDescription($description, $term) {
        $simpleTerm = simplified($term);
        $simpleDescription = simplified($description);
    
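        $lastEndPos = $pos = 0;
        // $simpleTerm is single-byte here, so strlen() counts characters.
        $termLen = strlen($simpleTerm);
        if ($termLen === 0) {
            // An empty term would match at every position and never advance.
            return htmlspecialchars($description);
        }
        $label = ''; // HTML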
        while (($pos = strpos($simpleDescription,
                              $simpleTerm, $lastEndPos)) !== false) {
            $label .=
                encodedSubstr($description, $lastEndPos, $pos - $lastEndPos).
                '<strong>'.
                encodedSubstr($description, $pos, $termLen).
                '</strong>';
            $lastEndPos = $pos + $termLen;
        }
        // Append the remainder; positions are character offsets, so use mb_strlen().
        $label .= encodedSubstr($description, $lastEndPos,
                                mb_strlen($description) - $lastEndPos);
    
        return $label;
    }
    
    echo labelFromDescription('São Paulo <SAO>', 'SAO')."\n";
    echo labelFromDescription('München <MUC>', 'ünc');
    

    Output:

    <strong>São</strong> Paulo &lt;<strong>SAO</strong>&gt;
    M<strong>ünc</strong>hen &lt;MUC&gt;
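
    The one-to-one table above cannot express the equivalence ü = ue that the question mentions for xxx_german2_ci, because the folded string then no longer has the same length as the original. One possible way around that, sketched here under assumptions, is to record for every folded character which original character produced it. The names GERMAN2_EXPANSIONS, foldWithMap and highlightGerman2 are hypothetical, and the expansion table only covers the four German letters, not a full collation.

    ```php
    <?php
    // Sketch only: the names and the expansion table are illustrative assumptions.
    const GERMAN2_EXPANSIONS = ['ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss'];

    // Returns [folded string, map from folded index to original index, original characters].
    function foldWithMap(string $s): array {
        $chars = preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
        $folded = [];
        $map = [];
        foreach ($chars as $i => $ch) {
            $lower = mb_strtolower($ch, 'UTF-8');
            $rep = GERMAN2_EXPANSIONS[$lower] ?? $lower;
            foreach (preg_split('//u', $rep, -1, PREG_SPLIT_NO_EMPTY) as $c) {
                $folded[] = $c;
                $map[] = $i; // this folded character came from original character $i
            }
        }
        return [implode('', $folded), $map, $chars];
    }

    function highlightGerman2(string $description, string $term): string {
        [$haystack, $map, $chars] = foldWithMap($description);
        [$needle] = foldWithMap($term);
        $needleLen = mb_strlen($needle, 'UTF-8');
        if ($needleLen === 0) {
            return htmlspecialchars($description); // nothing to highlight
        }
        // Escaped slice of the original text, by character index [from, to).
        $slice = function (int $from, int $to) use ($chars): string {
            return htmlspecialchars(implode('', array_slice($chars, $from, $to - $from)));
        };
        $html = '';
        $done = 0;   // original characters already emitted
        $offset = 0; // search position in the folded string
        while (($p = mb_strpos($haystack, $needle, $offset, 'UTF-8')) !== false) {
            // Translate the folded match back to original character indices.
            $start = max($map[$p], $done);
            $end = $map[$p + $needleLen - 1] + 1;
            if ($end > $start) {
                $html .= $slice($done, $start) . '<strong>' . $slice($start, $end) . '</strong>';
                $done = $end;
            }
            $offset = $p + $needleLen;
        }
        return $html . $slice($done, count($chars));
    }

    echo highlightGerman2('München <MUC>', 'muen'), "\n";
    // <strong>Mün</strong>chen &lt;MUC&gt;
    ```

    A match that covers only half of an expanded letter (for example the term "mu" against "Mü") still highlights the whole letter, which may or may not be what the collation itself would consider a match.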
    
  • 2021-01-17 03:00

    I have found these tables: http://developer.mimer.com/collations/charts/index.tml. They are, of course, language dependent. A collation is just a comparison algorithm. For general utf8 I am not sure how it treats special characters.

    You can use them to find the desired characters and replace them in the output to get the same result as in your example. For that, though, you will need some programming language (PHP or anything else).

    Other resources:
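
    As a rough illustration of that idea, equivalences read off a chart can be put into a lookup table and applied before comparing strings. This is only a sketch: $foldTable holds a handful of example entries rather than a complete chart, and foldLikeChart is a hypothetical helper name.

    ```php
    <?php
    // Illustrative excerpt only; a real table would be filled in from a collation chart.
    $foldTable = ['ä' => 'a', 'ö' => 'o', 'ü' => 'u', 'é' => 'e', 'ß' => 's'];

    function foldLikeChart(string $s, array $foldTable): string {
        // strtr() with an array replaces whole substrings, so multibyte UTF-8
        // keys are safe here (the three-argument form works byte by byte).
        return strtr(mb_strtolower($s, 'UTF-8'), $foldTable);
    }

    // Two spellings compare equal once both are folded.
    var_dump(foldLikeChart('München', $foldTable) === foldLikeChart('MUNCHEN', $foldTable)); // bool(true)
    ```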

    http://collation-charts.org/

    http://mysql.rjweb.org/doc.php/charcoll (down on the page)

    Basically, try searching for "collation algorithm mysql utf8_general_ci" or something similar.
