How to amend sub strings?

Using the collation xxx_german2_ci, which treats ü and ue as identical, is it possible to have all occurrences of München be hi

2 Answers

天命终不由人

2021-01-17 02:43

In the end I decided to do it all in PHP, hence my question about which characters compare as equal under utf8_general_ci.

Below is what I came up with, by example: a label is built from a text $description, with each occurrence of the substring $term highlighted and special characters converted. The substitution table is not complete, but it is probably sufficient for the actual use case.

mb_internal_encoding("UTF-8");

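// Maps accented lowercase letters to ASCII. The result is ISO-8859-1
// (one byte per character), so byte offsets equal character offsets.
function withoutAccents($s) {
    // utf8_decode() is deprecated as of PHP 8.2; this performs the same conversion.
    $toLatin1 = function ($t) { return mb_convert_encoding($t, 'ISO-8859-1', 'UTF-8'); };
    return strtr($toLatin1($s),
                 $toLatin1('àáâãäçèéêëìíîïñòóôõöùúûüýÿß'),
                 'aaaaaceeeeiiiinooooouuuuyys');
}

function simplified($s) {
    // mb_strtolower() also lowercases non-ASCII letters such as Ü,
    // which strtolower() does not handle in UTF-8 text.
    return withoutAccents(mb_strtolower($s));
}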

function encodedSubstr($s, $start, $length) {
    return htmlspecialchars(mb_substr($s, $start, $length));
}

function labelFromDescription($description, $term) {
    $simpleTerm = simplified($term);
    $simpleDescription = simplified($description);

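    $lastEndPos = $pos = 0;
    $termLen = strlen($simpleTerm);
    if ($termLen === 0) {
        // In PHP 8 an empty term matches at every offset and the loop would never advance.
        return htmlspecialchars($description);
    }
    $label = ''; // HTML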
    while (($pos = strpos($simpleDescription,
                          $simpleTerm, $lastEndPos)) !== false) {
        $label .=
            encodedSubstr($description, $lastEndPos, $pos - $lastEndPos).
            '<strong>'.
            encodedSubstr($description, $pos, $termLen).
            '</strong>';
        $lastEndPos = $pos + $termLen;
    }
    $label .= encodedSubstr($description, $lastEndPos,
                            strlen($description) - $lastEndPos);

    return $label;
}

echo labelFromDescription('São Paulo <SAO>', 'SAO')."\n";
echo labelFromDescription('München <MUC>', 'ünc');

Output:

<strong>São</strong> Paulo &lt;<strong>SAO</strong>&gt;
M<strong>ünc</strong>hen &lt;MUC&gt;
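
The code above relies on every substitution being one character for one character, so it cannot treat ü and ue as identical the way the german2 collation in the question does. One possible way around that, as a rough sketch only: fold each character separately and remember which source character every folded character came from, then map match positions back. The names FOLD_MAP, foldWithMap and highlightFolded are made up for this illustration, the table is a small hand-picked subset rather than the real collation rules, and mb_str_split() assumes PHP 7.4 or later.

```php
<?php
// Hypothetical folding table: a small, incomplete subset in the spirit of
// "ü equals ue". It is NOT the actual rule set of any MySQL collation.
const FOLD_MAP = ['ä' => 'ae', 'ö' => 'oe', 'ü' => 'ue', 'ß' => 'ss',
                  'ã' => 'a', 'á' => 'a', 'à' => 'a', 'é' => 'e', 'è' => 'e'];

// Returns [folded string, origin], where origin[k] is the index of the
// source character that produced the k-th character of the folded string.
function foldWithMap(string $s): array {
    $folded = '';
    $origin = [];
    foreach (mb_str_split($s, 1, 'UTF-8') as $i => $ch) {
        $lower = mb_strtolower($ch, 'UTF-8');
        $piece = FOLD_MAP[$lower] ?? $lower;
        $folded .= $piece;
        // One origin entry per folded character, so expansions stay aligned.
        for ($k = mb_strlen($piece, 'UTF-8'); $k > 0; $k--) {
            $origin[] = $i;
        }
    }
    return [$folded, $origin];
}

// Wraps every folded match of $term in <strong>, escaping the rest as HTML.
function highlightFolded(string $text, string $term): string {
    [$foldedText, $origin] = foldWithMap($text);
    [$foldedTerm] = foldWithMap($term);
    $termLen = mb_strlen($foldedTerm, 'UTF-8');
    if ($termLen === 0) {
        return htmlspecialchars($text);
    }
    $html = '';
    $copied = 0; // index of the first source character not yet emitted
    $from = 0;   // search offset within the folded text
    while (($pos = mb_strpos($foldedText, $foldedTerm, $from, 'UTF-8')) !== false) {
        // A match may begin inside an expansion; never step back over emitted text.
        $start = max($origin[$pos], $copied);
        $end = $origin[$pos + $termLen - 1] + 1;
        if ($end > $start) {
            $html .= htmlspecialchars(mb_substr($text, $copied, $start - $copied, 'UTF-8'))
                . '<strong>'
                . htmlspecialchars(mb_substr($text, $start, $end - $start, 'UTF-8'))
                . '</strong>';
            $copied = $end;
        }
        $from = $pos + $termLen;
    }
    return $html . htmlspecialchars(mb_substr($text, $copied, null, 'UTF-8'));
}

echo highlightFolded('München <MUC>', 'UENC'), "\n";
// prints: M<strong>ünc</strong>hen &lt;MUC&gt;
```

A match that starts or ends in the middle of an expansion highlights the whole source character, which may or may not be what you want.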

南方客

2021-01-17 03:00

I have found these tables: http://developer.mimer.com/collations/charts/index.tml. They are, of course, language dependent. A collation is just a comparison algorithm. For general utf8 I am not sure how it treats special characters.

You can use them to find the desired symbols and replace them in the output to get the same result as in the example. But for that you will need some programming language (PHP or anything else).
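
As a rough illustration of that replacement step in PHP: the table below is a hypothetical, hand-picked subset, and a real one would have to be filled in from the chart of the collation actually in use. The names $equivalents and foldByTable are invented for this sketch.

```php
<?php
// Hypothetical subset of "compares as equal" pairs; fill in from a collation chart.
$equivalents = ['ä' => 'a', 'ö' => 'o', 'ü' => 'u', 'ß' => 's', 'é' => 'e', 'ñ' => 'n'];

function foldByTable(string $s, array $table): string {
    // The array form of strtr() replaces whole keys, so multi-byte
    // UTF-8 characters are matched as units rather than byte by byte.
    return strtr(mb_strtolower($s, 'UTF-8'), $table);
}

echo foldByTable('München', $equivalents), "\n"; // prints: munchen
```

Two strings can then be compared, or searched within each other, after both have been passed through the same table.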

Other resources:

http://collation-charts.org/

http://mysql.rjweb.org/doc.php/charcoll (down on the page)

Basically, try to google "collation algorithm mysql utf8_general_ci" or something like that.
