MySQL - matching Latin (English) form input to UTF-8 (non-English) data

不知归路 2021-01-15 21:54

I maintain a music database in MySQL. How do I return results stored under, e.g., 'Tiësto' when people search for 'Tiesto'?

All the data is stored under full text

1 answer
  • 2021-01-15 22:25

    One possible solution is to add another column next to "artist", such as "artist_normalized". When populating the table, insert a normalized version of the string into that column. Searches can then be run against the artist_normalized column, with the search term normalized the same way.

    Test code:

    <?php
    // Rule: decompose (NFD), drop nonspacing marks (the diacritics), recompose (NFC).
    $transliterator = Transliterator::createFromRules(':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;', Transliterator::FORWARD);
    $test = ['abcd', 'èe', '€', 'àòùìéëü', 'àòùìéëü', 'tiësto'];
    foreach ($test as $e) {
        // Print each input next to its normalized form.
        $normalized = $transliterator->transliterate($e);
        echo $e . ' --> ' . $normalized . "\n";
    }
    ?>
    

    Result:

    abcd --> abcd
    èe --> ee
    € --> €
    àòùìéëü --> aouieeu
    àòùìéëü --> aouieeu
    tiësto --> tiesto
    

    The Transliterator class does the work. The specified rule performs three actions: it decomposes the string (NFD), removes the diacritics (nonspacing marks), and then recomposes the string into its canonical form (NFC). Transliterator in PHP is built on top of ICU, so this approach relies on the tables of the ICU library, which are complete and reliable.
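To make the three steps concrete, here is a rough equivalent that spells them out one at a time. It is only a sketch: it assumes the intl extension's Normalizer class and PCRE's \p{Mn} property (nonspacing marks) cover the same characters as the ICU rule for your data, and the function name strip_marks_stepwise is made up for illustration.

```php
<?php
// Hypothetical helper, not part of the original answer.
function strip_marks_stepwise($text)
{
    // 1. Decompose: "ë" becomes "e" followed by a combining diaeresis (U+0308).
    $decomposed = Normalizer::normalize($text, Normalizer::FORM_D);

    // 2. Remove every nonspacing mark (Unicode general category Mn).
    $stripped = preg_replace('/\p{Mn}/u', '', $decomposed);

    // 3. Recompose whatever is left into canonical composed form.
    return Normalizer::normalize($stripped, Normalizer::FORM_C);
}

echo strip_marks_stepwise('Tiësto'), "\n"; // prints "Tiesto"
```

Characters that have no decomposition, such as '€', pass through unchanged, which matches the result listed above.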

    Note: this solution requires PHP 5.4 or greater with the intl extension.
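Putting the pieces together, the artist_normalized column could be filled and queried roughly as follows. This is a sketch under assumptions, not a tested MySQL setup: the table name artists and the helper normalize_artist are hypothetical, and an in-memory SQLite database (which needs the pdo_sqlite extension) stands in for the real connection so the example runs on its own. With MySQL, only the PDO DSN and the column types in the CREATE TABLE statement would need to change.

```php
<?php
// Hypothetical helper: applies the same ICU rule as the test code in this answer.
function normalize_artist($name)
{
    static $transliterator = null;
    if ($transliterator === null) {
        $transliterator = Transliterator::createFromRules(
            ':: NFD; :: [:Nonspacing Mark:] Remove; :: NFC;',
            Transliterator::FORWARD
        );
    }
    return $transliterator->transliterate($name);
}

// Stand-in connection (assumption): replace the DSN with your MySQL one.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE artists (artist TEXT, artist_normalized TEXT)');

// Populate: store the original name and its normalized form side by side.
$insert = $pdo->prepare(
    'INSERT INTO artists (artist, artist_normalized) VALUES (?, ?)'
);
foreach (['Tiësto', 'Björk', 'Moby'] as $artist) {
    $insert->execute([$artist, normalize_artist($artist)]);
}

// Search: normalize the user's input too, then compare normalized to normalized.
$find = $pdo->prepare('SELECT artist FROM artists WHERE artist_normalized = ?');
$find->execute([normalize_artist('Tiesto')]);
$found = $find->fetchColumn();

echo $found, "\n"; // prints "Tiësto"
```

Normalizing the search term as well means a user who types 'Tiësto' with the diaeresis still finds the row.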
