MySQL matching Unicode characters with ASCII version

盖世英雄少女心 2020-12-03 14:59

I'm running MySQL 5.1.50 and have a table that looks like this:

organizations | CREATE TABLE `organizations` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `na         


        
5 answers
  • 2020-12-03 15:40

    I found that you get the requested result using REGEXP, which in this MySQL version compares byte by byte, so e and é are not treated as equal:

    SELECT * FROM organizations WHERE name REGEXP 'namé';
    

    But this doesn't help if you try to group exactly by name.
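
    If exact grouping is the goal, one option is to apply a binary collation inside the grouping expression itself. This is only a minimal sketch, assuming the table from the question is organizations and its utf8 column is name; the aliases exact_name and n are made up for illustration:

    SELECT name COLLATE utf8_bin AS exact_name, COUNT(*) AS n
    FROM organizations
    GROUP BY exact_name;

    With utf8_bin, 'name' and 'namé' should end up in separate groups.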

  • 2020-12-03 15:45

    You specified the name column as text CHARACTER SET utf8 COLLATE utf8_unicode_ci, which tells MySQL to treat e and é as equivalent when matching and sorting. Both that collation and utf8_general_ci treat a lot of characters as equivalent.

    http://www.collation-charts.org/ is a great resource once you learn how to read the charts, which is pretty easy.

    If you want e and é etc. to be considered different then you must choose a different collation. To find out what collations are on your server (assuming you're limited to UTF-8 encoding):

    mysql> show collation like 'utf8%';
    

    Then choose one, using the collation charts as a reference.

    One more special collation is utf8_bin, in which there are no equivalencies: it's a binary match.
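
    As a rough sketch of how utf8_bin could be applied, assuming the organizations table and name column from the question: the first statement overrides the collation for a single comparison, and the second changes the column itself. MODIFY replaces the whole column definition, so adjust it (text here, plus any NOT NULL or default) to match your real schema, since the CREATE TABLE in the question is cut off.

    mysql> select * from organizations where name collate utf8_bin = 'namé';
    mysql> alter table organizations modify name text character set utf8 collate utf8_bin;

    Keep in mind that utf8_bin is also case sensitive, so 'Namé' and 'namé' no longer match either.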

    The only MySQL Unicode collations I'm aware of that are not language-specific are utf8_unicode_ci, utf8_general_ci and utf8_bin, and in that sense they are the odd ones out. The real purpose of a collation is to make the computer match and sort as a person from a particular place would expect. Hungarian and Turkish dictionaries have their entries ordered according to different rules. Specifying a collation allows you to sort and match according to such local rules.

    For example, it seems Danes consider e and é equivalent but Icelanders don't:

    mysql> select _utf8'e' collate utf8_danish_ci
        -> = _utf8'é' collate utf8_danish_ci as equal;
    +-------+
    | equal |
    +-------+
    |     1 |
    +-------+
    
    mysql> select _utf8'e' collate utf8_icelandic_ci
        -> = _utf8'é' collate utf8_icelandic_ci as equal;
    +-------+
    | equal |
    +-------+
    |     0 |
    +-------+
    

    Another handy trick is to fill a one-column table with a bunch of characters you're interested in (it's easier from a script); MySQL can then tell you the equivalencies:

    mysql> create table t (c char(1) character set utf8);
    mysql> insert into t values ('a'), ('ä'), ('á');
    mysql> select group_concat(c) from t group by c collate utf8_icelandic_ci;
    +-----------------+
    | group_concat(c) |
    +-----------------+
    | a               |
    | á               |
    | ä               |
    +-----------------+
    
    mysql> select group_concat(c) from t group by c collate utf8_danish_ci;
    +-----------------+
    | group_concat(c) |
    +-----------------+
    | a,á             |
    | ä               |
    +-----------------+
    
    mysql> select group_concat(c) from t group by c collate utf8_general_ci;
    +-----------------+
    | group_concat(c) |
    +-----------------+
    | a,ä,á           |
    +-----------------+
    
  • 2020-12-03 15:59

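    One thing you can do with your query string is to URL-decode it before using it in the query:

    <?php
    $query = "उनकी"; // some Unicode characters, e.g. taken from the URL
    $query = urldecode($query); // turn any %XX escapes back into raw bytes
    // NOTE: escape or bind $query before running this in real code
    $qry = "SELECT * FROM `table` WHERE books LIKE '%$query%'";

    // rest of the code....
    ?>


    It worked for me. :)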

  • 2020-12-03 16:00

    You have set the collation to utf8_unicode_ci, which equates accented Latin characters with their unaccented forms. Additional information can be found here.

  • 2020-12-03 16:01

    Of course, this will work, because BINARY forces a byte-by-byte comparison:

    SELECT * FROM organizations WHERE name LIKE BINARY 'namé';
    