问题
Desired result :
Have an accent sensitive primary key in MySQL.
I have a table of unique words, so I use the word itself as a primary key (by the way if someone can give me an advice about it, I have no idea if it's a good design/practice or not).
I need that field to be accent (and why not case) sensitive, because it must distinguish between, for instance, 'demandé'
and 'demande'
, two different inflexions of the French verb "demander". I do not have any problem to store accented words in the database. I just can't insert two accented characters strings that are identical when unaccented.
Error :
When trying to create the 'demandé'
row with the following query:
INSERT INTO `corpus`.`token` (`name_token`) VALUES ('demandé');
I got this error :
ERROR 1062: 1062: Duplicate entry 'demandé' for key 'PRIMARY'
Questions :
- Where in the process should a make a modification in order to have two different unique primary keys for "demande" and "demandé" in that table ?
SOLUTION using 'collate utf8_general_ci' in table declaration
- How can i make accent sensitive queries ? Is the following the right way :
SELECT * FROM corpus.token WHERE name_token = 'demandé' COLLATE utf8_bin
SOLUTION using 'collate utf8_bin' with WHERE statement
I found that i can achieve this point by using the
BINARY
Keyword (see this sqlFiddle). What is the difference betweencollate
andbinary
?Can I preserve other tables from any changes ? (I'll have to rebuild that table anyway, because it's kind of messy)
I'm not very comfortable with encoding in MySQL. I don't have any problem yet with encoding in that database (and I'm kind of lucky because my data might not always use the same encoding... and there is not much I can do about it). I have a feeling that any modification regarding to that "accent sensitive" issue might create some encoding issue with other queries or data integrity. Am I right to be concerned?
Step by step :
Database creation :
CREATE DATABASE corpus DEFAULT CHARACTER SET utf8;
Table of unique words :
CREATE TABLE token (name_token VARCHAR(50), freq INTEGER, CONSTRAINT pk_token PRIMARY KEY (name_token))
Queries
SELECT * FROM corpus.token WHERE name_token = 'demande';
SELECT * FROM corpus.token WHERE name_token = 'demandé';
both returns the same row:
demande
回答1:
Collations. You have two choices, not three:
utf8_bin
treats all of these as different: demandé
and demande
and Demandé
.
utf8_..._ci
(typically utf8_general_ci
or utf8_unicode_ci
) treats all of these as the same: demandé
and demande
and Demandé
.
If you want only case sensitivity (demandé
= demande
, but neither match Demandé
), you are out of luck.
If you want only accent sensitivity (demandé
= Demandé
, but neither match demande
), you are out of luck.
Declaration. The best way to do whatever you pick:
CREATE TABLE (
name VARCHAR(...) CHARACTER SET utf8 COLLATE utf8_... NOT NULL,
...
PRIMARY KEY(name)
)
Don't change collation on the fly. This won't use the index (that is, will be slow) if the collation is different in name
:
WHERE name = ... COLLATE ...
BINARY. The datatypes BINARY
, VARBINARY
and BLOB
are very much like CHAR
, VARCHAR
, and TEXT
with COLLATE ..._bin
. Perhaps the only difference is that text will be checked for valid utf8 storing in a VARCHAR ... COLLATE ..._bin
, but it will not be checked when storing into VARBINARY...
. Comparisons (WHERE
, ORDER BY
, etc) will be the same; that is, simply compare the bits, don't do case folding or accent stripping, etc.
回答2:
May be you need this
_ci in a collation name=case insensitive
If your searches on that field are always going to be case-sensitive, then declare the collation of the field as utf8_bin... that'll compare for equality the utf8-encoded bytes.
col_name varchar(10) collate utf8_bin
If searches are normally case-insensitive, but you want to make an exception for this search, try;
WHERE col_name = 'demandé' collate utf8_bin
More here
回答3:
Try this
mysql> SET NAMES 'utf8' COLLATE 'utf8_general_ci';
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE TABLE t1
-> (c1 CHAR(1) CHARACTER SET UTF8 COLLATE utf8_general_ci);
Query OK, 0 rows affected (0.01 sec)
mysql> INSERT INTO t1 VALUES ('a'),('A'),('À'),('á');
Query OK, 4 rows affected (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> SELECT c1, HEX(c1), HEX(WEIGHT_STRING(c1)) FROM t1;
+------+---------+------------------------+
| c1 | HEX(c1) | HEX(WEIGHT_STRING(c1)) |
+------+---------+------------------------+
| a | 61 | 0041 |
| A | 41 | 0041 |
| À | C380 | 0041 |
| á | C3A1 | 0041 |
+------+---------+------------------------+
4 rows in set (0.00 sec)
来源:https://stackoverflow.com/questions/34387766/use-accent-senstive-primary-key-in-mysql