问题
I have a member search function where you can give parts of names and the return should be all members having at least one of username, firstname or lastname matching that input. The problem here is that some names have 'weird' characters like the é
in Renée
and the user doesn't wanna type the weird character but the normal ASCII substitute e
.
In PHP I convert the input string to ASCII using iconv (just in case someone types weird characters). In the database however I should also convert the weird chars to ASCII (obviously) for the strings to match.
I tried the following:
SELECT
CONVERT(_latin1'Renée' USING ascii) t1,
CAST(_latin1'Renée' AS CHAR CHARACTER SET ASCII) t2;
(That's two tries.) Both don't work. Both have Ren?e
as output. The question mark should be an e
. It's alright if it outputs Ren?ee
since I can just remove all question marks after the convert.
As you can imagine, the columns I want to query are encoded Latin1.
Thanks.
回答1:
You don't need to convert anything. Your requirement is to compare two strings and ask if they're equal, ignoring accents; the database server can use a collation to do that for you:
Non-UCA collations have a one-to-one mapping from character code to weight. In MySQL, such collations are case insensitive and accent insensitive. utf8_general_ci is an example: 'a', 'A', 'À', and 'á' each have different character codes but all have a weight of 0x0041 and compare as equal.
mysql> SET NAMES 'utf8' COLLATE 'utf8_general_ci';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT 'a' = 'A', 'a' = 'À', 'a' = 'á';
+-----------+-----------+-----------+
| 'a' = 'A' | 'a' = 'À' | 'a' = 'á' |
+-----------+-----------+-----------+
| 1 | 1 | 1 |
+-----------+-----------+-----------+
1 row in set (0.06 sec)
回答2:
First off, it should work this way:
SELECT * FROM `test` WHERE `name` COLLATE utf8_general_ci LIKE '%renee%';
Where the test
table is:
+-----+--------+
| id | name |
+-----+--------+
| 1 | Renée |
| 2 | Renêe |
| 3 | Renee |
+-----+--------+
What is your MySQL version, and how do you try to match things?
One of the other possible solutions is transliteration.
Related: PHP Transliteration
Transliterating the input should not be a problem, but transliterating the values from the permanent storage (e.g. db) real-time during the search may not be feasible. So you can add three more fields like: username_slug
, firstname_slug
and lastname_slug
. When inserting/modifying a record, set the slug values appropriately. And when searching, search the transliterated input against that slug fields.
+------+----------+---------------+----------+---------------+ ...
| id | username | username_slug | lastname | lastname_slug | ...
+------+----------+---------------+----------+---------------+ ...
| 1 | Renée | renee | La Niña | la-nina | ...
| 2 | Renêe | renee | ... | ... | ...
| 3 | Renee | renee | ... | ... | ...
+------+----------+---------------+----------+---------------+ ...
A search for "renee" or "renèe" would match all of the records.
As a side effect, you may be able to use that fields for generating SEF (search engine friendly) links, hence they are named ,..._slug
, e.g. example.com/users/renee. Of course, in that case you should check for the uniqueness of the slug field.
回答3:
@vincebowdren answer above works, I'm just adding this as an answer for formatting purposes:
CREATE TABLE `members` (
`id` int(11) DEFAULT NULL,
`lastname` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL
);
insert into members values (1, 'test6ë');
select id from members where lastname like 'test6e%';
Yields
+------+ | id | +------+ | 1 | +------+
And using Latin1,
set names latin1;
CREATE TABLE `members2` (
`id` int(11) DEFAULT NULL,
`lastname` varchar(20) CHARACTER SET latin1 DEFAULT NULL
);
insert into members2 values (1, 'Renée');
select id from members2 where lastname like '%Renee%';
will yield:
+------+ | id | +------+ | 1 | +------+
Of course, the OP should have the same charset in the application (PHP), connection (MySQL on Linux used to default to latin1 in 5.0, but defaults to UTF8 in 5.1), and in the field datatype to have less unknowns. Collations take care of the rest.
EDIT: I wrote should to have a better control over everything, but the following also works:
set names latin1;
select id from members where lastname like 'test6ë%';
Because, once the connection charset is set, MySQL does the conversion internally. In this case, it will convert somehow convert and compare the UTF8 string (from DB) to the latin1 (from query).
EDIT 2: Some skepticism requires me to provide an even more convincing example:
Given the statements above, here what I did more. Make sure the terminal is in UTF8.
set names utf8;
insert into members values (5, 'Renée'), (6, 'Renêe'), (7, 'Renèe');
select members.id, members.lastname, members2.id, members2.lastname
from members inner join members2 using (lastname);
Remember that members
is in utf8 and members2
is in latin1.
+------+----------+------+----------+ | id | lastname | id | lastname | +------+----------+------+----------+ | 5 | Renée | 1 | Renée | | 6 | Renêe | 1 | Renée | | 7 | Renèe | 1 | Renée | +------+----------+------+----------+
which proves with the correct settings, the collation does the work for you.
回答4:
The CAST()
operator in the context of character encodings translates from one method of character storage to another — it does not change the actual characters, which is what you are after. An é character is what it is in any character set, it is not an e. You need to convert accented characters to non-accented characters, which is a different issue and has been asked a number of times previously (normalizing accented characters in MySQL queries).
I am unsure if there is a way to do this directly in MySQL, short of having a translation table and going through letter by letter. It would most likely be easier to write a PHP script to go through the database and make the translations.
来源:https://stackoverflow.com/questions/4234010/how-do-i-convert-a-column-to-ascii-on-the-fly-without-saving-to-check-for-matche