I have a member search function where you can give parts of names and the return should be all members having at least one of username, firstname or lastname matching that i
You don't need to convert anything. Your requirement is to compare two strings and ask if they're equal, ignoring accents; the database server can use a collation to do that for you:
Non-UCA collations have a one-to-one mapping from character code to weight. In MySQL, such collations are case insensitive and accent insensitive. utf8_general_ci is an example: 'a', 'A', 'À', and 'á' each have different character codes but all have a weight of 0x0041 and compare as equal.
mysql> SET NAMES 'utf8' COLLATE 'utf8_general_ci';
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT 'a' = 'A', 'a' = 'À', 'a' = 'á';
+-----------+-----------+-----------+
| 'a' = 'A' | 'a' = 'À' | 'a' = 'á' |
+-----------+-----------+-----------+
| 1 | 1 | 1 |
+-----------+-----------+-----------+
1 row in set (0.06 sec)
@vincebowdren answer above works, I'm just adding this as an answer for formatting purposes:
CREATE TABLE `members` (
`id` int(11) DEFAULT NULL,
`lastname` varchar(20) COLLATE utf8_unicode_ci DEFAULT NULL
);
insert into members values (1, 'test6ë');
select id from members where lastname like 'test6e%';
Yields
+------+ | id | +------+ | 1 | +------+
And using Latin1,
set names latin1;
CREATE TABLE `members2` (
`id` int(11) DEFAULT NULL,
`lastname` varchar(20) CHARACTER SET latin1 DEFAULT NULL
);
insert into members2 values (1, 'Renée');
select id from members2 where lastname like '%Renee%';
will yield:
+------+ | id | +------+ | 1 | +------+
Of course, the OP should have the same charset in the application (PHP), connection (MySQL on Linux used to default to latin1 in 5.0, but defaults to UTF8 in 5.1), and in the field datatype to have less unknowns. Collations take care of the rest.
EDIT: I wrote should to have a better control over everything, but the following also works:
set names latin1;
select id from members where lastname like 'test6ë%';
Because, once the connection charset is set, MySQL does the conversion internally. In this case, it will convert somehow convert and compare the UTF8 string (from DB) to the latin1 (from query).
EDIT 2: Some skepticism requires me to provide an even more convincing example:
Given the statements above, here what I did more. Make sure the terminal is in UTF8.
set names utf8;
insert into members values (5, 'Renée'), (6, 'Renêe'), (7, 'Renèe');
select members.id, members.lastname, members2.id, members2.lastname
from members inner join members2 using (lastname);
Remember that members
is in utf8 and members2
is in latin1.
+------+----------+------+----------+ | id | lastname | id | lastname | +------+----------+------+----------+ | 5 | Renée | 1 | Renée | | 6 | Renêe | 1 | Renée | | 7 | Renèe | 1 | Renée | +------+----------+------+----------+
which proves with the correct settings, the collation does the work for you.
The CAST()
operator in the context of character encodings translates from one method of character storage to another — it does not change the actual characters, which is what you are after. An é character is what it is in any character set, it is not an e. You need to convert accented characters to non-accented characters, which is a different issue and has been asked a number of times previously (normalizing accented characters in MySQL queries).
I am unsure if there is a way to do this directly in MySQL, short of having a translation table and going through letter by letter. It would most likely be easier to write a PHP script to go through the database and make the translations.
First off, it should work this way:
SELECT * FROM `test` WHERE `name` COLLATE utf8_general_ci LIKE '%renee%';
Where the test
table is:
+-----+--------+
| id | name |
+-----+--------+
| 1 | Renée |
| 2 | Renêe |
| 3 | Renee |
+-----+--------+
What is your MySQL version, and how do you try to match things?
One of the other possible solutions is transliteration.
Related: PHP Transliteration
Transliterating the input should not be a problem, but transliterating the values from the permanent storage (e.g. db) real-time during the search may not be feasible. So you can add three more fields like: username_slug
, firstname_slug
and lastname_slug
. When inserting/modifying a record, set the slug values appropriately. And when searching, search the transliterated input against that slug fields.
+------+----------+---------------+----------+---------------+ ...
| id | username | username_slug | lastname | lastname_slug | ...
+------+----------+---------------+----------+---------------+ ...
| 1 | Renée | renee | La Niña | la-nina | ...
| 2 | Renêe | renee | ... | ... | ...
| 3 | Renee | renee | ... | ... | ...
+------+----------+---------------+----------+---------------+ ...
A search for "renee" or "renèe" would match all of the records.
As a side effect, you may be able to use that fields for generating SEF (search engine friendly) links, hence they are named ,..._slug
, e.g. example.com/users/renee. Of course, in that case you should check for the uniqueness of the slug field.