When to use utf-8 and when to use latin1 in MySQL?

Submitted by 可紊 on 2019-12-21 03:57:31

Question


I know that MySQL defaults to latin1 encoding, and apparently it takes 1 byte to store a character in latin1 and 3 bytes to store a character in utf-8. Is that correct?

I am working on a site that I hope will be used globally. Do I absolutely need to have utf-8? Or will I be able to get away with using latin1?

Also, I tried to change some tables from latin1 to utf8 but I got this error: "Specified key was too long; max key length is 1000 bytes". Does anyone know the solution to this? And should I really solve that, or might latin1 be enough?

Thanks, Alex


Answer 1:


it takes 1 byte to store a character in latin1 and 3 bytes to store a character in utf-8 - is that correct?

It takes 1 byte to store a latin1 character and 1 to 3 bytes to store a UTF8 character.

If you only use basic Latin characters and punctuation in your strings (0 to 127 in Unicode), both charsets will occupy the same length.
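The byte counts above can be checked with a small sketch. It uses Python's `latin-1` and `utf-8` codecs as stand-ins for the MySQL charsets, which is an assumption: MySQL's latin1 is not byte-for-byte identical to ISO-8859-1, though the lengths agree for these samples. The helper name `encoded_length` is made up for illustration.

```python
def encoded_length(text, charset):
    """Return the encoded size in bytes, or None if the charset cannot represent the text."""
    try:
        return len(text.encode(charset))
    except UnicodeEncodeError:
        return None

# Plain ASCII, a Western European accent, and Japanese text.
for sample in ["hello", "café", "日本語"]:
    print(sample, encoded_length(sample, "latin-1"), encoded_length(sample, "utf-8"))
```

ASCII-only text costs the same in both encodings, accented Western European letters cost one extra byte each in UTF-8, and text outside latin1's repertoire cannot be stored in latin1 at all.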

Also, I tried to change some tables from latin1 to utf8 but I got this error: "Specified key was too long; max key length is 1000 bytes". Does anyone know the solution to this? And should I really solve that, or might latin1 be enough?

If you have a column of VARCHAR(334) or longer, MyISAM won't let you create an index on it, since there is a remote possibility that the column will occupy more than 1000 bytes (334 characters × 3 bytes = 1002 bytes).

Note that keys of such length are rarely useful. You can create a prefixed index which will be almost as selective for any real-world data.
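The arithmetic behind the limit can be sketched as follows. This is a worst-case estimate only, assuming every character takes the charset's maximum width and ignoring any per-column overhead; the function names are hypothetical.

```python
MAX_KEY_BYTES = 1000  # the limit quoted in the error message

def max_indexable_chars(bytes_per_char, max_key_bytes=MAX_KEY_BYTES):
    """Largest character count whose worst-case size still fits in the key."""
    return max_key_bytes // bytes_per_char

def fits_in_key(varchar_length, bytes_per_char, max_key_bytes=MAX_KEY_BYTES):
    """True if a column of this length can be indexed in full in the worst case."""
    return varchar_length * bytes_per_char <= max_key_bytes

print(max_indexable_chars(1))  # latin1: 1 byte per character
print(max_indexable_chars(3))  # MySQL utf8: up to 3 bytes per character
print(fits_in_key(334, 3))
```

Under these assumptions a prefix of at most 333 characters fits in a utf8 key, which is why a prefixed index sidesteps the error.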




Answer 2:


At a bare minimum I would suggest using UTF-8. Your data will be compatible with every other database out there nowadays since 90%+ of them are UTF-8.

If you go with LATIN1/ISO-8859-1 you risk the data not being stored properly because it doesn't support international characters, so you might end up with garbled or missing characters.

If you go with UTF-8, you don't need to deal with these headaches.
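As a rough illustration of those headaches, the sketch below reproduces two common failure modes in Python rather than in MySQL itself: UTF-8 bytes misread as latin1, and characters that latin1 cannot represent being replaced. The function names are made up for this example, and Python's `latin-1` codec is assumed to be close enough to MySQL's latin1 for the purpose.

```python
def misread_as_latin1(text):
    """Encode as UTF-8, then wrongly decode the bytes as latin1 (classic mojibake)."""
    return text.encode("utf-8").decode("latin-1")

def force_into_latin1(text):
    """Store text in latin1, replacing anything it cannot represent with '?'."""
    return text.encode("latin-1", errors="replace").decode("latin-1")

print(misread_as_latin1("café"))    # the accented letter turns into two wrong characters
print(force_into_latin1("日本語"))  # every character is lost
```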

Regarding your error, it sounds like you need to optimize your database. Consider this: http://bugs.mysql.com/bug.php?id=4541#c284415

It would help if you gave specifics on your table schema and column for that issue.




Answer 3:


If you allow users to post in their own languages, and if you want users from all countries to participate, you have to switch at least the tables containing those posts to UTF-8 - Latin1 covers only ASCII and western European characters. The same is true if you intend to use multiple languages for your UI. See this post for how to handle migration.




Answer 4:


In my experience, if you plan to support Arabic, Russian, Asian languages or others, the investment in UTF-8 support upfront will pay off down the line. However, depending on your circumstances you may be able to get away with English for a while.

As for the error, you probably have a key or index field with more than 333 characters, the most that fits within the 1000-byte key limit when each character can take up to 3 bytes in MySQL's utf8. See this bug report.




Answer 5:


We built an application using Latin because it was the default. But later on we had to change everything to UTF because of Spanish characters. It was not incredibly difficult, but there is no point in having to change things unnecessarily.

So the short answer is: just go with UTF-8 from the beginning. It will save you trouble later on.




Answer 6:


Since the max length of a key is 1000 BYTES, using utf8 will limit you to 333 characters.

However, MySQL is different from Oracle when it comes to charsets. In Oracle you can't have a different character set per column, whereas in MySQL you can, so maybe you can set the key column to latin1 and the other columns to utf8.

Finally, I believe only the defunct version 6.0 alpha (ditched when Sun bought MySQL) could accommodate Unicode characters beyond the BMP (Basic Multilingual Plane). So basically, even with MySQL's utf8, you won't have the whole Unicode character set. In practice this is only a problem for rare Chinese characters, if that really matters to you.




Answer 7:


I am not an expert, but I always understood that UTF-8 is actually an encoding of up to 4 bytes per character, not 3. And as I understand it, the MySQL utf8 character set (the one behind the utf8_unicode_ci collation) only handles characters of up to 3 bytes...

If you want the full 4-byte UTF-8 encoding, you need to use the utf8mb4 character set (for example with the utf8mb4_unicode_ci collation) for your MySQL database/tables.
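A quick way to see which characters need the fourth byte is to measure their UTF-8 width. This is a minimal sketch in Python, not something MySQL provides; `utf8_width` and `fits_in_three_bytes` are hypothetical helper names.

```python
def utf8_width(ch):
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

def fits_in_three_bytes(text):
    """True if every character needs at most 3 bytes, i.e. lies within the BMP."""
    return all(utf8_width(ch) <= 3 for ch in text)

print([utf8_width(ch) for ch in ["A", "é", "語", "\U0001F600"]])
print(fits_in_three_bytes("日本語"))
print(fits_in_three_bytes("hi \U0001F600"))
```

A check like this could be run over existing data to estimate whether a 3-byte charset would reject or truncate any of it.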



Source: https://stackoverflow.com/questions/4857778/when-to-use-utf-8-and-when-to-use-latin1-in-mysql
