When to use utf-8 and when to use latin1 in MySQL?

暖寄归人 2021-02-13 21:56

I know that MySQL has default of latin1 encoding and apparently it takes 1 byte to store a character in latin1 and 3 bytes to store a character in

8 answers
  • 2021-02-13 22:05

    it takes 1 byte to store a character in latin1 and 3 bytes to store a character in utf-8 - is that correct?

    It takes 1 byte to store a latin1 character and 1 to 3 bytes to store a character in MySQL's utf8 charset.

    If you only use basic Latin characters and punctuation in your strings (code points 0 to 127 in Unicode), both charsets will occupy the same space.
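
The byte counts above can be checked outside the database. This is a minimal sketch in Python, not MySQL itself: it assumes Python's `latin-1` codec (ISO-8859-1) is close enough to MySQL's latin1 for these sample characters, and the helper name `byte_length` is made up for illustration.

```python
# Compare how many bytes a single character needs in each encoding.
# Assumption: Python's "latin-1" codec stands in for MySQL's latin1 here.

def byte_length(text, encoding):
    """Return the number of bytes `text` occupies in `encoding`."""
    return len(text.encode(encoding))

# Characters that exist in both encodings.
latin1_sizes = {ch: byte_length(ch, "latin-1") for ch in ["A", "é"]}

# UTF-8 is variable width: ASCII, accented Latin, CJK.
utf8_sizes = {ch: byte_length(ch, "utf-8") for ch in ["A", "é", "中"]}

print(latin1_sizes)
print(utf8_sizes)
```

An ASCII letter takes one byte either way; the difference only shows up for accented and non-Latin characters.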

    Also, I tried to change some tables from latin1 to utf8 but I got this error: "Specified key was too long; max key length is 1000 bytes". Does anyone know the solution to this? And should I really solve that, or may latin1 be enough?

    If you have a utf8 column of VARCHAR(334) or longer, MyISAM won't let you create an index on it, since there is a remote possibility of the column occupying more than 1000 bytes (334 characters × 3 bytes = 1002 bytes).

    Note that keys of such length are rarely useful. You can create a prefix index, which will be almost as selective for any real-world data.
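
The arithmetic behind the 1000-byte limit can be sketched as follows. This assumes the worst case of every character taking the charset's maximum width and ignores any per-column storage overhead; the function names are hypothetical, not part of MySQL.

```python
# Worst-case key size arithmetic for the MyISAM limit quoted in the error.
MAX_KEY_BYTES = 1000  # taken from the error message

def worst_case_bytes(char_count, max_bytes_per_char):
    """Bytes needed if every character uses the maximum width."""
    return char_count * max_bytes_per_char

def max_indexable_chars(max_bytes_per_char, key_limit=MAX_KEY_BYTES):
    """Longest column (in characters) that is guaranteed to fit in the key."""
    return key_limit // max_bytes_per_char

print(worst_case_bytes(334, 3))   # utf8 VARCHAR(334)
print(max_indexable_chars(1))     # latin1
print(max_indexable_chars(3))     # utf8 (3-byte)
print(max_indexable_chars(4))     # utf8mb4
```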

  • 2021-02-13 22:09

    Since the maximum length of a key is 1000 bytes, using utf8 will limit you to 333 characters.

    However, MySQL is different from Oracle when it comes to charsets. In Oracle you can't have a different character set per column, whereas in MySQL you can, so maybe you can set the key column to latin1 and the other columns to utf8.

    Finally, I believe only the defunct version 6.0 alpha (ditched when Sun bought MySQL) could accommodate Unicode characters beyond the BMP (Basic Multilingual Plane) at the time; MySQL 5.5.3 and later provide the utf8mb4 charset for this. So basically, even with MySQL's utf8, you won't have the whole Unicode character set. In practice this is only a problem for rare Chinese characters and other supplementary characters, if that really matters to you.
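
To see which strings fall outside the BMP, one can check code points directly. This is a rough sketch in Python; `needs_four_byte_charset` is a made-up helper name, and it only tests whether any code point is above U+FFFF, which is the range a 3-byte UTF-8 encoding cannot reach.

```python
# Detect characters beyond the Basic Multilingual Plane (above U+FFFF).
# Such characters need four bytes in UTF-8.

def needs_four_byte_charset(text):
    """True if any character in `text` lies outside the BMP."""
    return any(ord(ch) > 0xFFFF for ch in text)

bmp_text = "caf\u00e9 \u4e2d\u6587"   # accented Latin and common CJK
rare_han = "\U00020000"                # a CJK Extension B ideograph
emoji = "\U0001F600"                   # an emoji

print(needs_four_byte_charset(bmp_text))
print(needs_four_byte_charset(rare_han), len(rare_han.encode("utf-8")))
print(needs_four_byte_charset(emoji), len(emoji.encode("utf-8")))
```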

  • 2021-02-13 22:10

    Current best practice is to never use MySQL's utf8 character set. Use utf8mb4 instead, which is a proper implementation of the standard.

    See Adam Hooper's Explanation for more detail.

    Note that in utf8mb4, characters take a variable number of bytes. As the name implies, a character takes up to four bytes. ASCII characters (basic Latin letters, digits and punctuation) encoded as utf8mb4 still occupy only one byte. Other characters, including those with accents, Kanji, and emoji, require two, three, or four bytes to store.

    The "Specified key was too long; max key length is 1000 bytes" error occurs when an index contains utf8mb4 columns whose worst-case size (four bytes per character) exceeds this limit. You'll need to either shorten some of the character columns or index only a prefix of each column, using this syntax, so that the index stays under the limit.

    ALTER TABLE.. ADD INDEX `myIndex` ( column1(15), column2(200) );
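
As a sanity check on the prefix lengths in the statement above, the worst-case key size can be estimated as below. This is only a sketch: it assumes four bytes per utf8mb4 character and ignores any per-column overhead the storage engine may add, and `index_key_bytes` is a hypothetical helper.

```python
# Estimate the worst-case size of a composite prefix index.
MAX_KEY_BYTES = 1000       # limit from the error message
UTF8MB4_MAX_BYTES = 4      # maximum bytes per utf8mb4 character

def index_key_bytes(prefix_lengths, bytes_per_char=UTF8MB4_MAX_BYTES):
    """Sum the worst-case bytes over all indexed column prefixes."""
    return sum(length * bytes_per_char for length in prefix_lengths)

# column1(15), column2(200) from the ALTER TABLE example
prefix_key = index_key_bytes([15, 200])
# two full VARCHAR(255) columns, for comparison
full_key = index_key_bytes([255, 255])

print(prefix_key, prefix_key <= MAX_KEY_BYTES)
print(full_key, full_key <= MAX_KEY_BYTES)
```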

  • 2021-02-13 22:17

    I am not an expert, but I always understood that UTF-8 encodes a character in up to 4 bytes, not 3. And as I understand it, MySQL's utf8 charset (the one behind collations such as utf8_unicode_ci) only handles up to 3 bytes per character...

    If you want the full 4-byte UTF-8 character encoding, you need to use the utf8mb4 charset (for example with the utf8mb4_unicode_ci collation) for your MySQL database/tables.

  • 2021-02-13 22:19

    If you allow users to post in their own languages, and if you want users from all countries to participate, you have to switch at least the tables containing those posts to UTF-8 - Latin1 covers only ASCII and western European characters. The same is true if you intend to use multiple languages for your UI. See this post for how to handle migration.
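
A quick way to get a feel for what Latin1 can and cannot hold is to try encoding sample text. This sketch uses Python's `latin-1` codec (ISO-8859-1) as an approximation of MySQL's latin1, which is an assumption; the function name `fits_latin1` is invented for the example.

```python
# Check whether a string can be represented in Latin-1 at all.
# Assumption: Python's "latin-1" codec approximates MySQL's latin1.

def fits_latin1(text):
    """True if every character of `text` exists in Latin-1."""
    try:
        text.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False

results = {
    "señor": fits_latin1("señor"),      # Spanish
    "Привет": fits_latin1("Привет"),    # Russian
    "日本語": fits_latin1("日本語"),      # Japanese
}
print(results)
```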

  • 2021-02-13 22:21

    We built an application using latin1 because it was the default. But later on we had to change everything to UTF-8 because of Spanish characters. It was not incredibly difficult, but there is no point in having to change things unnecessarily.

    So the short answer is: just go with UTF-8 from the beginning. It will save you trouble later on.
