Choosing data type for MySQL?

难免孤独 2021-01-12 09:28

I have been researching and reading about SQL data types for a few days now (I know... I know, that's not very long) and one of the things that is hard for me to grasp is h

3 answers
  • 2021-01-12 10:05

    With regards to BLOB vs TEXT (since that's the only concrete question in your post): BLOB is for binary data, and TEXT is intended for text data.

    It's usually pretty simple: use the most specific type of column that fits your needs, and fall back to the generic types if none of them fit your usage.
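    As a minimal sketch of that distinction (table and column names here are invented for illustration), a mixed table might look like this:

    -- TEXT carries a character set and collation, so comparisons and
    -- conversions treat the contents as text; BLOB is just bytes with
    -- no character set and byte-wise comparisons.
    CREATE TABLE attachments (
        id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        description TEXT CHARACTER SET utf8,    -- textual data
        payload     BLOB                        -- raw binary data
    ) ENGINE=InnoDB;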

  • 2021-01-12 10:26

    MySQL string types come in two variants: One without a character set label and one with a character set label.

    A fixed length string, padded with spaces at the end, is CHAR(n). The matching type that has no character set label is BINARY(n). Storing the string "hello" in a CHAR(255) CHARSET utf8 will take 765 bytes (the string is padded with spaces to full length and stored as utf8, which has a worst-case space usage of 3 bytes/character, so 3*255 bytes are allocated).
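    A quick way to see the padding behavior (a small sketch; the table name is made up):

    -- CHAR pads with spaces on storage but strips them again on retrieval;
    -- BINARY pads with 0x00 bytes and keeps them.
    CREATE TABLE pad_demo (c CHAR(10) CHARACTER SET utf8, b BINARY(10));
    INSERT INTO pad_demo VALUES ('hello', 'hello');
    SELECT LENGTH(c), HEX(b) FROM pad_demo;
    -- LENGTH(c) -> 5, HEX(b) -> 68656C6C6F0000000000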

    A variable length string with one or two length bytes and no padding is VARCHAR(n). The matching type that has no character set label is VARBINARY(n). Storing the string "hello" in a VARCHAR(255) CHARSET utf8 will take 6 bytes (1 length byte plus 5 bytes for the actual text). Storing the string クリス in the same type will take 10 bytes (1 length byte plus 3 characters using 3 bytes per character to represent them).

    mysql> select hex('クリス'), length(hex('クリス'))/2 as bytes;
    +--------------------+--------+
    | hex('クリス')      | bytes  |
    +--------------------+--------+
    | E382AFE383AAE382B9 | 9.0000 |
    +--------------------+--------+
    1 row in set (0.02 sec)
    

    A variable length string with one, two, three or four length bytes is TINYTEXT, TEXT, MEDIUMTEXT and LONGTEXT. The matching types that have no character set label are TINYBLOB, BLOB, MEDIUMBLOB and LONGBLOB.
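    The four sizes differ only in the number of length bytes and therefore in the maximum length they can hold (a sketch; the table and column names are invented):

    CREATE TABLE blob_family (
        t1 TINYTEXT,      -- 1 length byte,  up to 2^8  - 1 bytes
        t2 TEXT,          -- 2 length bytes, up to 2^16 - 1 bytes
        t3 MEDIUMTEXT,    -- 3 length bytes, up to 2^24 - 1 bytes
        t4 LONGTEXT,      -- 4 length bytes, up to 2^32 - 1 bytes
        b1 TINYBLOB, b2 BLOB, b3 MEDIUMBLOB, b4 LONGBLOB
    ) ENGINE=InnoDB;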

    A TEXT/BLOB-like type differs from a VARCHAR/VARBINARY-like type in how and where the data is stored; see http://www.mysqlperformanceblog.com/2010/02/09/blob-storage-in-innodb/ for details on how TEXT/BLOB-like types are stored in InnoDB depending on version and ROW_FORMAT settings. For performance reasons, you want a recent version of InnoDB and Barracuda-format tables.

    MySQL is incapable of working with any data that is larger than max_allowed_packet (default: 1M) in size, unless you construct complicated and memory-intensive workarounds on the server side. This further restricts what can be done with TEXT/BLOB-like types, and generally makes the LONGTEXT/LONGBLOB types useless in a default configuration.
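    You can check the current limit and, if you control the server, raise it (the 64M value below is only an example):

    SHOW VARIABLES LIKE 'max_allowed_packet';
    -- requires the SUPER privilege (SYSTEM_VARIABLES_ADMIN in MySQL 8.0);
    -- only new connections pick up the changed value
    SET GLOBAL max_allowed_packet = 64 * 1024 * 1024;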

    For types without a character set label (BINARY, VARBINARY and the %BLOB% types), MySQL will accept the data as received and write it to disk. For types with a character set label, MySQL will look at the character set your client announced to the server with SET NAMES and at the column's defined character set. It will then convert from the connection character set to the column character set and write the converted data. You can check that with the HEX() function, e.g. SELECT HEX(str) FROM t WHERE id = ....

    On retrieval, the connection's character set announced with SET NAMES can be different from what it was at write time. MySQL will again check the column's character set label against the character set announced for this connection and, if necessary, convert to the connection character set.
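    A small illustration of that round trip (a sketch; table, column and data are invented, and the client is assumed to really send latin1 bytes):

    SET NAMES latin1;
    CREATE TABLE cs_demo (id INT PRIMARY KEY, str VARCHAR(50) CHARACTER SET utf8);
    INSERT INTO cs_demo VALUES (1, 'Müller');       -- ü arrives as latin1 0xFC
    SELECT HEX(str) FROM cs_demo WHERE id = 1;      -- stored as utf8: 4DC3BC6C6C6572

    SET NAMES utf8;             -- a different connection character set at read time...
    SELECT str FROM cs_demo;    -- ...still returns 'Müller', converted on the way out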

    The performance penalty for that conversion either way is negligible compared to the time taken for the disk I/O incurred for such data anyway, so performance-wise it hardly matters which type you choose. The rule is instead: choose a type with a character set label if you are working with textual data, and a type without one if you are not.


    A related question often asked: Shall I choose CHAR or VARCHAR (BINARY or VARBINARY respectively)?

    For InnoDB, the answer is always: choose the variable length data type. There is no performance benefit from fixed length data types in InnoDB, ever, but there is a huge size penalty if you choose a fixed length data type and then do not use all the space in it. Plus, fixed length SQL string types have really weird rules regarding padding and trimming with spaces at the end which you probably can't be bothered to learn. For MyISAM the case can be different, but almost never is.
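    The trailing-space behavior is easy to demonstrate (a sketch, assuming the default sql_mode; the table name is made up):

    CREATE TABLE trim_demo (c CHAR(10), v VARCHAR(10));
    INSERT INTO trim_demo VALUES ('abc   ', 'abc   ');
    -- CHAR silently drops the trailing spaces on retrieval, VARCHAR keeps them
    SELECT LENGTH(c), LENGTH(v) FROM trim_demo;     -- 3 and 6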


    Another related question: Shall I choose VARCHAR or TEXT for my strings (VARBINARY or BLOB, respectively)?

    The answer for that is: use a recent version of InnoDB with Barracuda-format tables, and then TEXT/BLOB. The reason for that is explained at length in http://www.mysqlperformanceblog.com/2011/04/07/innodb-row-size-limitation/. The upshot is: with either VARCHAR or TEXT/BLOB in pre-Barracuda format, you run the risk of overflowing the InnoDB row size limitation if you have too many of them in a single row.
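    A table definition along those lines might look like this (a sketch; the names are invented, ROW_FORMAT=DYNAMIC is one of the Barracuda formats, and older servers also need innodb_file_format=Barracuda):

    CREATE TABLE documents (
        id    INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        title VARCHAR(255),
        body  MEDIUMTEXT
    ) ENGINE=InnoDB ROW_FORMAT=DYNAMIC DEFAULT CHARSET=utf8;
    -- with DYNAMIC or COMPRESSED, long column values are stored fully off-page,
    -- leaving only a 20-byte pointer in the row itself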


    And finally: Shall I store files/images/other large blob or text data in the database?

    The answer for that is: usually not. Serving files from a database (http://mysqldump.azundris.com/archives/36-Serving-Images-From-A-Database.html) is an expensive operation compared to serving them from a filesystem, so if at all possible you want to do the latter. There is a way around that (http://www.blobstreaming.org/), but it is advanced technology that requires total control of your execution environment, which you never have in a hosted environment.


    To round this off: There are no variable length data types in MEMORY engine tables. So if you are seeing "using temporary" in an EXPLAIN output, this means

    • VARCHAR is converted to CHAR in that temporary table
    • VARBINARY is converted to BINARY

    If the temporary table in this process becomes larger than tmp_table_size or max_heap_table_size, it is converted on the fly to MyISAM format and goes to disk.

    Example: You are defining a Ruby ActiveRecord class User which contains ten fields tagged as :string. Each of these ends up being a VARCHAR(255) CHARSET utf8 column in your Users table.

    Elsewhere in your codebase, Users is being used in a way that involves a plan with "using temporary". You are instantly dying in disk operations under load, because each row of the Users table now uses at least 7650 bytes in MEMORY, most of them spaces used as padding. This forces the temporary table to be converted to MyISAM and written to disk.

    • any %TEXT% or %BLOB% type cannot be represented in MEMORY, so the temporary table goes to disk as MyISAM even if it would have been sufficiently small to be kept in memory according to the limits stated above.

    This implies that any query with a TEXT or BLOB type and a plan with "using temporary" needs to be rewritten in order to avoid temp tables hitting the disk.
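    To see how often temporary tables are going to disk on a server, you can compare the temporary table counters (a sketch; the variable and status names are standard MySQL, the interpretation is a rough rule of thumb):

    SHOW VARIABLES LIKE 'tmp_table_size';
    SHOW VARIABLES LIKE 'max_heap_table_size';
    -- Created_tmp_disk_tables relative to Created_tmp_tables shows how many
    -- temporary tables ended up on disk instead of in MEMORY
    SHOW GLOBAL STATUS LIKE 'Created_tmp%';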

  • 2021-01-12 10:31

    For MySQL there is PROCEDURE ANALYSE() (available up to MySQL 5.7; it was removed in 8.0), which assesses your data heuristically with the idea that it informs the best choice of data type, and will suggest a value range or an ENUM of the observed values.

    A quick dynamic CONCAT script to generate the SQL to run:

    SELECT CONCAT('SELECT ', COLUMN_NAME,
                  ' FROM ', TABLE_SCHEMA, '.', TABLE_NAME,
                  ' PROCEDURE ANALYSE();') AS stmt
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA = 'yourDbName'
      AND DATA_TYPE = 'varchar'
      AND CHARACTER_MAXIMUM_LENGTH > 190
      AND COLUMN_KEY <> 'PRI';   -- skip primary key columns
    

    ** The SQL above skips primary key columns -- assuming they are not text fields

    The procedure is useful when looking to change a data type based on actual data usage, or to gain efficiency by moving to or storing a smaller data type.
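    For a single column, the call looks like this (table and column names are hypothetical; the two optional arguments limit how many distinct values ANALYSE will consider for an ENUM suggestion and how much memory it may use per column):

    SELECT user_email FROM users PROCEDURE ANALYSE(16, 256);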

    The Percona blog has a good working example of PROCEDURE ANALYSE applied to Drupal: https://www.percona.com/blog/2009/03/23/procedure-analyse/

    Some of that research was done for compression, which ties into longer utf8mb4 indexes: http://techblog.constantcontact.com/devops/space-the-final-frontier-a-story-of-mysql-compression/
