This Question addresses only how 'short' CHAR and VARCHAR columns are stored in an InnoDB table.
- Does a CHAR(10) column occupy exactly 10 bytes?
- What happens with trailing blanks?
- What about character sets that need more than 1 byte per character?
- How does VARCHAR(10) differ from CHAR(10)?
- EXPLAIN implies that all indexed varchars contain a 2-byte length field. Is it really 2 bytes? Or might it be 1 byte? (cf. key_len.)
- What about different ROW_FORMATs?
Not covered in this Question (to keep it from being too broad):
- What about TEXT?
- What about 255, 191, off-page storage, etc.?
- What happens in an index starting with a char/varchar? (Think: removal of common prefix.)
- What happens with char/varchar when involved in a MEMORY temp table? Also, what changes happen in version 8.0?
- ROW_FORMAT has a significant impact on longer string columns, primarily in deciding when off-page storage is used.
From MySQL Documentation:
The difference between CHAR and VARCHAR values is the way they are stored. CHAR(10) requires 10 bytes of storage (in a 1-byte character set) no matter how many characters you use, because the data is right-padded with spaces. VARCHAR(10) takes only 1 byte per character actually stored (in a 1-byte character set) plus a length prefix (1 byte when values need no more than 255 bytes, 2 otherwise... I don't know why key_len in EXPLAIN adds 2 bytes).
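To make the arithmetic above concrete, here is a minimal Python sketch of the storage rule as described in this answer. The function names and the `max_bytes_per_char` parameter are made up for illustration, and the sketch only models the documented "data bytes plus length prefix" rule; it does not claim to reproduce InnoDB's actual on-disk record header.

```python
def varchar_storage_bytes(value: str, max_chars: int,
                          encoding: str = "latin-1",
                          max_bytes_per_char: int = 1) -> int:
    """Bytes for one VARCHAR(max_chars) value: data plus length prefix.

    Assumption: the prefix is 1 byte when the column can never need
    more than 255 bytes, and 2 bytes otherwise.
    """
    if len(value) > max_chars:
        raise ValueError("value is longer than the column allows")
    data_bytes = len(value.encode(encoding))
    max_column_bytes = max_chars * max_bytes_per_char
    prefix_bytes = 1 if max_column_bytes <= 255 else 2
    return data_bytes + prefix_bytes


def char_storage_bytes(max_chars: int) -> int:
    """Bytes for CHAR(max_chars) in a 1-byte character set.

    The value is right-padded with spaces, so the size is constant.
    """
    return max_chars


if __name__ == "__main__":
    print(char_storage_bytes(10))               # CHAR(10), any content
    print(varchar_storage_bytes("abc", 10))     # 3 data bytes + 1 prefix
    print(varchar_storage_bytes("abc", 300))    # 3 data bytes + 2 prefix
```

Under these assumptions, 'abc' costs 10 bytes in CHAR(10), 4 bytes in VARCHAR(10), and 5 bytes in VARCHAR(300).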
I don't understand what you mean by trailing blanks, although I imagine you are referring to trailing spaces in excess of the column length. With VARCHAR these are truncated with a warning, while in CHAR columns they are truncated silently. This makes sense, because CHAR values are stored padded with trailing blanks anyway.
Regarding character sets, in this link you can see that the number of characters for a CHAR or VARCHAR column is the same, although storage will require from 1 to 4 bytes per character; here is the list of supported character sets and here the bytes per character.
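The "1 to 4 bytes per character" point can be checked outside MySQL. The sketch below uses Python's own codecs as stand-ins: "utf-8" plays the role of utf8mb4 and "latin-1" the role of a single-byte character set. That mapping is an assumption for illustration (MySQL's latin1 is not byte-for-byte identical to Python's latin-1), and `bytes_needed` is a hypothetical helper name.

```python
def bytes_needed(text: str, encoding: str = "utf-8") -> int:
    """Number of bytes the text occupies in the given encoding."""
    return len(text.encode(encoding))


if __name__ == "__main__":
    # One character each, needing 1, 2, 3 and 4 bytes in UTF-8.
    for ch in ["a", "é", "€", "😀"]:
        print(repr(ch), bytes_needed(ch, "utf-8"))

    # The same 'é' fits in a single byte in a 1-byte character set.
    print(bytes_needed("é", "latin-1"))
```

So a column declared with 10 characters always holds up to 10 characters, but the bytes behind them vary with both the character set and the characters actually stored.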
What I've read about the different row formats for InnoDB:
Redundant Row Format Characteristics:
Internally, InnoDB stores fixed-length character columns such as CHAR(10) in a fixed-length format. InnoDB does not truncate trailing spaces from VARCHAR columns.
InnoDB encodes fixed-length fields greater than or equal to 768 bytes in length as variable-length fields, which can be stored off-page. For example, a CHAR(255) column can exceed 768 bytes if the maximum byte length of the character set is greater than 3, as it is with utf8mb4.
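The 768-byte rule quoted above is simple arithmetic, sketched here in Python. The constant and function names are hypothetical; the only inputs taken from the documentation are the 768-byte threshold and the maximum bytes per character of each character set (1 for latin1, 3 for utf8mb3, 4 for utf8mb4).

```python
VARIABLE_LENGTH_THRESHOLD = 768  # bytes, from the quoted documentation


def char_max_bytes(n_chars: int, max_bytes_per_char: int) -> int:
    """Maximum byte length of a CHAR(n_chars) column."""
    return n_chars * max_bytes_per_char


def encoded_as_variable_length(n_chars: int, max_bytes_per_char: int) -> bool:
    """True if the fixed-length field reaches the 768-byte threshold."""
    return char_max_bytes(n_chars, max_bytes_per_char) >= VARIABLE_LENGTH_THRESHOLD


if __name__ == "__main__":
    print(char_max_bytes(255, 4), encoded_as_variable_length(255, 4))  # utf8mb4
    print(char_max_bytes(255, 3), encoded_as_variable_length(255, 3))  # utf8mb3
    print(char_max_bytes(255, 1), encoded_as_variable_length(255, 1))  # latin1
```

CHAR(255) with utf8mb4 can reach 1020 bytes and so crosses the threshold, while the same column with utf8mb3 tops out at 765 bytes and does not.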
COMPACT Row Format Characteristics:
Internally, for nonvariable-length character sets, InnoDB stores fixed-length character columns such as CHAR(10) in a fixed-length format.
InnoDB does not truncate trailing spaces from VARCHAR columns.
Internally, for variable-length character sets such as utf8mb3 and utf8mb4, InnoDB attempts to store CHAR(N) in N bytes by trimming trailing spaces. If the byte length of a CHAR(N) column value exceeds N bytes, InnoDB trims trailing spaces to a minimum of the column value byte length. The maximum length of a CHAR(N) column is the maximum character byte length × N.
InnoDB reserves a minimum of N bytes for CHAR(N). Reserving the minimum space N in many cases enables column updates to be done in place without causing fragmentation of the index page. By comparison, for ROW_FORMAT=REDUNDANT, CHAR(N) columns occupy the maximum character byte length × N.
InnoDB encodes fixed-length fields greater than or equal to 768 bytes in length as variable-length fields, which can be stored off-page. For example, a CHAR(255) column can exceed 768 bytes if the maximum byte length of the character set is greater than 3, as it is with utf8mb4.
ROW_FORMAT=DYNAMIC and ROW_FORMAT=COMPRESSED handle CHAR storage in the same way as ROW_FORMAT=COMPACT.
...
DYNAMIC and COMPRESSED row formats are variations of the COMPACT row format and therefore handle CHAR storage in the same way as the COMPACT row format
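As a rough illustration of the CHAR(N) rules quoted above, the following Python sketch compares COMPACT and REDUNDANT for a variable-length character set. It is a simplified reading of the documentation, not InnoDB's implementation: it ignores record headers and length bytes, uses Python's "utf-8" codec as a stand-in for utf8mb4, and the function names are invented for this example.

```python
def char_bytes_compact(value: str, n: int, encoding: str = "utf-8") -> int:
    """Approximate bytes for a CHAR(n) value in COMPACT (variable-length charset).

    Trailing spaces are trimmed, but at least n bytes are reserved.
    """
    if len(value) > n:
        raise ValueError("value is longer than the column allows")
    trimmed = value.rstrip(" ")
    return max(n, len(trimmed.encode(encoding)))


def char_bytes_redundant(n: int, max_bytes_per_char: int) -> int:
    """Bytes for CHAR(n) in REDUNDANT: maximum character byte length times n."""
    return n * max_bytes_per_char


if __name__ == "__main__":
    print(char_bytes_compact("abc", 10))       # ASCII only: the reserved minimum
    print(char_bytes_compact("€" * 10, 10))    # ten 3-byte characters
    print(char_bytes_redundant(10, 4))         # utf8mb4, regardless of content
```

Under these assumptions, CHAR(10) in utf8mb4 holding 'abc' takes 10 bytes in COMPACT but 40 bytes in REDUNDANT, and ten 3-byte characters take 30 bytes in COMPACT.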
Source: https://stackoverflow.com/questions/48426699/how-does-innodb-store-character-columns