Should MEDIUMINT be avoided in MySQL?

前端 未结 2 1470
半阙折子戏
半阙折子戏 2021-02-13 11:11

I came across a comment on the following blogpost that recommends against using MEDIUMINT:

Don’t use [the 24bit INT], even in MySQL. It’s dumb,

2条回答
  •  北荒
    北荒 (楼主)
    2021-02-13 11:55

    In the grand scheme of things, fetching a row is the big cost. Simple functions, expressions, and much less, data formats, is insignificant in how long a query takes.

    On the other side, if your dataset it too large to stay cached, the overhead of I/O to fetch row(s) is even more significant. A crude rule of thumb says that a non-cached row takes 10 times as long as a cached one. Hence, shrinking the dataset (such as using a smaller *INT) may give you a huge performance benefit.

    This argument apples to ...INT, FLOAT vs DOUBLE, DECIMAL(m,n), DATETIME(n), etc. (A different discussion is needed for [VAR]CHAR/BINARY(...) and TEXT/BLOB.)

    For those with a background in Assembly language...

    • A table is likely to have a mixture of numbers and strings, thereby thwarting attempts to "align" values.
    • MySQL has always handled a variety of hardwares (big/little-endian, 16/32/64-bit) with binary compatibility. Note how the code @akuzminsky provided avoids alignment and endian issues. And it lets the compiler deal with 32-bit issues if the hardware is only 16-bit.
    • The code to test for special cases would probably outweigh the simply writing generic code.
    • We are talking typically less than 1% of the total row-handling time.

    Hence, the only sane way to write the code is to work at the byte level, and to ignore register size and assume all values are mis-aligned.

    For Optimization, in order of importance:

    1. Count the disk hits. Touching disk is overwhelmingly the most costly part of a query.
    2. Count the number of rows touched. Finding a row (via BTree, etc) takes some CPU. But, note, very few installations are CPU-bound; those that are tend to have poor indexes. (Rule of Thumb: There are typically 100 rows in an InnoDB data or index block.)
    3. Only now does parsing the row come into play.

    Rule of Thumb: If a tentative optimization does not (via back-of-envelope calc) yield 10% improvement, don't waste your time on it. Instead look for some bigger improvement. For example, indexes and Summary tables are often provide 10x (not just 10%).

提交回复
热议问题