Should MEDIUMINT be avoided in MySQL?

前端未结

关注

 2  1470

半阙折子戏 2021-02-13 11:11

I came across a comment on the following blogpost that recommends against using MEDIUMINT:

Don’t use [the 24bit INT], even in MySQL. It’s dumb,

2条回答

北荒 (楼主)

2021-02-13 11:55
In the grand scheme of things, fetching a row is the big cost. Simple functions, expressions, and much less, data formats, is insignificant in how long a query takes.

On the other side, if your dataset it too large to stay cached, the overhead of I/O to fetch row(s) is even more significant. A crude rule of thumb says that a non-cached row takes 10 times as long as a cached one. Hence, shrinking the dataset (such as using a smaller *INT) may give you a huge performance benefit.

This argument apples to ...INT, FLOAT vs DOUBLE, DECIMAL(m,n), DATETIME(n), etc. (A different discussion is needed for [VAR]CHAR/BINARY(...) and TEXT/BLOB.)

For those with a background in Assembly language...
- A table is likely to have a mixture of numbers and strings, thereby thwarting attempts to "align" values.
- MySQL has always handled a variety of hardwares (big/little-endian, 16/32/64-bit) with binary compatibility. Note how the code @akuzminsky provided avoids alignment and endian issues. And it lets the compiler deal with 32-bit issues if the hardware is only 16-bit.
- The code to test for special cases would probably outweigh the simply writing generic code.
- We are talking typically less than 1% of the total row-handling time.
Hence, the only sane way to write the code is to work at the byte level, and to ignore register size and assume all values are mis-aligned.

For Optimization, in order of importance:
1. Count the disk hits. Touching disk is overwhelmingly the most costly part of a query.
2. Count the number of rows touched. Finding a row (via BTree, etc) takes some CPU. But, note, very few installations are CPU-bound; those that are tend to have poor indexes. (Rule of Thumb: There are typically 100 rows in an InnoDB data or index block.)
3. Only now does parsing the row come into play.
Rule of Thumb: If a tentative optimization does not (via back-of-envelope calc) yield 10% improvement, don't waste your time on it. Instead look for some bigger improvement. For example, indexes and Summary tables are often provide 10x (not just 10%).
0 讨论(0)

查看其它2个回答
发布评论:

提交评论
- 加载中...