Question
I'm learning how to optimize my database by re-choosing the correct data types for its columns, and I want to know how much space I will save if I choose MEDIUMINT (3 bytes) instead of INT (4 bytes).
AFAIK (and correct me if I'm wrong) I need the database size to be as small as possible so that it fits in RAM and reduces hard-disk requests. The size of the database consists of the table sizes + index sizes.
Given that I have an INT column with 10,000,000 rows and a B-Tree index on it, how much space in MB will I save if I change the column's datatype from INT to MEDIUMINT, in terms of:
- table data size?
- index size?
Note: I know MySQL will not reduce the actual size on disk unless I run OPTIMIZE TABLE.
EDIT: My situation is that I will shortly finish my first serious system (an ERP system that I plan to sell in the Arab region market). The databases for plans 1, 2, 3 and 4 are expected to be about 2GB, 4GB, 10GB and 40GB respectively, so if I can reduce the size of each database without sacrificing performance or features, why not? If I could make a 32GB RAM machine serve 4 clients instead of 2, why not?
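A quick way to see the current table-data and index footprint before and after such a change is to ask information_schema. A minimal sketch, assuming a schema named erp and a table named orders (both hypothetical names); for InnoDB these figures are estimates and include free space inside the tablespace:

    -- data_length  = size of the clustered index (the table data), in bytes
    -- index_length = combined size of all secondary indexes, in bytes
    SELECT table_name,
           ROUND(data_length  / 1024 / 1024, 1) AS data_mb,
           ROUND(index_length / 1024 / 1024, 1) AS index_mb
    FROM   information_schema.TABLES
    WHERE  table_schema = 'erp'
      AND  table_name   = 'orders';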
Answer 1:
Just use INT unless you have a specific, measurable problem. You're only going to make a mess of things if you fret over every single byte in an era where even the most thrifty of smartphones has a billion of them for memory alone.
"I need the database size to be as small as possible to fit in RAM to reduce the hard-disk requests."
No you don't. You need the database to be easy to work with and perform adequately. In an era of SSD-backed databases, I/O will not be a problem until you're operating at large scale, and when and if that day comes then you can take measurements and understand the specific problems you're having.
Shaving a single byte off your INT field is unlikely to make anything better, since three-byte integer values are not something your CPU can deal with directly. They will be converted to four bytes and aligned properly so they can be understood, a process that's messy compared to reading a plain old 32-bit integer.
Remember, MySQL comes from an era where a high-end server had 64 megabytes of memory and a 9 gigabyte hard disk was considered huge. Back then you did have to shave bytes off because you only had a handful of them.
Now we have other concerns, like whether you'll accidentally exhaust your 24-bit integer space the way Slashdot did, when their site went down because of exactly the sort of "optimizing" you're intending to do here.
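For reference, the ranges in play are standard MySQL limits; the sketch below uses a hypothetical table and assumes strict SQL mode (the default in recent MySQL versions), where an out-of-range value is rejected rather than clamped:

    -- MEDIUMINT:           -8,388,608 .. 8,388,607
    -- MEDIUMINT UNSIGNED:           0 .. 16,777,215   (the 24-bit ceiling mentioned above)
    -- INT:          -2,147,483,648 .. 2,147,483,647
    -- INT UNSIGNED:                0 .. 4,294,967,295
    CREATE TABLE t_range_demo (id MEDIUMINT UNSIGNED NOT NULL);
    INSERT INTO t_range_demo VALUES (16777215);   -- OK, exactly at the ceiling
    INSERT INTO t_range_demo VALUES (16777216);   -- fails: "Out of range value for column 'id'"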
Be careful. Optimize when you have a concrete reason to, not just because you think you need to. Premature optimization is a constant temptation in development, but if you're disciplined you can avoid it.
Answer 2:
The exact size of your index is going to depend on how many rows you have, but also on how the data in your index looks.
If you shave off 1 byte per record in your data, and you have 10,000,000 records, that'll only save you up to 10MB on disk for the table data. Adding an index is going to add some more, and B-trees have empty space in them, but how inefficient that is depends on the actual data.
If you want to save space, make sure the field is not nullable, because even if you fill all rows with data, each record still carries a flag stating whether the nullable field contains data or not.
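A minimal sketch of that last point, with hypothetical table and column names; in InnoDB's default row formats each nullable column costs one flag bit per record in the row header, while NOT NULL columns need no such bit:

    CREATE TABLE t_null_demo (
      id           INT UNSIGNED NOT NULL,
      not_null_col MEDIUMINT UNSIGNED NOT NULL,   -- no null-flag bit per row
      nullable_col MEDIUMINT UNSIGNED NULL        -- costs a null-flag bit per row,
                                                  -- even if it is never actually NULL
    ) ENGINE=InnoDB;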
Answer 3:
(I disagree with some of the other Answers/Comments. I will try to answer all the questions, plus address all the points that I disagree with.)
MEDIUMINT is 3 bytes, saving 1 byte per row over INT. TINYINT is 1 byte, saving 3 bytes per row over INT.
In both cases, another 1 or 3 bytes is saved per occurrence in any INDEX other than the PRIMARY KEY.
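Applied to the question's 10,000,000-row column, that works out roughly to: 10,000,000 rows × 1 byte ≈ 10 MB saved in the table data, plus about another 10 MB in each secondary index that carries the column (before InnoDB page overhead and free space, which inflate both figures).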
If you are likely to have more data+index than space in RAM, then it is wise to shrink the datatypes, but be conservative.
Use MEDIUMINT UNSIGNED (etc.) if the value is non-negative, such as for an AUTO_INCREMENT. That gives you a limit of 16M instead of 8M. (Yeah, yeah, that's a tiny improvement.)
Beware of "burning" AUTO_INCREMENT
ids -- INSERT IGNORE
(and several other commands) will allocate the next auto_inc before checking whether it will be used.
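A minimal sketch of that burning effect, with a hypothetical table and InnoDB's default auto-increment behavior:

    CREATE TABLE t_burn_demo (
      user_id MEDIUMINT UNSIGNED NOT NULL AUTO_INCREMENT,
      email   VARCHAR(100) NOT NULL,
      PRIMARY KEY (user_id),
      UNIQUE KEY uk_email (email)
    ) ENGINE=InnoDB;

    INSERT INTO t_burn_demo (email) VALUES ('a@example.com');         -- gets id 1
    INSERT IGNORE INTO t_burn_demo (email) VALUES ('a@example.com');  -- duplicate is ignored,
                                                                      -- but id 2 is still consumed
    INSERT INTO t_burn_demo (email) VALUES ('b@example.com');         -- gets id 3, not 2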
Even if data+index exceeds the RAM size (actually innodb_buffer_pool_size), it may not slow down to disk speed; it depends on the data's access patterns. Beware of UUIDs: they are terribly random, and using them when you can't cache the entire index is deadly. The buffer_pool is a cache. (I have seen a 1TB dataset run fast enough with only 32GB of RAM and a spinning disk.)
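To see how that cache is doing on a running server (these are standard variable and status names; comparing disk reads to logical read requests is the usual rough health check):

    -- Current cache size, in bytes
    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
    -- Innodb_buffer_pool_reads (from disk) vs Innodb_buffer_pool_read_requests (logical)
    SHOW STATUS LIKE 'Innodb_buffer_pool_read%';
    -- The size can be changed online in MySQL 5.7.5+ if you do need a bigger cache, e.g.:
    -- SET GLOBAL innodb_buffer_pool_size = 25769803776;   -- 24 GB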
Using ALTER TABLE to change a datatype probably (I am not sure) rebuilds the table, thereby performing the equivalent of OPTIMIZE TABLE.
If the table was created with innodb_file_per_table = OFF and you turn it ON before doing the ALTER, you get a separate file for the table, but ibdata1 will not shrink (instead it will have lots more free space).
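A sketch of the change itself, assuming a table named orders and a column named customer_id (hypothetical names), and assuming you have already verified no existing value exceeds 16,777,215:

    -- Turn on per-table tablespaces first if it was OFF; it affects tables (re)built afterwards
    SET GLOBAL innodb_file_per_table = ON;

    -- Changing the datatype forces a table copy/rebuild, so budget the time and disk space
    ALTER TABLE orders MODIFY COLUMN customer_id MEDIUMINT UNSIGNED NOT NULL;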
Alignment of 3-byte numbers is not an issue. Powers of 2 are not relevant here. MySQL assumes all columns are at poor boundaries and of poor sizes. All numbers are converted to a generic format (64-bit numbers) for operating on. This conversion is an insignificant part of the total time; fetching the row (even if cached) is the most costly part.
When I/O-bound, shrinking datatypes leads to more rows per block, which leads to fewer disk hits (except in the UUID case). When I/O-bound, hitting the disk is overwhelmingly the biggest performance cost.
"NULLS take no space" -- https://dev.mysql.com/doc/internals/en/innodb-field-contents.html . So, again, less I/O. But, beware, if this leads to an extra check for NULL
in a SELECT
, that could lead to a table scan instead of using an index. Hitting 10M rows is a lot worse than hitting just a few.
As for how many clients you can fit into 32GB: maybe 6 or more. Remember, the buffer_pool is a cache; data and indexes are cached on a block-by-block basis. (An InnoDB block is 16KB.)
One more thing... It is a lot easier to shrink the datatypes before going into production. So, do what you can safely do now.
Source: https://stackoverflow.com/questions/50053088/how-much-size-i-will-save-if-changed-int-column-to-mediumint