Does MySQL create an extra index for the primary key, or does it use the data itself as an “index”?

Question

I can't find an explicit answer to this. I know that when you create a primary key, MySQL orders the data according to that primary key. The question is: does it actually create another index, or does it use the actual data as an index, since the data should already be ordered by the primary key?

EDIT:

If I have a table that has index A and index B and no primary key, I have the data + index A + index B. If I change the table so that the columns of index A are the primary key, I will only have the data (which is also used as an index) + index B, right? The above is in terms of memory usage.

Answer 1:

Clustered and Secondary Indexes

Every InnoDB table has a special index called the clustered index where the data for the rows is stored. Typically, the clustered index is synonymous with the primary key. To get the best performance from queries, inserts, and other database operations, you must understand how InnoDB uses the clustered index to optimize the most common lookup and DML operations for each table.

When you define a PRIMARY KEY on your table, InnoDB uses it as the clustered index.

If you do not define a PRIMARY KEY for your table, MySQL locates the first UNIQUE index where all the key columns are NOT NULL and InnoDB uses it as the clustered index.

If the table has no PRIMARY KEY or suitable UNIQUE index, InnoDB internally generates a hidden clustered index named GEN_CLUST_INDEX on a synthetic column containing row ID values. The rows are ordered by the ID that InnoDB assigns to the rows in such a table. The row ID is a 6-byte field that increases monotonically as new rows are inserted. Thus, the rows ordered by the row ID are physically in insertion order.
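
The three rules above can be checked on a test server. The following is a minimal sketch, assuming MySQL 8.0 and an account with the PROCESS privilege; the table names ci_pk, ci_uniq and ci_none are hypothetical. In MySQL 5.7 and MariaDB the corresponding views are named INNODB_SYS_TABLES and INNODB_SYS_INDEXES.

```sql
-- Hypothetical tables, one per rule quoted above.
create table ci_pk(id int not null, v int, primary key(id)) engine=InnoDB;
create table ci_uniq(code int not null, v int, unique key uk_code(code)) engine=InnoDB;
create table ci_none(v int, index idx_v(v)) engine=InnoDB;

-- List the indexes InnoDB actually keeps for each table.
-- TYPE is documented as: 0 = non-unique secondary index,
-- 1 = automatically generated clustered index (GEN_CLUST_INDEX),
-- 2 = unique secondary index, 3 = clustered index.
select t.NAME as table_name, i.NAME as index_name, i.TYPE
from information_schema.INNODB_TABLES t
join information_schema.INNODB_INDEXES i on i.TABLE_ID = t.TABLE_ID
where t.NAME in (
    concat(database(), '/ci_pk'),
    concat(database(), '/ci_uniq'),
    concat(database(), '/ci_none')
)
order by t.NAME, i.INDEX_ID;
```

I would expect ci_pk to list PRIMARY, ci_uniq to list uk_code as its clustered index, and ci_none to list GEN_CLUST_INDEX next to idx_v, but verify this on your own version.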

How the Clustered Index Speeds Up Queries

Accessing a row through the clustered index is fast because the index search leads directly to the page with all the row data. If a table is large, the clustered index architecture often saves a disk I/O operation when compared to storage organizations that store row data using a different page from the index record.
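
To make the difference concrete, here is a small sketch with a hypothetical table (customer_demo and its columns are assumptions for illustration, not part of the quoted documentation). It contrasts a lookup through the clustered index with a lookup through a secondary index.

```sql
-- Hypothetical table for illustration.
create table customer_demo(
    id int not null,
    email varchar(100) not null,
    name varchar(100),
    primary key(id),
    index idx_email(email)
) engine=InnoDB;

-- Lookup by primary key: the search in the clustered index ends on the
-- page that already holds the whole row.
explain select * from customer_demo where id = 42;

-- Lookup by secondary index: idx_email only yields the primary key value,
-- so InnoDB has to search the clustered index afterwards to fetch name.
explain select * from customer_demo where email = 'a@example.com';
```

The exact EXPLAIN output depends on the server version and on the data in the table, so treat the comments as a description of the access path rather than of the literal plan.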

Answer 2:

If I have a table that has index A and index B and no primary key, I have the data + index A + index B. If I change the table so that the columns of index A are the primary key, I will only have the data (which is also used as an index) + index B, right? The above is in terms of memory usage.

Yes, the clustered index is the table itself. It is the only place where the non-indexed columns are stored. When you run SHOW TABLE STATUS, its size is reported as Data_length. Secondary indexes are reported as Index_length.

mysql> show table status like 'redacted'\G
*************************** 1. row ***************************
           Name: redacted
         Engine: InnoDB
        Version: 10
     Row_format: Dynamic
           Rows: 100217
 Avg_row_length: 1168
    Data_length: 117063680    <-- clustered index
Max_data_length: 0
   Index_length: 3653632      <-- secondary index(es)

InnoDB always stores a clustered index. If your table has no PRIMARY KEY and no UNIQUE index whose columns are all NOT NULL, InnoDB creates an artificial column as the key for the clustered index, and this column cannot be queried.
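
If you prefer a regular query over SHOW TABLE STATUS, the same two numbers are available in information_schema. The second query is a sketch of how to break the sizes down per index using InnoDB's persistent statistics. It assumes persistent statistics are enabled (the default) and that you may read the mysql schema; the values are estimates.

```sql
-- Same figures as SHOW TABLE STATUS: clustered index vs. secondary indexes.
select TABLE_NAME, ENGINE, TABLE_ROWS, DATA_LENGTH, INDEX_LENGTH
from information_schema.TABLES
where TABLE_SCHEMA = database()
  and TABLE_NAME = 'redacted';

-- Estimated size per index: stat_name = 'size' is a page count,
-- so multiply by the page size to get bytes.
select index_name, stat_value * @@innodb_page_size as size_bytes
from mysql.innodb_index_stats
where database_name = database()
  and table_name = 'redacted'
  and stat_name = 'size';
```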

Answer 3:

If I have a table that has index A and index B and no primary key, I have the data + index A + index B. If I change the table so that the columns of index A are the primary key, I will only have the data (which is also used as an index) + index B, right? The above is in terms of memory usage.

While that is true, there is more to consider in terms of storage size.

This assumes that what you are trying to do is logically sound, and that the index you want to promote to primary key is actually a candidate key. Whether you save on storage size depends on the number of indexes and the size of the primary key columns. The reason is that InnoDB appends the primary key columns to every secondary index (if they are not already explicitly part of it). It can also affect other (bigger) tables if they need to reference the table through a foreign key.
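
One way to observe that the primary key columns are part of every secondary index is a query that only needs the indexed column plus the primary key. This is a minimal sketch with a hypothetical table (pk_demo and idx_b are assumed names):

```sql
-- Hypothetical table for illustration.
create table pk_demo(
    a int,
    b int,
    primary key(a),
    index idx_b(b)
) engine=InnoDB;

-- idx_b is declared on (b) only, but InnoDB stores each entry as (b, a).
-- The query below therefore needs nothing outside idx_b.
explain select a from pk_demo where b = 42;
```

With InnoDB I would expect the Extra column to report "Using index" here, meaning the clustered index is not read at all. It also shows why a wide primary key is expensive: its bytes are repeated in every entry of every secondary index.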

Here are some simple tests that show the differences. I am using MariaDB since its sequence plugin makes it easy to create dummy data, but you should see the same effects on MySQL server.

So first I will just create a simple table with two INT columns and an index on each, and fill it with 100K rows.

drop table if exists test;
create table test(
    a int,
    b int,
    index(a),
    index(b)
);

insert into test(a, b)
    select seq as a, seq as b
    from seq_1_to_100000
;

To keep it simple, I will just look at the file size of the table (I'm using innodb_file_per_table=1).

16777216 test.ibd
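
If you cannot look at the data directory, the file size can also be read from information_schema. This is a sketch assuming MySQL 8.0 and the PROCESS privilege; in MySQL 5.7 and MariaDB the view is named INNODB_SYS_TABLESPACES, and the available columns vary somewhat between versions.

```sql
-- FILE_SIZE is the apparent size of the .ibd file in bytes,
-- ALLOCATED_SIZE is the space actually allocated on disk.
select NAME, FILE_SIZE, ALLOCATED_SIZE
from information_schema.INNODB_TABLESPACES
where NAME = concat(database(), '/test');
```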

Now let's do what you wanted, and make column a primary key, changing the CREATE statement:

create table test(
    a int,
    b int,
    primary key(a),
    index(b)
);

The file size now is:

13631488 test.ibd

So it's true: you can save on storage size by promoting an index to primary key, in this case almost 19%.

But what happens if I change the column type from INT (4 bytes) to BINARY(32) (32 bytes)?

create table test(
    a binary(32),
    b binary(32),
    index(a),
    index(b)
);

File size:

37748736 test.ibd

Now make column a the primary key:

create table test(
    a binary(32),
    b binary(32),
    primary key(a),
    index(b)
);

File size:

41943040 test.ibd

As you can see, you can just as well increase the size, in this case by about 11%.

It is nevertheless advised to always define a primary key. If in doubt, just create an AUTO_INCREMENT PRIMARY KEY. In my example it could be:
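
For reference, the percentages follow directly from the four file sizes measured above. The page counts assume the default innodb_page_size of 16384 bytes, which is an assumption on my side since the answer does not state it.

```sql
-- Relative change when column a becomes the primary key.
select
    round(100 * (16777216 - 13631488) / 16777216, 2) as int_pk_saving_pct,    -- 18.75
    round(100 * (41943040 - 37748736) / 37748736, 2) as binary_pk_growth_pct; -- 11.11

-- The same sizes expressed in 16 KB pages (assumed default page size).
select
    16777216 div 16384 as int_two_indexes_pages,   -- 1024
    13631488 div 16384 as int_pk_pages,            -- 832
    37748736 div 16384 as bin_two_indexes_pages,   -- 2304
    41943040 div 16384 as bin_pk_pages;            -- 2560
```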

create table test(
    id mediumint auto_increment primary key,
    a binary(32),
    b binary(32),
    index(a),
    index(b)
);

File size:

37748736 test.ibd

The size is the same as if we didn't have an explicit primary key. (Though I would expect to save a bit on size, since I use 3 byte PK instead of a hidden 6 byte PK.) But now you can use it in your queries, for foreign keys and joins.
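
As a sketch of that last point, here is how another table could reference the surrogate key. The table test_child and its columns are hypothetical; the only requirement taken from the example above is that the referencing column has the same type as test.id (MEDIUMINT).

```sql
-- Hypothetical child table referencing the AUTO_INCREMENT key of test.
create table test_child(
    child_id int auto_increment primary key,
    test_id mediumint not null,
    note varchar(100),
    foreign key (test_id) references test(id)
);

-- Join on the 3-byte surrogate key instead of a 32-byte BINARY column.
select c.child_id, c.note, t.a, t.b
from test_child c
join test t on t.id = c.test_id;
```

The child table only stores the small id value per row, which is where the narrow key pays off compared with referencing a BINARY(32) column.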

Source: https://stackoverflow.com/questions/57134422/does-mysql-create-an-extra-index-for-primary-key-or-uses-the-data-itself-as-an

Tags

mysql

primary-key