What is the best way to achieve speedy inserts of large amounts of data in MySQL?

Submitted by 喜欢而已 on 2019-11-30 03:52:16

Question


I have written a program in C to parse large XML files and then create files containing insert statements. A separate process ingests those files into a MySQL database. This data will serve as an indexing service so that users can find documents easily.

I have chosen InnoDB for its row-level locking. The C program will generate anywhere from 500 to 5 million insert statements on a given invocation.

What is the best way to get all this data into the database as quickly as possible? The other thing to note is that the DB is on a separate server. Is it worth moving the files over to that server to speed up inserts?

EDIT: This table won't really be updated, but rows will be deleted.


Answer 1:


  • Use the mysqlimport tool or the LOAD DATA INFILE command.
  • Temporarily disable any indexes you don't need for data integrity, and re-enable them after the load.
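As a sketch of the LOAD DATA INFILE approach: instead of emitting INSERT statements, the parser could write a tab-separated file and let the ingesting process run a single bulk load. The `documents` table and its columns below are assumptions for illustration, not from the question.

```python
import csv
import os
import tempfile

# Hypothetical schema: documents(id, title, path). Write one tab-separated
# line per record instead of one INSERT statement per record.
rows = [(1, "Annual report", "/docs/a.xml"),
        (2, "Meeting notes", "/docs/b.xml")]

path = os.path.join(tempfile.mkdtemp(), "documents.tsv")
with open(path, "w", newline="") as f:
    csv.writer(f, delimiter="\t", lineterminator="\n").writerows(rows)

# The statement the ingesting process would then run -- one bulk load
# instead of millions of INSERTs (table and column names are assumptions):
load_stmt = (
    f"LOAD DATA INFILE '{path}' INTO TABLE documents "
    "FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n' "
    "(id, title, path)"
)
print(load_stmt)
```

The same file can be loaded with `mysqlimport`, which is a command-line wrapper around LOAD DATA INFILE.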



Answer 2:


Based on this link, I'd do at least the following:

  1. Move the files to the database server and connect over the Unix socket
  2. Generate a LOAD DATA INFILE file instead of INSERT statements
  3. Disable indexes during the load
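A minimal sketch of the index-disabling step: the SQL a loader script could wrap around the bulk load. Note that ALTER TABLE ... DISABLE KEYS only defers non-unique index maintenance on MyISAM tables (InnoDB ignores it with a warning), and the `documents` table name is a hypothetical.

```python
# SQL a loader script could emit around the bulk load. DISABLE KEYS defers
# non-unique index maintenance (MyISAM only; InnoDB ignores it with a
# warning); ENABLE KEYS rebuilds the indexes once at the end.
# "documents" is a hypothetical table name.
table = "documents"
script = "\n".join([
    f"ALTER TABLE {table} DISABLE KEYS;",
    f"LOAD DATA INFILE '/tmp/{table}.tsv' INTO TABLE {table};",
    f"ALTER TABLE {table} ENABLE KEYS;",
])
print(script)
```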



Answer 3:


MySQL with the standard table format (MyISAM) is wonderfully fast as long as the table is write-only; so the first question is whether you are going to be updating or deleting. If not, don't go with InnoDB: there's no need for locking if you are just appending. You can truncate or rename the output file periodically to deal with table size.




Answer 4:


1. Make sure you use a transaction.

Transactions eliminate the per-statement INSERT, sync-to-disk cycle; instead, all the disk I/O is performed once when you COMMIT the transaction.

2. Make sure to use connection compression

Sending raw text over a gzip-compressed stream can save as much as 90% of the bandwidth in some cases.

3. Use the multiple-row INSERT syntax where possible

INSERT INTO TableName(Col1,Col2) VALUES (1,1),(1,2),(1,3)

(Less text to send, and a shorter round trip.)
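Points 1 and 3 combine naturally. A sketch, in Python for brevity, of how the asker's statement generator could batch rows into multi-row INSERTs inside a single transaction (values are assumed numeric here; real code must quote and escape strings):

```python
def batch_inserts(table, cols, rows, batch_size=1000):
    """Build multi-row INSERT statements for `rows`, wrapped in one transaction.

    Values are assumed numeric for brevity; a real generator must
    quote and escape string values properly.
    """
    stmts = ["START TRANSACTION;"]
    for i in range(0, len(rows), batch_size):
        # One VALUES list per batch: (1,1),(1,2),...
        values = ",".join(
            "(" + ",".join(str(v) for v in row) + ")"
            for row in rows[i:i + batch_size]
        )
        stmts.append(f"INSERT INTO {table}({','.join(cols)}) VALUES {values};")
    stmts.append("COMMIT;")
    return stmts

stmts = batch_inserts("TableName", ["Col1", "Col2"], [(1, 1), (1, 2), (1, 3)])
print("\n".join(stmts))
```

The batch size matters: one giant statement can exceed max_allowed_packet, while one-row statements give up the savings, so a fixed batch of a few hundred to a few thousand rows is a common middle ground.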




Answer 5:


If you can't use LOAD DATA INFILE like others have suggested, use prepared queries for inserts.
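The prepared-insert pattern looks like the sketch below in Python's DB-API. sqlite3 stands in here only so the example is self-contained; MySQL drivers such as MySQLdb or mysql-connector expose the same executemany interface, though their placeholder is %s rather than ?. The `documents` table is a hypothetical.

```python
import sqlite3

# One parameterized INSERT, prepared once and reused for every row,
# all inside a single transaction. sqlite3 is used only to keep the
# sketch self-contained; MySQL drivers share this DB-API shape
# (with %s placeholders instead of ?).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER, title TEXT)")

rows = [(1, "a"), (2, "b"), (3, "c")]
with conn:  # the context manager commits the whole batch at once
    conn.executemany("INSERT INTO documents (id, title) VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM documents").fetchone()[0]
print(count)
```

Besides speed, parameterized inserts sidestep the quoting and escaping problems of building SQL strings by hand.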




Answer 6:


Really depends on the engine. If you're using InnoDB, do use transactions (you can't avoid them; with autocommit, each statement implicitly runs in its own transaction), but make sure they're neither too big nor too small.

If you're using MyISAM, transactions are meaningless. You may achieve better insert speed by disabling and re-enabling indexes, but that only pays off on an empty table.

Starting with an empty table is generally best.

LOAD DATA is a winner either way.



来源:https://stackoverflow.com/questions/314593/what-is-the-best-way-to-achieve-speedy-inserts-of-large-amounts-of-data-in-mysql
