I have a script that generates tens of thousands of inserts into a Postgres DB through a custom ORM. As you can imagine, it's quite slow. This is used for development purposes.
The fastest way to insert data would be the COPY command. But that requires a flat file as its input, and I'm guessing that generating one is not an option for you.
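For reference, in case writing a flat file does turn out to be workable, a COPY from a file would look something like this (table, columns and file path are placeholders):

COPY my_table (col1, col2) FROM '/tmp/my_table.csv' CSV;

Note that COPY ... FROM 'file' reads the file on the server and typically needs superuser rights; psql's \copy variant reads the file on the client instead.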
Don't commit too often, especially do not run this with autocommit enabled. "Tens of thousands" sounds like a single commit at the end would be just right.
If you can convince your ORM to make use of Postgres' multi-row insert, that would speed things up as well.
This is an example of a multi-row insert:
insert into my_table (col1, col2)
values
  (row_1_col_value_1, row_1_col_value_2),
  (row_2_col_value_1, row_2_col_value_2),
  (row_3_col_value_1, row_3_col_value_2);
If you can't generate the above syntax and you are using Java, make sure you are using batched statements instead of single-statement inserts (other database layers may allow something similar).
Edit:
jmz's post inspired me to add something:
You might also see an improvement when you increase wal_buffers to some bigger value (e.g. 8MB) and checkpoint_segments (e.g. 16).
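In postgresql.conf that would be something like the following (illustrative values only; wal_buffers takes effect only after a server restart, and checkpoint_segments was replaced by max_wal_size in PostgreSQL 9.5 and later):

wal_buffers = 8MB
checkpoint_segments = 16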
One thing you can do is remove all indexes, do your inserts, and then recreate the indexes.
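A minimal sketch of that pattern, with placeholder index and table names:

-- drop the index before the bulk load
DROP INDEX my_table_col1_idx;
-- ... run all of the inserts ...
-- then rebuild it in one pass
CREATE INDEX my_table_col1_idx ON my_table (col1);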
Are you sending one batch of tens of thousands of INSERTs, or tens of thousands of individual INSERTs?
I know that with Hibernate you can batch all your SQL statements up and send them at the end in one big chunk, instead of paying the network and database overhead of issuing thousands of SQL statements individually.
If you don't need that kind of functionality in a production environment, I'd suggest you turn fsync off in your PostgreSQL config. This will speed up the inserts dramatically.
Never turn off fsync on a production database.
For inserts that number in the hundreds to thousands, batch them:
begin;
insert1 ...
insert2 ...
...
insert10k ...
commit;
For inserts in the millions, use COPY:
COPY test (ts) FROM stdin;
2010-11-29 22:32:01.383741-07
2010-11-29 22:32:01.737722-07
... 1Million rows
\.
Make sure any column used as a foreign key in another table is indexed in that table, unless that table is trivially small.
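For example (hypothetical parent and child tables), the referencing column gets its own index:

-- child.parent_id is a foreign key to parent(id); index the referencing column
CREATE INDEX child_parent_id_idx ON child (parent_id);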
If you are just initializing constant test data, you could also put the test data into one or more staging tables and then copy the table contents over using

INSERT INTO ... SELECT ...

That should be about as fast as using COPY (though I did not benchmark it), with the advantage that you can do everything with plain SQL commands, without the hassle of setting up an external file as COPY requires.
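A minimal sketch of that approach, with placeholder table and column names:

-- one-time setup: a staging table with the same structure, populated however you like
CREATE TABLE my_table_staging (LIKE my_table);
-- ... fill my_table_staging once ...

-- whenever the test data is needed, copy it over in a single statement
INSERT INTO my_table (col1, col2)
SELECT col1, col2 FROM my_table_staging;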