I love that PostgreSQL is crash resistant, as I don't want to spend time fixing a database. However, I'm sure there must be some things I can disable/modify so that i
1M commits in 22 minutes seems reasonable, even with synchronous_commit = off, but if you can avoid the need to commit on each insert then you can get a lot faster than that. I just tried inserting 1M (identical) rows into your example table from 10 concurrent writers, using the bulk-insert COPY command:
$ head -n3 users.txt | cat -A # the rest of the file is just this another 99997 times
Random J. User^Irjuser@email.com^Ihttp://example.org^I100$
Random J. User^Irjuser@email.com^Ihttp://example.org^I100$
Random J. User^Irjuser@email.com^Ihttp://example.org^I100$
$ wc -l users.txt
100000 users.txt
$ time (seq 10 | xargs --max-procs=10 -n 1 bash -c "cat users.txt | psql insertspeed -c 'COPY \"user\" (username, email, website, created) FROM STDIN WITH (FORMAT text);'")
real 0m10.589s
user 0m0.281s
sys 0m0.285s
$ psql insertspeed -Antc 'SELECT count(*) FROM "user"'
1000000
Clearly there are only 10 commits there, which isn't exactly what you're looking for, but that hopefully gives you some indication of the speed that might be possible by batching your inserts together. This is on a VirtualBox VM running Linux on a fairly bog-standard Windows desktop host, so not exactly the highest-performance hardware possible.
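For a rough sense of scale, the two figures can be put side by side. This is only arithmetic on the numbers quoted in this thread (1M commits in 22 minutes, and 1M rows in 10.589 s via COPY). The two runs were on different machines, so treat the ratio as an indication rather than a benchmark. The variable names are mine.

```python
# Figures quoted in this thread; nothing here is measured by this script.
rows = 1_000_000
per_row_commit_seconds = 22 * 60   # "1M commits in 22 minutes"
copy_seconds = 10.589              # wall-clock ("real") time of the COPY run

# With one commit per insert, rows per second equals commits per second.
per_row_commit_rate = rows / per_row_commit_seconds
# The COPY run used 10 writers and therefore only 10 commits in total.
copy_rate = rows / copy_seconds
speedup = copy_rate / per_row_commit_rate

print(f"commit per row:   {per_row_commit_rate:,.0f} rows/s")
print(f"COPY, 10 commits: {copy_rate:,.0f} rows/s")
print(f"ratio:            {speedup:.0f}x")
```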
To give some less toy figures, we have a service running in production which has a single thread that streams data to Postgres via a COPY command similar to the above. It ends a batch and commits after a certain number of rows or if the transaction reaches a certain age (whichever comes first). It can sustain 11,000 inserts per second with a max latency of ~300ms by doing ~4 commits per second. If we tightened up the maximum permitted age of the transactions we'd get more commits per second which would reduce the latency but also the throughput. Again, this is not on terribly impressive hardware.
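The "commit after N rows or after a maximum age, whichever comes first" policy can be sketched roughly as below. This is not our production code, just a minimal illustration: the class name `CopyBatcher`, its parameters, and the default limits are all made up for the example. The `flush` callback is where a real service would send the batch through COPY ... FROM STDIN and commit (with psycopg2, for instance, `cursor.copy_expert` could do that); here it only collects batches so the sketch runs without a database.

```python
import time
from typing import Callable, List, Optional, Sequence

Row = Sequence[object]


class CopyBatcher:
    """Buffer rows; flush when the batch is full or too old (hypothetical helper)."""

    def __init__(
        self,
        flush: Callable[[List[Row]], None],
        max_rows: int = 5000,        # illustrative default, not a recommendation
        max_age_s: float = 0.25,     # illustrative default, not a recommendation
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._flush = flush
        self._max_rows = max_rows
        self._max_age_s = max_age_s
        self._clock = clock
        self._rows: List[Row] = []
        self._opened_at: Optional[float] = None

    def add(self, row: Row) -> None:
        if not self._rows:
            # The batch's age is measured from its first row.
            self._opened_at = self._clock()
        self._rows.append(row)
        self._maybe_flush()

    def tick(self) -> None:
        """Call periodically so a quiet stream still gets flushed on age."""
        self._maybe_flush()

    def close(self) -> None:
        """Flush whatever is pending; does nothing if the buffer is empty."""
        if self._rows:
            batch, self._rows = self._rows, []
            self._opened_at = None
            self._flush(batch)

    def _maybe_flush(self) -> None:
        if not self._rows:
            return
        too_big = len(self._rows) >= self._max_rows
        too_old = self._clock() - self._opened_at >= self._max_age_s
        if too_big or too_old:
            self.close()


# Demo with a fake clock and an in-memory "flush" instead of a real COPY.
batches: List[List[Row]] = []
now = [0.0]
batcher = CopyBatcher(batches.append, max_rows=3, max_age_s=0.25,
                      clock=lambda: now[0])
for _ in range(7):
    batcher.add(("Random J. User", "rjuser@email.com", "http://example.org", 100))
now[0] += 0.3      # the one pending row is now older than max_age_s
batcher.tick()
print([len(b) for b in batches])  # prints [3, 3, 1]
```

Lowering `max_age_s` in a sketch like this corresponds to the trade-off described above: more, smaller flushes, so lower latency but fewer rows per commit.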
Based on that experience, I'd strongly recommend trying to use COPY rather than INSERT, and trying to reduce the number of commits as far as possible while still achieving your latency target.