I love that PostgreSQL is crash resistant, as I don\'t want to spend time fixing a database. However, I\'m sure there must be some things I can disable/modify so that i
1M commits in 22 minutes seems reasonable, even with synchronous_commit = off
, but if you can avoid the need to commit on each insert then you can get a lot faster than that. I just tried inserting 1M (identical) rows into your example table from 10 concurrent writers, using the bulk-insert COPY
command:
$ head -n3 users.txt | cat -A # the rest of the file is just this another 99997 times
Random J. User^Irjuser@email.com^Ihttp://example.org^I100$
Random J. User^Irjuser@email.com^Ihttp://example.org^I100$
Random J. User^Irjuser@email.com^Ihttp://example.org^I100$
$ wc -l users.txt
100000 users.txt
$ time (seq 10 | xargs --max-procs=10 -n 1 bash -c "cat users.txt | psql insertspeed -c 'COPY \"user\" (username, email, website, created) FROM STDIN WITH (FORMAT text);'")
real 0m10.589s
user 0m0.281s
sys 0m0.285s
$ psql insertspeed -Antc 'SELECT count(*) FROM "user"'
1000000
Clearly there's only 10 commits there, which isn't exactly what you're looking for, but that hopefully gives you some kind of indication of the speed that might be possible by batching your inserts together. This is on a VirtualBox VM running Linux on a fairly bog-standard Windows desktop host, so not exactly the highest-performance hardware possible.
To give some less toy figures, we have a service running in production which has a single thread that streams data to Postgres via a COPY
command similar to the above. It ends a batch and commits after a certain number of rows or if the transaction reaches a certain age (whichever comes first). It can sustain 11,000 inserts per second with a max latency of ~300ms by doing ~4 commits per second. If we tightened up the maximum permitted age of the transactions we'd get more commits per second which would reduce the latency but also the throughput. Again, this is not on terribly impressive hardware.
Based on that experience, I'd strongly recommend trying to use COPY
rather than INSERT
, and trying to reduce the number of commits as far as possible while still achieving your latency target.
1M records inserted in 22 minutes works out to be 758 records/second. Each INSERT here is an individual commit to disk, with both write-ahead log and database components to it eventually. Normally I expect that even good hardware with a battery-backed cache and everything you will be lucky to hit 3000 commit/second. So you're not actually doing too bad if this is regular hardware without such write acceleration. The normal limit here is in the 500 to 1000 commits/second range in the situation you're in, without special tuning for this situation.
As for what that would look like, if you can't make the commits include more records each, your options for speeding this up include:
Turn off synchronous_commit (already done)
Increase wal_writer_delay. When synchronous_commit is off, the database spools commits up to be written every 200ms. You can make that some number of seconds instead if you want to by tweaking this upwards, it just increases the size of data loss after a crash.
Increase wal_buffers to 16MB, just to make that whole operation more efficient.
Increase checkpoint_segments, to cut down on how often the regular data is written to disk. You probably want at least 64 here. Downsides are higher disk space use and longer recovery time after a crash.
Increase shared_buffers. The default here is tiny, typically 32MB. You have to increase how much UNIX shared memory the system has to allocate. Once that's done, useful values are typically >1/4 of total RAM, up to 8GB. The rate of gain here falls off above 256MB, the increase from the default to there can be really helpful though.
That's pretty much it. Anything else you touched that might help could potentially cause data corruption in a crash; these are all completely safe.
Well, you don't give us much to go on. But it sounds like you're looking for asynchronous commits.
Don't overlook a hardware upgrade--faster hardware usually means a faster database.
Well one thing you could do to speed things up is drop the index you are creating manually - the primary key
constraint already auto-creates a unique index on that column as you can see below (I'm testing on 8.3):
postgres=> CREATE TABLE "user"
postgres-> (
postgres(> id serial NOT NULL,
postgres(> username character varying(40),
postgres(> email character varying(70),
postgres(> website character varying(100),
postgres(> created integer,
postgres(> CONSTRAINT user_pkey PRIMARY KEY (id)
postgres(> )
postgres-> WITH ( OIDS=FALSE );
NOTICE: CREATE TABLE will create implicit sequence "user_id_seq" for serial column "user.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "user_pkey" for table "user"
CREATE TABLE
postgres=> CREATE INDEX id ON "user" USING btree (id);
CREATE INDEX
postgres=> \d user
Table "stack.user"
Column | Type | Modifiers
----------+------------------------+---------------------------------------------------
id | integer | not null default nextval('user_id_seq'::regclass)
username | character varying(40) |
email | character varying(70) |
website | character varying(100) |
created | integer |
Indexes:
"user_pkey" PRIMARY KEY, btree (id)
"id" btree (id)
Also, consider changing wal_sync_method
to an option that uses O_DIRECT
- this is not the default on Linux
I think the problem can't be solved by dealing with the server only.
I found PostgreSQL can commit 3000+ rows per second, and both server and client were not busy, but the time went by. In contrast SQL Server can reach 5000+ rows per second, and Oracle is even faster, it can reach 12000+ per second, about 20 fields in a row.
I guess the roundtrip is the problem: Send a row to server, and receive the reply from the server. Both SQL Server and Oracle support batch operations: send more than one row in a function call and wait for the reply.
Many years ago I worked with Oracle: Trying to improve the write performance using OCI, I read documents and found too many round trips will decrease performance. Finally I solved it by using batch operations: send 128 or more rows to the server in a batch and wait for the reply. It reached 12000+ rows per second. If you do not use batches and send all rows individually (including wait), it reached only about 2000 rows per second.
One possibility would be to use the keywork DEFERRABLE to defer constraints because constraints are checked for every lines.
So the idea would be to ask postgresql to check constraints just before you commit.