Increase PostgreSQL write speed at the cost of likely data loss?

执笔经年 2021-01-29 19:08

I love that PostgreSQL is crash resistant, as I don't want to spend time fixing a database. However, I'm sure there must be some things I can disable/modify so that i

8 answers
  •  孤街浪徒
    2021-01-29 19:23

    1M commits in 22 minutes seems reasonable, even with synchronous_commit = off, but if you can avoid the need to commit on each insert then you can get a lot faster than that. I just tried inserting 1M (identical) rows into your example table from 10 concurrent writers, using the bulk-insert COPY command:

    $ head -n3 users.txt | cat -A # the rest of the file is just this another 99997 times
    Random J. User^Irjuser@email.com^Ihttp://example.org^I100$
    Random J. User^Irjuser@email.com^Ihttp://example.org^I100$
    Random J. User^Irjuser@email.com^Ihttp://example.org^I100$
    $ wc -l users.txt
    100000 users.txt
    $ time (seq 10 | xargs --max-procs=10 -n 1 bash -c "cat users.txt | psql insertspeed -c 'COPY \"user\" (username, email, website, created) FROM STDIN WITH (FORMAT text);'")
    
    real    0m10.589s
    user    0m0.281s
    sys     0m0.285s
    $ psql insertspeed -Antc 'SELECT count(*) FROM "user"'
    1000000
    

    Clearly there are only 10 commits there, which isn't exactly what you're looking for, but that hopefully gives you some indication of the speed that might be possible by batching your inserts together. This is on a VirtualBox VM running Linux on a fairly bog-standard Windows desktop host, so not exactly the highest-performance hardware possible.
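
    For anyone wanting to repeat the test, an input file like the one shown above could be generated along these lines. This is only a sketch: it assumes the file really is the single tab-separated sample row repeated 100,000 times, and the use of awk here is an illustrative choice, not necessarily how the original file was produced.

    ```shell
    # Build users.txt: the sample row (tab-separated) repeated 100,000 times.
    # Assumption: every row is identical, as in the `head` output above.
    awk 'BEGIN {
      for (i = 0; i < 100000; i++)
        print "Random J. User\trjuser@email.com\thttp://example.org\t100"
    }' > users.txt

    # Sanity check: should report the same line count as the transcript.
    wc -l < users.txt
    ```

    Ten concurrent COPY runs over this file then give the 1M rows counted in the transcript.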

    To give some less toy figures, we have a service running in production which has a single thread that streams data to Postgres via a COPY command similar to the above. It ends a batch and commits after a certain number of rows or if the transaction reaches a certain age (whichever comes first). It can sustain 11,000 inserts per second with a max latency of ~300ms by doing ~4 commits per second. If we tightened up the maximum permitted age of the transactions we'd get more commits per second which would reduce the latency but also the throughput. Again, this is not on terribly impressive hardware.
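
    The "commit after N rows or after a maximum age, whichever comes first" rule can be sketched roughly as below. This is not the production service described above, whose implementation isn't shown; `batch_rows` and `flush_batch` are hypothetical names, and `flush_batch` is a stand-in that only reports the batch size where a real version might run a COPY and commit. At the quoted 11,000 inserts per second and ~4 commits per second, a batch works out to roughly 2,750 rows. Note two simplifications: `date +%s` has one-second resolution, and the age is only checked when a new row arrives, so a latency target near 300ms would need a finer clock and a real timer.

    ```shell
    # Hypothetical stand-in for the real commit step. A real version might
    # feed the batch file ($1) to psql with a COPY ... FROM STDIN command.
    flush_batch() {
      echo "commit $2 rows"
    }

    # batch_rows MAX_ROWS MAX_AGE_SECONDS
    # Reads rows from stdin and flushes the open batch once it holds MAX_ROWS
    # rows or is at least MAX_AGE_SECONDS old, whichever comes first.
    batch_rows() {
      max_rows=$1
      max_age=$2
      batch=$(mktemp)
      count=0
      started=0
      while IFS= read -r row; do
        now=$(date +%s)
        if [ "$count" -eq 0 ]; then
          started=$now    # a batch's age is measured from its first row
        fi
        printf '%s\n' "$row" >> "$batch"
        count=$((count + 1))
        if [ "$count" -ge "$max_rows" ] || [ $((now - started)) -ge "$max_age" ]; then
          flush_batch "$batch" "$count"
          : > "$batch"    # start a new, empty batch
          count=0
        fi
      done
      # Flush whatever is left when the input ends.
      if [ "$count" -gt 0 ]; then
        flush_batch "$batch" "$count"
      fi
      rm -f "$batch"
    }
    ```

    For example, feeding ten rows through `batch_rows 4 3600` (a row limit of 4 and an age limit too long to trigger) produces two batches of 4 rows and a final batch of 2. Lowering the age limit trades throughput for latency in the way described above.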

    Based on that experience, I'd strongly recommend trying to use COPY rather than INSERT, and trying to reduce the number of commits as far as possible while still achieving your latency target.
