We are running Postgres 9.1.3 and we have recently started to run into major performance problems on one of our servers.
Our queries ran fine for a while, but as of Augu
It's difficult to be sure, but I think you are right to suspect I/O. As tables grow or the number of connections increases, the cache hit rate can fall. That increases I/O demand and slows everything down, while more queries keep arriving and make the problem worse. The situation is complicated for you because virtual disks don't necessarily behave the same way as physical ones.
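One way to watch for a falling cache hit rate is PostgreSQL's own counters. The pg_stat_database view exposes blks_hit (block requests found in shared_buffers) and blks_read (requests passed on to the OS), e.g. via `SELECT blks_hit, blks_read FROM pg_stat_database WHERE datname = current_database();`. The sketch below only does the arithmetic on two snapshots of those counters; how you collect them is left open, and the sample numbers are made up. Note that blks_read also counts blocks served from the OS page cache, so it overstates true disk reads.

```python
from dataclasses import dataclass


@dataclass
class BlockCounters:
    """One snapshot of pg_stat_database counters for a single database."""
    blks_hit: int   # block requests satisfied from shared_buffers
    blks_read: int  # block requests passed on to the OS (page cache or disk)


def hit_ratio(before: BlockCounters, after: BlockCounters) -> float:
    """Share of block requests served from shared_buffers between two snapshots.

    Deltas are used so the figure reflects the recent interval rather than
    the cumulative totals since statistics were last reset.
    """
    hits = after.blks_hit - before.blks_hit
    reads = after.blks_read - before.blks_read
    total = hits + reads
    if total <= 0:
        raise ValueError("no block activity between the two snapshots")
    return hits / total


if __name__ == "__main__":
    # Hypothetical counter values, not measurements from the question.
    t0 = BlockCounters(blks_hit=9_000_000, blks_read=100_000)
    t1 = BlockCounters(blks_hit=9_900_000, blks_read=200_000)
    print(f"hit ratio over the interval: {hit_ratio(t0, t1):.1%}")
```

Comparing the ratio over a quiet period with the ratio during a slowdown is more informative than the cumulative value.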
First, measure actual I/O activity on the VM (with vmstat or iostat, for example). Second, do the same on the real hardware. Finally, run some standard disk bandwidth tools on both, in particular with random read/write mixes. Then you'll be able to say how much of your available I/O capacity is being used.
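On Linux, iostat derives its numbers from /proc/diskstats, so a rough version of this measurement can be scripted. The sketch below parses two snapshots of that file, computes read and write operations per second plus the fraction of wall-clock time the device was busy (iostat's %util), and compares total IOPS against a peak figure you would get from your own random read/write benchmark. The peak_iops value, the device name and the sample lines are assumptions for illustration; inside a VM the numbers describe the virtual device, not the physical disks behind it.

```python
import time
from dataclasses import dataclass


@dataclass
class DiskSample:
    reads: int   # reads completed
    writes: int  # writes completed
    io_ms: int   # milliseconds the device spent doing I/O


def parse_diskstats(text: str, device: str) -> DiskSample:
    """Extract one device's counters from the contents of /proc/diskstats."""
    for line in text.splitlines():
        f = line.split()
        # Columns: major, minor, device name, then the kernel's statistics fields.
        if len(f) >= 14 and f[2] == device:
            return DiskSample(reads=int(f[3]), writes=int(f[7]), io_ms=int(f[12]))
    raise KeyError(f"device {device!r} not found")


def io_load(before: DiskSample, after: DiskSample, interval_s: float, peak_iops: float):
    """Return (read IOPS, write IOPS, busy fraction, fraction of benchmarked peak IOPS)."""
    read_iops = (after.reads - before.reads) / interval_s
    write_iops = (after.writes - before.writes) / interval_s
    busy = (after.io_ms - before.io_ms) / (interval_s * 1000.0)
    return read_iops, write_iops, busy, (read_iops + write_iops) / peak_iops


def measure(device: str, interval_s: float, peak_iops: float):
    """Live version: sample /proc/diskstats twice (Linux only)."""
    with open("/proc/diskstats") as fh:
        t0 = parse_diskstats(fh.read(), device)
    time.sleep(interval_s)
    with open("/proc/diskstats") as fh:
        t1 = parse_diskstats(fh.read(), device)
    return io_load(t0, t1, interval_s, peak_iops)


if __name__ == "__main__":
    # Made-up snapshots taken 10 seconds apart for a device called "sda".
    s0 = parse_diskstats("8 0 sda 1000 0 0 0 500 0 0 0 0 2000 0", "sda")
    s1 = parse_diskstats("8 0 sda 4000 0 0 0 1500 0 0 0 0 9000 0", "sda")
    r, w, busy, used = io_load(s0, s1, interval_s=10, peak_iops=800)
    print(f"{r:.0f} r/s, {w:.0f} w/s, busy {busy:.0%}, {used:.0%} of benchmarked peak")
```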
As for query plans, without the schema details and EXPLAIN ANALYZE output, no one can say.
You will find the postgresql.org mailing list useful, even if just for the archives. The book linked below is also excellent.
http://www.packtpub.com/postgresql-90-high-performance/book
I would turn on autovacuum as well. There are a few settings that control how much it interferes with normal work, notably autovacuum_vacuum_cost_delay and autovacuum_vacuum_cost_limit. With the amount of RAM you have, you should set shared_buffers between 2048MB and 3276MB (roughly 25% to 40% of 8GB). If your system has a lot of RAM it doesn't seem to be using and that you don't need elsewhere, set it closer to the higher end. You may also need to check the kernel's maximum shared memory segment size (kernel.shmmax) with sysctl, since PostgreSQL 9.1 allocates shared_buffers inside a single shared memory segment. Your maintenance_work_mem is really high, but if you are mostly doing maintenance, I suppose it isn't as bad as I first thought.
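The 2048MB to 3276MB range is 25% to 40% of 8192MB. The sketch below does that arithmetic for any RAM size and checks a kernel.shmmax value (in bytes, as printed by `sysctl kernel.shmmax`) against the proposed shared_buffers. The 25%/40% bounds are taken from the figures above; the shmmax value in the example is only an illustration, and the check is a lower bound because the real segment also holds other shared structures.

```python
from typing import Tuple

MB = 1024 * 1024


def shared_buffers_range_mb(ram_mb: int, low: float = 0.25, high: float = 0.40) -> Tuple[int, int]:
    """Suggested shared_buffers bounds in MB, as fractions of physical RAM."""
    return int(ram_mb * low), int(ram_mb * high)


def shmmax_too_small(shmmax_bytes: int, shared_buffers_mb: int) -> bool:
    """True if kernel.shmmax cannot even hold shared_buffers by itself.

    The real segment also contains WAL buffers, lock tables and other
    structures, so passing this check is necessary but not sufficient.
    """
    return shmmax_bytes < shared_buffers_mb * MB


if __name__ == "__main__":
    lo, hi = shared_buffers_range_mb(8192)  # 8GB of RAM
    print(f"shared_buffers: {lo}MB to {hi}MB")
    shmmax = 33554432  # example value only; read yours with `sysctl kernel.shmmax`
    print("raise kernel.shmmax first" if shmmax_too_small(shmmax, hi) else "shmmax looks large enough")
```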
Your biggest problem is this line:
autovacuum | off
Turning it on won't immediately cure the problem, but it should keep things from eroding further. There are almost no cases where it is a good idea to turn this off; the main exception is a big bulk load followed by an explicit VACUUM FREEZE ANALYZE, after which autovacuum should be turned back on. With autovacuum off, you will see performance degrade, just as you have. Once a database has gotten into such bad shape, recovering it takes more aggressive maintenance than autovacuum can provide.
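To see what is lost with autovacuum off, it helps to know when it would have acted. PostgreSQL's documented rule is that autovacuum vacuums a table once its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × reltuples (defaults 50 and 0.2), and analyzes it once tuples changed since the last ANALYZE exceed the analogous analyze threshold (defaults 50 and 0.1). The sketch below just evaluates those formulas; the table sizes and dead-tuple counts are hypothetical. With autovacuum off, none of these thresholds ever trigger anything, so dead tuples simply pile up.

```python
def vacuum_threshold(reltuples: float, base: int = 50, scale: float = 0.2) -> float:
    """Dead tuples a table may accumulate before autovacuum VACUUMs it.

    Defaults mirror autovacuum_vacuum_threshold and
    autovacuum_vacuum_scale_factor.
    """
    return base + scale * reltuples


def analyze_threshold(reltuples: float, base: int = 50, scale: float = 0.1) -> float:
    """Changed tuples before autovacuum ANALYZEs the table.

    Defaults mirror autovacuum_analyze_threshold and
    autovacuum_analyze_scale_factor.
    """
    return base + scale * reltuples


def needs_vacuum(reltuples: float, dead_tuples: int) -> bool:
    return dead_tuples > vacuum_threshold(reltuples)


if __name__ == "__main__":
    # Hypothetical table sizes and dead-tuple counts.
    for rows, dead in [(10_000, 5_000), (1_000_000, 150_000), (50_000_000, 12_000_000)]:
        print(f"{rows:>11,} rows: vacuum after {vacuum_threshold(rows):>13,.0f} dead tuples,"
              f" analyze after {analyze_threshold(rows):>12,.0f} changes;"
              f" {dead:,} dead -> vacuum due: {needs_vacuum(rows, dead)}")
```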
checkpoint_segments | 6
Increasing this will help data modifications, but won't do much to improve the speed of SELECT statements.
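For a sense of scale: in PostgreSQL 9.1 each WAL segment is normally 16MB, and a checkpoint is forced once checkpoint_segments segments have been written (or when checkpoint_timeout expires, whichever comes first). The documentation also says the number of segment files normally stays below (2 + checkpoint_completion_target) × checkpoint_segments + 1. The sketch below evaluates both; checkpoint_completion_target = 0.5 and checkpoint_segments = 3 are the 9.1 defaults, and the other values are only examples.

```python
import math

WAL_SEGMENT_MB = 16  # standard WAL segment size (a compile-time option in 9.1)


def wal_between_checkpoints_mb(checkpoint_segments: int) -> int:
    """WAL written before a checkpoint is forced, if checkpoint_timeout doesn't fire first."""
    return checkpoint_segments * WAL_SEGMENT_MB


def normal_max_wal_files(checkpoint_segments: int, completion_target: float = 0.5) -> int:
    """Documented normal upper bound on WAL segment files kept in pg_xlog."""
    return math.floor((2 + completion_target) * checkpoint_segments + 1)


if __name__ == "__main__":
    for segs in (3, 6, 16, 32):
        files = normal_max_wal_files(segs)
        print(f"checkpoint_segments={segs:>2}: checkpoint every {wal_between_checkpoints_mb(segs):>4}MB of WAL,"
              f" up to ~{files} files ({files * WAL_SEGMENT_MB}MB) in pg_xlog")
```

Fewer checkpoints mean fewer bursts of dirty-page writes during heavy modification, at the cost of more disk space for WAL and longer crash recovery.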
fsync | off
full_page_writes | off
These settings tell PostgreSQL to speed up writes at the expense of durability. If your hardware, OS, or VM crashes or is abruptly killed, your database will be corrupted and your best bet will be to restore from your last known good backup. (Of course, since hardware can fail at any time, if you care about losing the data you should already have a good backup strategy in place.)
maintenance_work_mem | 1GB
This is too high for an 8GB VM. You can always raise it for a single session (SET maintenance_work_mem = '1GB';) before running heavy maintenance on that connection.
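One reason 1GB is risky here: in 9.1 every autovacuum worker may allocate up to maintenance_work_mem (a separate autovacuum_work_mem setting only arrived in 9.4), on top of any manual VACUUM or CREATE INDEX running in other sessions. Once autovacuum is turned back on, that adds up quickly. The sketch below totals the worst case against the VM's RAM; autovacuum_max_workers = 3 is the default, while the shared_buffers figure and the single manual session are assumptions for illustration.

```python
def worst_case_maintenance_mb(maintenance_work_mem_mb: int,
                              autovacuum_max_workers: int = 3,
                              manual_sessions: int = 1) -> int:
    """Upper bound on memory that concurrent maintenance operations could claim."""
    return maintenance_work_mem_mb * (autovacuum_max_workers + manual_sessions)


def headroom_mb(ram_mb: int, shared_buffers_mb: int, maintenance_mb: int) -> int:
    """What is left for the OS page cache, per-connection work_mem, and everything else."""
    return ram_mb - shared_buffers_mb - maintenance_mb


if __name__ == "__main__":
    ram, shared = 8192, 2048  # 8GB VM; the shared_buffers value is an assumption
    for mwm in (1024, 256):
        peak = worst_case_maintenance_mb(mwm)
        print(f"maintenance_work_mem={mwm}MB: up to {peak}MB for maintenance,"
              f" {headroom_mb(ram, shared, peak)}MB left")
```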
wal_writer_delay | 10ms
Even seasoned experts have trouble adjusting this to something that gets better performance than the default. It is almost always best left alone.
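The settings above were quoted as `name | value` lines, which is how psql prints SHOW ALL (or a query on pg_settings). A small script can scan such output for the values discussed in this answer. The checks below cover only those settings, and the sample input is an abbreviated, hypothetical paste; SHOW ALL's extra description column also parses, because only the first two columns are read.

```python
from typing import Dict, List


def parse_settings(text: str) -> Dict[str, str]:
    """Parse psql output lines of the form 'name | setting [| description]'."""
    settings = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Skip the header, the '----+----' rule, blank lines and '(N rows)' footers.
        if len(parts) >= 2 and parts[0] and parts[0] != "name":
            settings[parts[0]] = parts[1]
    return settings


def audit(settings: Dict[str, str]) -> List[str]:
    """Flag only the settings discussed in this answer."""
    warnings = []
    if settings.get("autovacuum") == "off":
        warnings.append("autovacuum is off: dead rows and bloat will accumulate")
    for name in ("fsync", "full_page_writes"):
        if settings.get(name) == "off":
            warnings.append(f"{name} is off: a crash can corrupt the database")
    delay = settings.get("wal_writer_delay")
    if delay is not None and delay != "200ms":  # 200ms is the default
        warnings.append(f"wal_writer_delay is {delay}: usually best left at the default")
    return warnings


if __name__ == "__main__":
    sample = """
     name                | setting
    ---------------------+---------
     autovacuum          | off
     checkpoint_segments | 6
     fsync               | off
     full_page_writes    | off
     wal_writer_delay    | 10ms
    (5 rows)
    """
    for warning in audit(parse_settings(sample)):
        print("WARNING:", warning)
```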
Your best bet at this point is to use pg_dumpall to dump your database cluster to some other medium, start with a fresh initdb, and restore. As a database superuser, run VACUUM FREEZE ANALYZE (the FREEZE is not generally recommended except after a bulk load like that), and then run with autovacuum turned on.
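The dump-and-reload procedure can be scripted. The sketch below builds the command lines and, by default, only prints them; pass execute=True to run them. The data directory and dump paths are hypothetical placeholders, stopping the old cluster before starting the new one (to free the port) is an assumption about your setup, and `vacuumdb --all --freeze --analyze` is the command-line way to run VACUUM FREEZE ANALYZE in every database. Run it as the postgres OS user with the 9.1 binaries on PATH.

```python
import shlex
import subprocess
from typing import List, NamedTuple, Optional


class Step(NamedTuple):
    argv: List[str]
    stdout_path: Optional[str] = None  # redirect stdout here, if set


def rebuild_steps(old_dir: str, new_dir: str, dump_path: str) -> List[Step]:
    """Dump, re-initialize, restore, then freeze and analyze everything."""
    return [
        Step(["pg_dumpall"], stdout_path=dump_path),           # roles + every database
        Step(["pg_ctl", "-D", old_dir, "-w", "stop"]),         # free the port
        Step(["initdb", "-D", new_dir]),                       # fresh cluster, default settings
        Step(["pg_ctl", "-D", new_dir, "-w", "start"]),
        Step(["psql", "-f", dump_path, "postgres"]),           # restore the dump
        Step(["vacuumdb", "--all", "--freeze", "--analyze"]),  # VACUUM FREEZE ANALYZE everywhere
    ]


def describe(steps: List[Step]) -> List[str]:
    """Render the steps as shell-style command lines for review."""
    lines = []
    for step in steps:
        cmd = " ".join(shlex.quote(a) for a in step.argv)
        lines.append(f"{cmd} > {shlex.quote(step.stdout_path)}" if step.stdout_path else cmd)
    return lines


def run(steps: List[Step], execute: bool = False) -> None:
    for step, line in zip(steps, describe(steps)):
        print(line)
        if not execute:
            continue  # dry run: print only
        if step.stdout_path:
            with open(step.stdout_path, "w") as out:
                subprocess.run(step.argv, check=True, stdout=out)
        else:
            subprocess.run(step.argv, check=True)


if __name__ == "__main__":
    # Hypothetical paths; adjust to your installation before using execute=True.
    run(rebuild_steps("/var/lib/pgsql/9.1/data", "/var/lib/pgsql/9.1/data_new", "/backup/all.sql"))
```

A fresh initdb leaves autovacuum, fsync and full_page_writes at their defaults (all on); before the start step, copy over only the settings you still want, such as shared_buffers and checkpoint_segments.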
I highly recommend that you get a copy of Greg Smith's "PostgreSQL 9.0 High Performance" book, and read it carefully. (Full disclosure, I was one of the technical reviewers for the book, but get no money from sales.) One of the first things he recommends is getting benchmark numbers on the speed of your RAM and disk before you even install PostgreSQL -- that way you know what you're dealing with.
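Dedicated benchmark tools are the right way to do this. Purely to illustrate the idea, the sketch below times a sequential write of 8KB blocks (PostgreSQL's default page size) to a temporary file and includes a final fsync so the OS cache cannot hide the disk. It measures one access pattern only, not the random read/write mixes that matter most for a database, and the 256MB default size and temporary-file location are assumptions. On a VM, the host may still cache writes beneath the guest.

```python
import os
import tempfile
import time
from typing import Optional

BLOCK = 8192  # PostgreSQL's default page size


def sequential_write_mb_per_s(total_mb: int = 256, directory: Optional[str] = None) -> float:
    """Time writing total_mb of 8KB blocks, including the final fsync."""
    block = os.urandom(BLOCK)
    n_blocks = total_mb * 1024 * 1024 // BLOCK
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(n_blocks):
                f.write(block)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the (virtual) disk
        elapsed = time.perf_counter() - start
    finally:
        os.remove(path)
    return total_mb / max(elapsed, 1e-9)


if __name__ == "__main__":
    # Point `directory` at the filesystem that will hold the PostgreSQL data directory.
    print(f"sequential write: {sequential_write_mb_per_s():.1f} MB/s")
```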
Regarding "queries with count(*) are especially bad": you should look into window functions.
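The answer doesn't say which window function it has in mind. One common use, assumed here, is count(*) OVER (), which returns the total number of matching rows alongside each row of a page, so a paginated screen needs one query instead of a separate SELECT count(*). This matters on 9.1 because count(*) always reads the table itself (index-only scans only arrived in 9.2), which is expensive on a bloated table. The sketch runs the SQL through Python's built-in sqlite3 module only so it is self-contained (SQLite supports window functions from 3.25); the same query works in PostgreSQL 8.4 and later. The orders table is hypothetical, and the query still visits every matching row, so it saves a second scan and round trip, not the counting itself.

```python
import sqlite3
from typing import List, Tuple


def page_with_total(conn: sqlite3.Connection, customer_id: int,
                    limit: int, offset: int) -> Tuple[List[Tuple[int, float]], int]:
    """Return one page of orders plus the total number of matching orders.

    count(*) OVER () is evaluated after WHERE but before LIMIT/OFFSET, so every
    returned row carries the full match count. If the page is empty, this query
    cannot report the total and 0 is returned.
    """
    rows = conn.execute(
        """
        SELECT id, amount, count(*) OVER () AS total_matches
        FROM orders
        WHERE customer_id = ?
        ORDER BY id
        LIMIT ? OFFSET ?
        """,
        (customer_id, limit, offset),
    ).fetchall()
    total = rows[0][2] if rows else 0
    return [(r[0], r[1]) for r in rows], total


def build_demo() -> sqlite3.Connection:
    """In-memory stand-in table: 25 orders for customer 1, 5 for customer 2."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
        [(1, 10.0 * i) for i in range(25)] + [(2, 5.0)] * 5,
    )
    return conn


if __name__ == "__main__":
    conn = build_demo()
    page, total = page_with_total(conn, customer_id=1, limit=10, offset=0)
    print(f"showing {len(page)} of {total} orders")
```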
Otherwise, we have no idea without seeing your relevant schema and your queries.