Question
I have a User table with 1 million records:
User (id, fname, lname, deleted_at, guest)
I have the following query, which is run against a PostgreSQL 9.1 database:
SELECT "users".*
FROM "users"
WHERE (users.deleted_at IS NULL) AND (SUBSTRING(lower(fname), 1, 1) = 's')
ORDER BY guest = false, fname ASC
LIMIT 25 OFFSET 0
Using pgAdmin 3, this SQL is taking 7120ms to return 25 rows. If I remove the 'ORDER BY guest = false, fname ASC' the query takes just 31ms.
I have the following indexes:
add_index "users", ["fname"], :name => "index_users_on_fname"
add_index "users", ["guest", "fname"], :name => "index_users_on_guest_and_fname"
add_index "users", ["deleted_at"], :name => "index_users_on_deleted_at"
add_index "users", ["guest"], :name => "index_users_on_guest"
Any ideas? Thank you!
UPDATED with EXPLAIN ANALYZE output:
Limit  (cost=43541.55..43541.62 rows=25 width=1612) (actual time=1276.777..1276.783 rows=25 loops=1)
  ->  Sort  (cost=43541.55..43558.82 rows=6905 width=1612) (actual time=1276.775..1276.777 rows=25 loops=1)
        Sort Key: ((NOT guest)), fname
        Sort Method: top-N heapsort  Memory: 37kB
        ->  Seq Scan on users  (cost=0.00..43346.70 rows=6905 width=1612) (actual time=5.143..1272.563 rows=475 loops=1)
              Filter: ((deleted_at IS NULL) AND pubic_profile_visible AND ((fname)::text ~~ 's%'::text))
Total runtime: 1276.967 ms
Answer 1:
First, since PostgreSQL 9.1 you can use left() to simplify the expression:
substring(lower(fname), 1, 1)
lower(left(fname, 1)) -- equivalent, but simpler and faster
It is also slightly faster to take the first character before converting to lower case.
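A quick way to convince yourself that the two expressions agree is to run both against a few literal values. This is only a minimal sketch; the sample names are made up for illustration.

```sql
-- Compare the original expression with the simplified one.
-- The sample names are hypothetical; left() needs PostgreSQL 9.1 or later.
SELECT fname,
       substring(lower(fname), 1, 1) AS original_expr,
       lower(left(fname, 1))         AS simplified_expr
FROM  (VALUES ('Sam'), ('steve'), ('Alice')) AS t(fname);
```

Both computed columns should show the same lower-case first letter for every row.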
Next, clean up the query:
SELECT *
FROM users
WHERE deleted_at IS NULL
AND lower(left(fname, 1)) = 's'
ORDER BY guest DESC NULLS LAST, fname
LIMIT 25 OFFSET 0;
guest DESC NULLS LAST produces the same order as guest = FALSE, just without calculating a new value for every row.
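To see why the two sort keys line up, it may help to number a handful of rows under both orderings. The rows below are hypothetical and include a NULL guest, since that is where the NULLS LAST part matters.

```sql
-- Hypothetical sample rows; not taken from the real users table.
WITH t(guest, fname) AS (
   VALUES (true,  'sam')
        , (false, 'sue')
        , (NULL,  'sid')
        , (true,  'sal')
)
SELECT fname
     , guest
     , row_number() OVER (ORDER BY guest = false, fname)         AS pos_original
     , row_number() OVER (ORDER BY guest DESC NULLS LAST, fname) AS pos_rewritten
FROM   t
ORDER  BY pos_original;
```

The two position columns should be identical: rows with guest = true first, then guest = false, then NULL.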
Next, create this multi-column partial index:
CREATE INDEX users_multi_idx
ON users (lower(left(fname, 1)), guest DESC NULLS LAST, fname)
WHERE deleted_at IS NULL;
Then run:
ANALYZE users;
Or, even better, CLUSTER the table on the new index first (if you don't have more important queries requiring a different physical order), and then run ANALYZE:
CLUSTER users USING users_multi_idx;
ANALYZE users;
This should be much faster than anything you tried before: the query now reads rows from the index sequentially, and the table has been physically rewritten in the same order, resulting in only a few page hits ...
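To check whether the planner actually picks up the new index, the rewritten query can be run under EXPLAIN. A sketch, assuming the index users_multi_idx from above exists and statistics are current:

```sql
-- BUFFERS adds page hit/read counts to the plan output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   users
WHERE  deleted_at IS NULL
AND    lower(left(fname, 1)) = 's'
ORDER  BY guest DESC NULLS LAST, fname
LIMIT  25 OFFSET 0;
```

If the index is used, the plan should typically show an index scan on users_multi_idx instead of the Seq Scan plus Sort from the question, though the exact plan depends on your data and settings.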
Answer 2:
It seems to me you could use better indexing here. You are filtering on the deleted_at field and then sorting on the guest field, but those fields are not in a common index. Ignoring your other WHERE clause for the moment, you seem to be forcing the engine to dig through all the records, or to check each record individually for its guest value; I don't see how your index with guest in it could be helping.
If you included the guest field in an index along with the deleted_at field (the latter being first), you might get some benefit there.
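As a rough sketch, such an index could look like the following. The index name is made up, and whether the planner chooses it for this query is something to verify with EXPLAIN.

```sql
-- Hypothetical index name; deleted_at comes first, as suggested above.
CREATE INDEX index_users_on_deleted_at_and_guest
    ON users (deleted_at, guest);

-- Refresh planner statistics after creating the index.
ANALYZE users;
```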
Answer 3:
At first glance, your problem is that the WHERE clause has to be fully evaluated to find all matching rows (not just the first 25) before the ORDER BY can be applied. Try adding a column containing substring(lower(fname), 1, 1), let's name it s for now, and adding an index on (deleted_at, s) or, if these are the only values you'll ever use in this WHERE clause, an index on ((deleted_at IS NULL), (s = 's')).
You could use a trigger to keep the s column up to date.
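A minimal sketch of that column, trigger, and index, assuming the column is called s as above. The function, trigger, and index names are invented for illustration.

```sql
-- Sketch only: the column name s and all object names below are hypothetical.
ALTER TABLE users ADD COLUMN s text;

-- Backfill existing rows once.
UPDATE users SET s = substring(lower(fname), 1, 1);

-- Trigger function: recompute s from fname on every insert or update.
CREATE FUNCTION users_set_s() RETURNS trigger AS $$
BEGIN
   NEW.s := substring(lower(NEW.fname), 1, 1);
   RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER users_set_s_trg
BEFORE INSERT OR UPDATE ON users
FOR EACH ROW EXECUTE PROCEDURE users_set_s();

-- Index on the filter columns suggested above.
CREATE INDEX users_deleted_at_s_idx ON users (deleted_at, s);
```

The query would then filter on s = 's' instead of the substring expression.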
As a quick interim fix, you could just rewrite substring(lower(fname), 1, 1) to lower(substring(fname, 1, 1)) or, if PostgreSQL has this syntax, lower(fname[1]).
Answer 4:
If there are few distinct values in a column then an index on that column is not of much value. That is the case with a boolean column.
I would test creating a partial index on SUBSTRING(lower(fname), 1, 1):
CREATE INDEX users_substr_null_ix ON users (SUBSTRING(lower(fname), 1, 1))
WHERE users.deleted_at IS NULL;
And also test a partial index on fname:
CREATE INDEX users_fname_not_guest_ix ON users (fname)
WHERE not guest;
Or, even better:
CREATE INDEX users_substr_null__not_guest_ix ON users (SUBSTRING(lower(fname), 1, 1), fname)
WHERE users.deleted_at IS NULL and not guest;
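Keep in mind that the planner only considers a partial index when the query's WHERE clause implies the index predicate, and an expression index only when the query uses the same expression. A sketch of a query that would be eligible for the last index above, assuming guest users can be left out of the result (which is a change from the original query):

```sql
-- The WHERE clause repeats the index predicate (deleted_at IS NULL AND NOT guest)
-- and uses the same expression as the index's first column.
SELECT *
FROM   users
WHERE  deleted_at IS NULL
AND    NOT guest
AND    substring(lower(fname), 1, 1) = 's'
ORDER  BY fname
LIMIT  25 OFFSET 0;
```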
Source: https://stackoverflow.com/questions/12905312/order-by-turns-a-30ms-query-into-a-7120ms-query-known-performance-issue