Order BY turns a 30ms query into a 7120ms query. Known performance issue?

Posted by China☆狼群 on 2019-12-12 16:27:59

Question


I have a User table with 1m records:

User (id, fname, lname, deleted_at, guest)

I have the following query which is being run against a postgres 9.1 db:

SELECT "users".* 
FROM "users" 
WHERE (users.deleted_at IS NULL) AND (SUBSTRING(lower(fname), 1, 1) = 's') 
ORDER BY guest = false, fname ASC 
LIMIT 25 OFFSET 0

Using pgAdmin 3, this SQL is taking 7120ms to return 25 rows. If I remove the 'ORDER BY guest = false, fname ASC' the query takes just 31ms.

I have the following indexes:

add_index "users", ["fname"], :name => "index_users_on_fname"
add_index "users", ["guest", "fname"], :name => "index_users_on_guest_and_fname"
add_index "users", ["deleted_at"], :name => "index_users_on_deleted_at"
add_index "users", ["guest"], :name => "index_users_on_guest"

Any ideas? Thank you!

UPDATED with Explain

"Limit  (cost=43541.55..43541.62 rows=25 width=1612) (actual time=1276.777..1276.783 rows=25 loops=1)"
"  ->  Sort  (cost=43541.55..43558.82 rows=6905 width=1612) (actual time=1276.775..1276.777 rows=25 loops=1)"
"        Sort Key: ((NOT guest)), fname"
"        Sort Method: top-N heapsort  Memory: 37kB"
"        ->  Seq Scan on users  (cost=0.00..43346.70 rows=6905 width=1612) (actual time=5.143..1272.563 rows=475 loops=1)"
"              Filter: ((deleted_at IS NULL) AND pubic_profile_visible AND ((fname)::text ~~ 's%'::text))"
"Total runtime: 1276.967 ms"

Answer 1:


First, since PostgreSQL 9.1 you can use left() to simplify the expression:

substring(lower(fname), 1, 1)
lower(left(fname, 1)) -- equivalent, but simpler and faster

It is also slightly faster to take the first character before converting to lower case.
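As a quick sanity check, the two expressions can be compared on a few literal values. This is only a sketch: the sample names are made up, and left() needs PostgreSQL 9.1 or later.

```sql
-- Compare both expressions on made-up sample values (no table required).
SELECT fname,
       substring(lower(fname), 1, 1) AS via_substring,
       lower(left(fname, 1))         AS via_left
FROM  (VALUES ('Sam'), ('steve'), ('')) AS t(fname);
```

Both computed columns should match on every row, including the empty string.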
Next, clean up the query:

SELECT * 
FROM   users 
WHERE  deleted_at IS NULL
AND    lower(left(fname, 1)) = 's'
ORDER  BY guest DESC NULLS LAST, fname
LIMIT  25 OFFSET 0;

guest DESC NULLS LAST produces the same order as guest = FALSE (guests first, then non-guests, then NULL), just without calculating a new value for every row.
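To see that the two sort keys agree, here is a small sketch on made-up rows covering true, false and NULL for guest. The CTE name t and the sample values are assumptions for illustration only.

```sql
-- Made-up rows: two guests, one non-guest, one row with guest unknown.
WITH t(guest, fname) AS (
    VALUES (true, 'sam'), (false, 'sue'), (NULL, 'sid'), (true, 'sal')
)
SELECT array_agg(fname ORDER BY guest = false, fname)         AS original_order,
       array_agg(fname ORDER BY guest DESC NULLS LAST, fname) AS rewritten_order
FROM   t;
```

Both arrays should come back identical, with the guests first and the NULL row last.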
Next, create this multi-column partial index:

CREATE INDEX users_multi_idx
ON users (lower(left(fname, 1)), guest DESC NULLS LAST, fname)
WHERE deleted_at IS NULL;

Run

ANALYZE users;

Or, even better, CLUSTER (if you don't have more important queries requiring a different order) - and then ANALYZE:

CLUSTER users using users_multi_idx;

The query should then be much faster than anything you tried before, because it now reads rows from the index sequentially and the table has been physically rewritten in the same order, resulting in only a few page hits ...
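To check whether the planner actually picks up such an index, something like the following can be tried on a scratch table. This is a sketch under assumptions: users_demo and its synthetic data are hypothetical stand-ins for the real users table, and the plan you get depends on your data and statistics.

```sql
-- Hypothetical stand-in for the users table; names and data are made up.
CREATE TEMP TABLE users_demo (
    id         serial PRIMARY KEY,
    fname      text,
    lname      text,
    deleted_at timestamp,
    guest      boolean
);

-- 100,000 synthetic rows spread over 26 first letters, 10% of them guests.
INSERT INTO users_demo (fname, lname, guest)
SELECT chr(97 + i % 26) || 'name' || i, 'lname' || i, i % 10 = 0
FROM   generate_series(1, 100000) AS i;

-- Same shape as the multi-column partial index above.
CREATE INDEX users_demo_multi_idx
ON users_demo (lower(left(fname, 1)), guest DESC NULLS LAST, fname)
WHERE deleted_at IS NULL;

ANALYZE users_demo;

-- Inspect the plan: ideally an index scan on users_demo_multi_idx
-- feeds the Limit directly, with no separate Sort node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM   users_demo
WHERE  deleted_at IS NULL
AND    lower(left(fname, 1)) = 's'
ORDER  BY guest DESC NULLS LAST, fname
LIMIT  25 OFFSET 0;
```

If the plan still shows a Seq Scan followed by a Sort, check that the WHERE clause and ORDER BY match the index definition exactly.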




Answer 2:


It seems to me you could stand to have some better indexing here. You are filtering on the deleted_at field and then sorting on the guest field, but those fields are not in a common index. Ignoring your other WHERE clause for the moment, you seem to be causing the engine to dig through all the records, or at least to check each record individually for its guest value; I don't see how your index with guest in it could be helping.

If you included the guest field in an index along with the deleted_at field (the latter being first), you might get some benefit there.
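A minimal sketch of that suggestion, using a hypothetical scratch table (users_demo2) and a made-up index name. Whether the planner finds it useful depends on how selective deleted_at is in your data.

```sql
-- Hypothetical stand-in for the users table.
CREATE TEMP TABLE users_demo2 (
    id         serial PRIMARY KEY,
    fname      text,
    lname      text,
    deleted_at timestamp,
    guest      boolean
);

-- Composite index with deleted_at first, then guest, as suggested above.
CREATE INDEX users_demo2_deleted_at_guest_idx
ON users_demo2 (deleted_at, guest);
```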




Answer 3:


At first glance, your problem is that the WHERE clause has to be fully evaluated to find all matching rows (not just the first 25) before they can be ordered. Try adding a column containing substring(lower(fname), 1, 1), let's name it s for now, and adding an index on (deleted_at, s). Or, if these are the only values you will ever use in this WHERE clause, add an index on (deleted_at is null), (s = 's').

You could use a trigger to keep the s column up to date.
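A rough sketch of the extra column plus trigger, on a hypothetical scratch table. The table, function, trigger and index names are all made up for illustration.

```sql
-- Hypothetical stand-in for the users table, with the extra column s.
CREATE TEMP TABLE users_demo3 (
    id         serial PRIMARY KEY,
    fname      text,
    deleted_at timestamp,
    guest      boolean,
    s          text  -- lower-cased first character of fname
);

-- Trigger function: recompute s from fname on every write.
CREATE OR REPLACE FUNCTION users_demo3_set_s() RETURNS trigger AS $$
BEGIN
    NEW.s := substring(lower(NEW.fname), 1, 1);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER users_demo3_set_s
BEFORE INSERT OR UPDATE OF fname ON users_demo3
FOR EACH ROW EXECUTE PROCEDURE users_demo3_set_s();

-- Index on the columns used in the WHERE clause.
CREATE INDEX users_demo3_deleted_at_s_idx ON users_demo3 (deleted_at, s);

-- The trigger fills s automatically.
INSERT INTO users_demo3 (fname) VALUES ('Sam'), ('bob');
SELECT fname, s FROM users_demo3 ORDER BY id;
```

Rows that already exist when the column is added would still need a one-off UPDATE to backfill s.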

To make it faster in the meantime, you could just rewrite substring(lower(fname), 1, 1) to lower(substring(fname, 1, 1)) or, if PostgreSQL has this syntax, lower(fname[1]).




Answer 4:


If there are few distinct values in a column then an index on that column is not of much value. That is the case with a boolean column.

I would test creating a partial index on SUBSTRING(lower(fname), 1, 1):

CREATE INDEX users_substr_null_ix ON users (SUBSTRING(lower(fname), 1, 1))
WHERE users.deleted_at IS NULL;

And also test a partial index on fname:

CREATE INDEX users_fname_not_guest_ix ON users (fname)
WHERE not guest;

Or even better:

CREATE INDEX users_substr_null__not_guest_ix ON users (SUBSTRING(lower(fname), 1, 1), fname)
WHERE users.deleted_at IS NULL and not guest;
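
One thing to keep in mind when testing these: PostgreSQL only considers a partial index when the query's WHERE clause implies the index predicate. The indexes with NOT guest in their predicate therefore apply to queries restricted to non-guests, which the query in the question is not. A sketch of a query that could use the last index, on a hypothetical scratch table with made-up rows:

```sql
-- Hypothetical stand-in for the users table.
CREATE TEMP TABLE users_demo4 (
    id         serial PRIMARY KEY,
    fname      text,
    deleted_at timestamp,
    guest      boolean
);

INSERT INTO users_demo4 (fname, deleted_at, guest) VALUES
    ('Sam',   NULL, false),
    ('Sue',   NULL, true),
    ('steve', NULL, false),
    ('Bob',   NULL, false);

-- Same shape as the last index above.
CREATE INDEX users_demo4_substr_not_guest_ix
ON users_demo4 (substring(lower(fname), 1, 1), fname)
WHERE deleted_at IS NULL AND NOT guest;

-- The WHERE clause repeats the index predicate (deleted_at IS NULL AND NOT guest),
-- so the planner is allowed to use the partial index for this query.
SELECT *
FROM   users_demo4
WHERE  deleted_at IS NULL
AND    NOT guest
AND    substring(lower(fname), 1, 1) = 's'
ORDER  BY fname
LIMIT  25;
```

Guests would then need a second query (or a second partial index with WHERE guest) if they must appear in the same listing.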


Source: https://stackoverflow.com/questions/12905312/order-by-turns-a-30ms-query-into-a-7120ms-query-known-performance-issue
