问题
Simplified table:
CREATE TABLE products (
product_no integer PRIMARY KEY,
sales integer,
status varchar(16),
category varchar(16));
CREATE INDEX index_products_sales ON products (sales);
CREATE INDEX index_products_status ON products (status);
CREATE INDEX index_products_category ON products (category);
PostgreSQL version is 8.4. Columns 'status' and 'category'
There are 20 million products/rows spread across 15 categories.
One of the most used queries is getting the three most sold products, excluding products in categories 'cat3' and 'cat7':
SELECT product_no, sales
FROM products
WHERE status = 'something' AND category NOT IN ('cat3', 'cat7')
ORDER BY sales DESC
LIMIT 3;
Limit (cost=0.00..8833.39 rows=3 width=12) (actual time=9235.332..9356.284 rows=3 loops=1)
-> Index Scan using index_products_sales on products (cost=0.00..68935806.85 rows=23412 width=12) (actual time=9235.327..9356.278 rows=3 loops=1)
Filter: (((category)::text <> ALL ('{cat3,cat7}'::text[])) AND ((status)::text = 'something'::text))
What would be the best index for making this specific query run faster?
回答1:
Create a partial, multicolumn index with this particular sort order:
CREATE INDEX products_status_sales_partial_idx ON products (status, sales DESC)
WHERE category NOT IN ('cat3','cat7');
Modify your query slightly:
SELECT product_no, sales
FROM products
WHERE status = 'something'
AND category NOT IN ('cat3', 'cat7')
ORDER BY status, sales DESC
LIMIT 3;
Adding status
as first element of the ORDER BY
clause seems redundant and pointless. But give it a try.
Why?
The query planner is not smart enough to understand, that with
WHERE status = 'something' ...
ORDER BY sales DESC
the sort order of the index (status, sales DESC)
matches as a logical consequence. So it is going to read all qualifying rows, sort and pick the top 3.
By adding status
to the ORDER BY
you enable the query planner to read the top 3 entries from the index directly. Expect a speed-up by several orders of magnitude.
Tested with PostgreSQL 8.4 and 9.1.
回答2:
I think a b-tree index is still your best bet. I could be wrong, though. I think I would test two things.
First, a partial index on category that excludes 'cat3' and 'cat7'.
CREATE INDEX index_products_category ON products (category)
WHERE category NOT IN ('cat3','cat7');
Second, a descending sort on sales.
CREATE INDEX index_products_sales ON products (sales DESC);
Either one of these might slow down other queries, though, so you might need one or both of these in addition to the existing indexes.
来源:https://stackoverflow.com/questions/11137752/chosing-the-right-index-for-postgresql-query