Chosing the right index for PostgreSQL query

痴心易碎 提交于 2019-12-07 12:30:21

问题


Simplified table:

CREATE TABLE products (
product_no integer PRIMARY KEY,
sales integer,
status varchar(16),
category varchar(16));

CREATE INDEX index_products_sales ON products (sales);
CREATE INDEX index_products_status ON products (status);
CREATE INDEX index_products_category ON products (category);

PostgreSQL version is 8.4. Columns 'status' and 'category'

There are 20 million products/rows spread across 15 categories.

One of the most used queries is getting the three most sold products, excluding products in categories 'cat3' and 'cat7':

SELECT product_no, sales 
FROM products 
WHERE status = 'something' AND category NOT IN ('cat3', 'cat7') 
ORDER BY sales DESC 
LIMIT 3;

Limit  (cost=0.00..8833.39 rows=3 width=12) (actual time=9235.332..9356.284 rows=3 loops=1)
   ->  Index Scan using index_products_sales on products  (cost=0.00..68935806.85 rows=23412 width=12) (actual time=9235.327..9356.278 rows=3 loops=1)
     Filter: (((category)::text <> ALL ('{cat3,cat7}'::text[])) AND ((status)::text = 'something'::text))

What would be the best index for making this specific query run faster?


回答1:


Create a partial, multicolumn index with this particular sort order:

CREATE INDEX products_status_sales_partial_idx ON products (status, sales DESC)
WHERE  category NOT IN ('cat3','cat7');

Modify your query slightly:

SELECT product_no, sales 
FROM   products 
WHERE  status = 'something'
AND    category NOT IN ('cat3', 'cat7') 
ORDER  BY status, sales DESC 
LIMIT  3;

Adding status as first element of the ORDER BY clause seems redundant and pointless. But give it a try.

Why?

The query planner is not smart enough to understand, that with

WHERE  status = 'something' ...
ORDER  BY sales DESC

the sort order of the index (status, sales DESC) matches as a logical consequence. So it is going to read all qualifying rows, sort and pick the top 3.

By adding status to the ORDER BY you enable the query planner to read the top 3 entries from the index directly. Expect a speed-up by several orders of magnitude.

Tested with PostgreSQL 8.4 and 9.1.




回答2:


I think a b-tree index is still your best bet. I could be wrong, though. I think I would test two things.

First, a partial index on category that excludes 'cat3' and 'cat7'.

CREATE INDEX index_products_category ON products (category)
  WHERE category NOT IN ('cat3','cat7');

Second, a descending sort on sales.

CREATE INDEX index_products_sales ON products (sales DESC);

Either one of these might slow down other queries, though, so you might need one or both of these in addition to the existing indexes.



来源:https://stackoverflow.com/questions/11137752/chosing-the-right-index-for-postgresql-query

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!