Can Postgres use a function in a partial index where clause?

Submitted by 谁说胖子不能爱 on 2021-01-27 13:29:55

Question


I have a large Postgres table on which I want a partial index over 2 columns, restricted by the value of 1 of them. Can I use a Postgres function in the WHERE clause of a partial index, and if so, how do I get a SELECT query to use that partial index?

Example Scenario

The first column is "magazine", the second is "volume" and the third is "issue". Different magazines can share the same "volume" and "issue" numbers, but I want the index to contain only the two most recent volumes of each magazine. This is because an older magazine can have higher volume numbers than a younger one.

Two immutable strict functions were created to determine the current and previous years' volumes for a magazine: f_current_volume('gq') and f_previous_volume('gq'). Note: the current/previous volume number only changes once per year.

I tried creating a partial index with the functions; however, EXPLAIN shows that a query for a magazine's current volume only does a seq scan.


CREATE INDEX ix_issue_magazine_volume ON issue USING BTREE ( magazine, volume ) 
  WHERE volume IN (f_current_volume(magazine), f_previous_volume(magazine));

-- Both these do seq scans.
select * from issue where magazine = 'gq' and volume = 100;
select * from issue where magazine = 'gq' and volume = f_current_volume('gq');

What am I doing wrong, and what do I need to change to get this to work? And if it is possible, why does it need to be done that way for Postgres to use the index?


-- UPDATE: 2013-06-17, the following surprisingly used the index.
-- Why would using a field name rather than value allow the index to be used?
select * from issue where magazine = 'gq' and volume = f_current_volume(magazine);


Answer 1:


Immutability and 'current'

If your f_current_volume function ever changes its behaviour, as is implied by its name and by the presence of an f_previous_volume function, then the database is free to return completely bogus results.

Had you not marked the functions IMMUTABLE, PostgreSQL would've refused to let you create the index, complaining that you can only use IMMUTABLE functions. The thing is, marking a function IMMUTABLE means that you are telling PostgreSQL something about the function's behaviour, as per the documentation. You're saying "I promise this function's results won't change, feel free to make assumptions on that basis."

One of the biggest assumptions made is when building an index. If the function returns different outputs for the same inputs on different invocations, things go splat. Or possibly boom if you're unlucky. In theory you can kind-of get away with changing an immutable function by REINDEXing everything, but the only really safe way is to DROP every index that uses it, DROP the function, re-create the function with its new definition and re-create the indexes.

Doing that can actually be really useful when something changes only infrequently: in effect you have two different immutable functions at different points in time that just happen to have the same name.
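As a rough sketch of that drop-and-recreate cycle at a yearly rollover: the function bodies, argument types and volume numbers below are invented for illustration (the question never shows the real definitions), and issue is assumed to have a text magazine column and an integer volume column.

```sql
-- Sketch only: assumes issue(magazine text, volume integer) and functions
-- taking a single text argument. The hard-coded volumes are made up.
BEGIN;

-- 1. Drop every index whose predicate uses the functions.
DROP INDEX IF EXISTS ix_issue_magazine_volume;

-- 2. Drop the old function definitions.
DROP FUNCTION IF EXISTS f_current_volume(text);
DROP FUNCTION IF EXISTS f_previous_volume(text);

-- 3. Re-create them with this year's (hypothetical) values.
CREATE FUNCTION f_current_volume(text) RETURNS integer AS $$
  SELECT CASE $1 WHEN 'gq' THEN 101 END;
$$ LANGUAGE sql IMMUTABLE STRICT;

CREATE FUNCTION f_previous_volume(text) RETURNS integer AS $$
  SELECT CASE $1 WHEN 'gq' THEN 100 END;
$$ LANGUAGE sql IMMUTABLE STRICT;

-- 4. Re-create the index so it is built against the new definitions.
CREATE INDEX ix_issue_magazine_volume ON issue USING BTREE (magazine, volume)
  WHERE (volume = f_current_volume(magazine)
      OR volume = f_previous_volume(magazine));

COMMIT;
```

Wrapping the steps in one transaction means other sessions never see the table with the functions but without the index, at the cost of holding locks on issue while the index is rebuilt.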

Partial index matching

PostgreSQL's partial index matching is pretty dumb - but, as I found when writing test cases for this, a lot smarter than it used to be. It ignores a dummy OR true. It uses an index on WHERE (a%100=0 OR a%1000=0) for a WHERE a = 100 query. It even got it with a non-inline-able identity function:

regress=> CREATE TABLE partial AS SELECT x AS a, x 
          AS b FROM generate_series(1,10000) x;
regress=> CREATE OR REPLACE FUNCTION identity(integer) 
          RETURNS integer AS $$
          SELECT $1; 
          $$ LANGUAGE sql IMMUTABLE STRICT;
regress=> CREATE INDEX partial_b_fn_idx 
          ON partial(b) WHERE (identity(b) % 1000 = 0);
regress=> EXPLAIN SELECT b FROM partial WHERE b % 1000 = 0;
                                      QUERY PLAN                                       
---------------------------------------------------------------------------------------
 Index Only Scan using partial_b_fn_idx on partial  (cost=0.00..13.05 rows=50 width=4)
(1 row)

However, it was unable to prove the IN clause match, eg:

regress=> DROP INDEX partial_b_fn_idx;
regress=> CREATE INDEX partial_b_fn_in_idx ON partial(b)
          WHERE (b IN (identity(b), 1));
regress=> EXPLAIN SELECT b FROM partial WHERE b % 1000 = 0;
                               QUERY PLAN                                 
----------------------------------------------------------------------------
 Seq Scan on partial  (cost=10000000000.00..10000000195.00 rows=50 width=4)

So my advice? Rewrite IN as an OR list:

CREATE INDEX ix_issue_magazine_volume ON issue USING BTREE ( magazine, volume ) 
  WHERE (volume = f_current_volume(magazine) OR volume = f_previous_volume(magazine));

... and on a current version it might just work, so long as you keep the immutability rules outlined above in mind. Well, the second version:

select * from issue where magazine = 'gq' and volume = f_current_volume('gq');

might. Update: No, it won't; for it to be used, Pg would have to recognise that magazine='gq' and realise that f_current_volume('gq') was therefore equivalent to f_current_volume(magazine). It doesn't attempt to prove equivalences on that level with partial index matching, so as you've noted in your update you have to write f_current_volume(magazine) directly. I should've spotted that. In theory PostgreSQL could use the index with the second query if the planner were smart enough, but I'm not sure how you'd go about efficiently looking for places where a substitution like this would be worthwhile.

The first example, volume = 100, will never use the index, since at query planning time PostgreSQL has no idea that f_current_volume(magazine) will evaluate to 100 for the 'gq' rows. You could add an OR clause, OR volume = 100, to your partial index WHERE clause and PostgreSQL would figure it out then, though.
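A minimal sketch of both options, assuming the issue table and functions from the question (the second index name is made up for illustration). Whether the planner actually picks an index still depends on its cost estimates, so check with EXPLAIN on real data.

```sql
-- Option 1: repeat the predicate expression verbatim in the query, so the
-- planner can match it against the "volume = f_current_volume(magazine)" arm.
EXPLAIN
SELECT * FROM issue
 WHERE magazine = 'gq'
   AND volume = f_current_volume(magazine);

-- Option 2 (hypothetical index name): add a literal arm to the predicate so
-- a query on the constant can be matched as well.
CREATE INDEX ix_issue_magazine_volume_literal ON issue USING BTREE (magazine, volume)
  WHERE (volume = f_current_volume(magazine)
      OR volume = f_previous_volume(magazine)
      OR volume = 100);

EXPLAIN
SELECT * FROM issue
 WHERE magazine = 'gq'
   AND volume = 100;
```

Note that the literal arm is not tied to a magazine, so option 2 indexes volume 100 of every magazine, and the constant has to be updated by rebuilding the index whenever the volume changes.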




Answer 2:


First off, I'd like to volunteer a wild guess, because you're making it sound like your f_current_volume() function calculates something based on a separate table.

If so, be wary, because this means your function is volatile, in that it needs to be recalculated on every call (a concurrent transaction might be inserting, updating or deleting rows). Postgres won't let you index those, and I presume you worked around this by declaring the function immutable. Not only is this incorrect, but you also run into the issue of the index containing garbage, because the function gets evaluated when you edit the row rather than at query time. What you'd probably want instead -- again, if my guess is correct -- is to store and maintain the totals in the table itself using triggers.

Regarding your specific question, partial indexes need to have their where condition be met in the query to prompt Postgres to use them. I'm quite sure that Postgres is smart enough to identify that e.g. 10 is between 5 and 15 and use a partial index with that clause. I very much doubt that it would know that f_current_volume('gq') is 100 in your case, however, considering the above-mentioned caveat.
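The "10 is between 5 and 15" case can be tried with a throwaway table. This is only a sketch: the table and index names are invented, and the planner may still prefer a seq scan on a table this small.

```sql
-- Hypothetical demo table: 10000 rows with n = 1..10000.
CREATE TABLE demo_range AS
  SELECT x AS n FROM generate_series(1, 10000) x;

-- Partial index that only covers rows with n between 5 and 15.
CREATE INDEX demo_range_n_idx ON demo_range (n)
  WHERE n >= 5 AND n <= 15;

-- Refresh statistics so the cost estimates are meaningful.
ANALYZE demo_range;

-- n = 10 implies the index predicate, so the index is a candidate here.
EXPLAIN SELECT n FROM demo_range WHERE n = 10;

-- n = 20 contradicts the index predicate, so the index cannot be used.
EXPLAIN SELECT n FROM demo_range WHERE n = 20;
```

Both the predicate and the query compare the column with plain constants here, which is the kind of implication the planner can check; a function call on a different argument, as in the question, is not.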

You could try this query and see if the index gets used:

select *
  from issue
 where magazine = 'gq'
   and volume in (f_current_volume('gq'), f_previous_volume('gq'));

(Though again, if your function is in fact volatile, you'll get a seq scan as well.)



Source: https://stackoverflow.com/questions/17116795/can-postgres-use-a-function-in-a-partial-index-where-clause
