Add datetime constraint to a PostgreSQL multi-column partial index

Posted by 让人想犯罪 __ on 2019-12-22 05:27:27

Question


I've got a PostgreSQL table called queries_query, which has many columns.

Two of these columns, created and user_sid, are frequently used together in SQL queries by my application to determine how many queries a given user has done over the past 30 days. It is very, very rare that I query these stats for any time older than the most recent 30 days.

Here is my question:

I've currently created my multi-column index on these two columns by running:

CREATE INDEX CONCURRENTLY some_index_name ON queries_query (user_sid, created)

But I'd like to further restrict the index to only care about those queries in which the created date is within the past 30 days. I've tried doing the following:

CREATE INDEX CONCURRENTLY some_index_name ON queries_query (user_sid, created)
WHERE created >= NOW() - '30 days'::INTERVAL

But this throws an exception stating that my function must be immutable.

I'd love to get this working so that I can optimize my index, and cut back on the resources Postgres needs to do these repeated queries.


Answer 1:


You get an exception in your attempt to use now() because the function is not IMMUTABLE (it is declared STABLE) and, to quote the manual:

All functions and operators used in an index definition must be "immutable" ...
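
To check how Postgres classifies a given function, you can look up its volatility in the system catalog. A minimal sketch for now():

```sql
-- Look up the volatility category of now() in the system catalog.
-- provolatile: 'i' = IMMUTABLE, 's' = STABLE, 'v' = VOLATILE
SELECT proname, provolatile
FROM   pg_proc
WHERE  proname = 'now';
```

now() is reported as 's' (STABLE): its result is fixed within a transaction but changes between transactions, which is why it is rejected in an index predicate.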

I see two ways to utilize a (much more efficient) partial index here:

1. Partial index with condition using constant date:

CREATE INDEX queries_recent_idx ON queries_query (user_sid, created)
WHERE created > '2013-01-07 00:00'::timestamp;

Assuming created is actually defined as timestamp. It wouldn't work to provide a timestamp constant for a timestamptz column (timestamp with time zone). The cast from timestamp to timestamptz (or vice versa) depends on the current time zone setting and is not immutable. Use a constant of matching data type. Understand the basics of timestamps with / without time zone:

  • Ignoring timezones altogether in Rails and PostgreSQL
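
If created is a timestamptz column instead, the same idea might look like the sketch below. The explicit UTC offset in the literal is an assumption chosen for illustration; it keeps the constant independent of the session's time zone setting.

```sql
-- Variant for a timestamptz column: the constant carries an explicit offset,
-- so its value does not depend on the current TimeZone setting.
CREATE INDEX queries_recent_idx ON queries_query (user_sid, created)
WHERE  created > '2013-01-07 00:00+00'::timestamptz;
```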

Drop and recreate that index at hours with low traffic, maybe with a cron job on a daily or weekly basis (or whatever is good enough for you). Creating an index is pretty fast, especially a partial index that is comparatively small. This solution also doesn't need to add anything to the table.

Assuming no concurrent access to the table, automatic index recreation could be done with a function like this:

CREATE OR REPLACE FUNCTION f_index_recreate()
  RETURNS void AS
$func$
BEGIN
   DROP INDEX IF EXISTS queries_recent_idx;
   EXECUTE format('
      CREATE INDEX queries_recent_idx
      ON queries_query (user_sid, created)
      WHERE created > %L::timestamp'
    , LOCALTIMESTAMP - interval '30 days');  -- timestamp constant
--  , now() - interval '30 days');           -- alternative for timestamptz
END
$func$  LANGUAGE plpgsql;

Call:

SELECT f_index_recreate();

now() (like you had) is the equivalent of CURRENT_TIMESTAMP and returns timestamptz. Cast to timestamp with now()::timestamp or use LOCALTIMESTAMP instead.

  • Select today's (since midnight) timestamps only
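
A quick way to confirm which data type each of these expressions returns is pg_typeof(). The column aliases are only illustrative:

```sql
-- Compare the result types of the candidate expressions.
SELECT pg_typeof(now())             AS now_type               -- timestamp with time zone
     , pg_typeof(CURRENT_TIMESTAMP) AS current_timestamp_type -- timestamp with time zone
     , pg_typeof(LOCALTIMESTAMP)    AS localtimestamp_type    -- timestamp without time zone
     , pg_typeof(now()::timestamp)  AS cast_type;             -- timestamp without time zone
```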

Tested with Postgres 9.2 - 9.4.


If you have to deal with concurrent access, use CREATE INDEX CONCURRENTLY. But you can't wrap this command into a function because, per documentation:

... a regular CREATE INDEX command can be performed within a transaction block, but CREATE INDEX CONCURRENTLY cannot.

So, with two separate transactions:

CREATE INDEX CONCURRENTLY queries_recent_idx2 ON queries_query (user_sid, created)
WHERE  created > '2013-01-07 00:00'::timestamp;  -- your new condition

Then:

DROP INDEX CONCURRENTLY IF EXISTS queries_recent_idx;

Optionally, rename to old name:

ALTER INDEX queries_recent_idx2 RENAME TO queries_recent_idx;
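
The cutoff in the concurrent variant does not have to be typed by hand. The following is a sketch of a psql script, not part of the tested setup above; it assumes psql 9.6 or later (for the gexec meta-command) running in autocommit mode, so that each statement runs in its own transaction:

```sql
-- Run with psql in autocommit mode (the default), outside any transaction block.
-- The gexec meta-command at the end of step 1 sends the generated text
-- to the server as a separate statement.

-- 1. Build the replacement index under a temporary name with a fresh cutoff.
SELECT format(
   'CREATE INDEX CONCURRENTLY queries_recent_idx2
    ON queries_query (user_sid, created)
    WHERE created > %L::timestamp'
 , LOCALTIMESTAMP - interval '30 days') \gexec

-- 2. Drop the outdated index without blocking writers.
DROP INDEX CONCURRENTLY IF EXISTS queries_recent_idx;

-- 3. Take over the old name.
ALTER INDEX queries_recent_idx2 RENAME TO queries_recent_idx;
```

For a timestamptz column, replace LOCALTIMESTAMP with now() and the cast with ::timestamptz, as in the function above.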

2. Partial index with condition on "archived" tag

Add an archived tag to your table:

ALTER TABLE queries_query ADD COLUMN archived boolean NOT NULL DEFAULT FALSE;

UPDATE the column at intervals of your choosing to "retire" older rows and create an index like:

CREATE INDEX some_index_name ON queries_query (user_sid, created)
WHERE NOT archived;

Add a matching condition to your queries (even if it seems redundant) to allow them to use the index. Check with EXPLAIN ANALYZE whether the query planner catches on - it should be able to use the index for queries on a newer date. But it won't understand more complex conditions that do not match the index predicate exactly.
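
As a rough sketch of how the second option could be used in practice, assuming created is a timestamp column and the archived column from above exists. The literal 'some_user_sid' is a placeholder, since the type of user_sid is not given in the question:

```sql
-- Periodically "retire" rows older than 30 days (run from a scheduled job).
UPDATE queries_query
SET    archived = TRUE
WHERE  NOT archived
AND    created < LOCALTIMESTAMP - interval '30 days';

-- Count a user's recent queries. "NOT archived" repeats the index predicate
-- so the planner can consider the partial index.
-- 'some_user_sid' is a placeholder; replace it with a real value of the column's type.
EXPLAIN ANALYZE
SELECT count(*)
FROM   queries_query
WHERE  user_sid = 'some_user_sid'
AND    created >= LOCALTIMESTAMP - interval '30 days'
AND    NOT archived;
```

The date condition stays in the query because rows can be older than 30 days for a while before the next UPDATE marks them as archived.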

You don't have to drop and recreate the index, but the UPDATE on the table may be more expensive than index recreation and the table gets slightly bigger.

I would go with the first option (index recreation). In fact, I am using this solution in several databases. The second incurs more costly updates.

Both solutions retain their usefulness over time; performance just deteriorates slowly as more outdated rows are included in the index.
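
To judge when a refresh is due, it may help to look at the current size and predicate of the partial index. A small sketch, assuming the index name queries_recent_idx from the first option:

```sql
-- On-disk size of the partial index.
SELECT pg_size_pretty(pg_relation_size('queries_recent_idx')) AS index_size;

-- Full definition, including the WHERE clause with the current cutoff.
SELECT indexdef
FROM   pg_indexes
WHERE  indexname = 'queries_recent_idx';
```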



Source: https://stackoverflow.com/questions/14744931/add-datetime-constraint-to-a-postgresql-multi-column-partial-index
