问题
I've got a PostgreSQL table called queries_query
, which has many columns.
Two of these columns, created
and user_sid
, are frequently used together in SQL queries by my application to determine how many queries a given user has done over the past 30 days. It is very, very rare that I query these stats for any time older than the most recent 30 days.
Here is my question:
I've currently created my multi-column index on these two columns by running:
CREATE INDEX CONCURRENTLY some_index_name ON queries_query (user_sid, created)
But I'd like to further restrict the index to only care about those queries in which the created date is within the past 30 days. I've tried doing the following:
CREATE INDEX CONCURRENTLY some_index_name ON queries_query (user_sid, created)
WHERE created >= NOW() - '30 days'::INTERVAL`
But this throws an exception stating that my function must be immutable.
I'd love to get this working so that I can optimize my index, and cut back on the resources Postgres needs to do these repeated queries.
回答1:
You get an exception in your attempt to use now()
because the function is not IMMUTABLE
(obviously) and, I quote the manual here:
All functions and operators used in an index definition must be "immutable" ...
I see two ways to utilize a (much more efficient) partial index here:
1. Partial index with condition using constant date:
CREATE INDEX queries_recent_idx ON queries_query (user_sid, created)
WHERE created > '2013-01-07 00:00'::timestamp;
Assuming created
is actually defined as timestamp
. It wouldn't work to provide a timestamp
constant for a timestamptz
column (timestamp with time zone
). The cast from timestamp
to timestamptz
(or vice versa) depends on the current time zone setting and is not immutable. Use a constant of matching data type. Understand the basics of timestamps with / without time zone:
- Ignoring timezones altogether in Rails and PostgreSQL
Drop and recreate that index at hours with low traffic, maybe with a cron job on a daily or weekly basis (or whatever is good enough for you). Creating an index is pretty fast, especially a partial index that is comparatively small. This solution also doesn't need to add anything to the table.
Assuming no concurrent access to the table, automatic index recreation could be done with a function like this:
CREATE OR REPLACE FUNCTION f_index_recreate()
RETURNS void AS
$func$
BEGIN
DROP INDEX IF EXISTS queries_recent_idx;
EXECUTE format('
CREATE INDEX queries_recent_idx
ON queries_query (user_sid, created)
WHERE created > %L::timestamp'
, LOCALTIMESTAMP - interval '30 days'); -- timestamp constant
-- , now() - interval '30 days'); -- alternative for timestamptz
END
$func$ LANGUAGE plpgsql;
Call:
SELECT f_index_recreate();
now()
(like you had) is the equivalent of CURRENT_TIMESTAMP
and returns timestamptz
. Cast to timestamp
with now()::timestamp
or use LOCALTIMESTAMP
instead.
- Select today's (since midnight) timestamps only
Tested with Postgres 9.2 - 9.4.
SQL Fiddle.
If you have to deal with concurrent access, use CREATE INDEX CONCURRENTLY
. But you can't wrap this command into a function because, per documentation:
... a regular
CREATE INDEX
command can be performed within a transaction block, butCREATE INDEX CONCURRENTLY
cannot.
So, with two separate transactions:
CREATE INDEX CONCURRENTLY queries_recent_idx2 ON queries_query (user_sid, created)
WHERE created > '2013-01-07 00:00'::timestamp; -- your new condition
Then:
DROP INDEX CONCURRENTLY IF EXISTS queries_recent_idx;
Optionally, rename to old name:
ALTER INDEX queries_recent_idx2 RENAME TO queries_recent_idx;
2. Partial index with condition on "archived" tag
Add an archived
tag to your table:
ALTER queries_query ADD COLUMN archived boolean NOT NULL DEFAULT FALSE;
UPDATE
the column at intervals of your choosing to "retire" older rows and create an index like:
CREATE INDEX some_index_name ON queries_query (user_sid, created)
WHERE NOT archived;
Add a matching condition to your queries (even if it seems redundant) to allow it to use the index. Check with EXPLAIN ANALYZE
whether the query planner catches on - it should be able to use the index for queries on an newer date. But it won't understand more complex conditions not matching exactly.
You don't have to drop and recreate the index, but the UPDATE
on the table may be more expensive than index recreation and the table gets slightly bigger.
I would go with the first option (index recreation). In fact, I am using this solution in several databases. The second incurs more costly updates.
Both solutions retain their usefulness over time, performance slowly deteriorates as more outdated rows are included in the index.
来源:https://stackoverflow.com/questions/14744931/add-datetime-constraint-to-a-postgresql-multi-column-partial-index