I need to delete the majority (say, 90%) of a very large table (say, 5m rows). The other 10% of this table is frequently read, but not written to.
From \"Best way to
Indexes are typically useless for operations on 90% of all rows. Sequential scans will be faster either way. (Exotic exceptions apply.)
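You can verify this with EXPLAIN; with ~90% of the rows matching, the planner should pick a sequential scan. A sketch, assuming a table tbl with a boolean column delete_flag (both placeholder names, used throughout below):

    -- Expect a Seq Scan node here, not an index scan,
    -- no matter which indexes exist on tbl.
    EXPLAIN DELETE FROM tbl WHERE delete_flag;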
If you need to allow concurrent reads, you cannot take an exclusive lock on the table. So you also cannot drop any indexes in the same transaction.
You could drop indexes in separate transactions to keep the duration of the exclusive lock at a minimum. In Postgres 9.2 or later you can also use DROP INDEX CONCURRENTLY, which only needs minimal locks. Later, use CREATE INDEX CONCURRENTLY to rebuild the index in the background without blocking concurrent reads or writes.
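A minimal sketch of that sequence, assuming a placeholder index tbl_some_id_idx on the placeholder table tbl (note that neither CONCURRENTLY statement can run inside a transaction block):

    -- Drop with minimal locking before the big DELETE:
    DROP INDEX CONCURRENTLY tbl_some_id_idx;

    -- ... run the big DELETE here ...

    -- Rebuild in the background afterwards:
    CREATE INDEX CONCURRENTLY tbl_some_id_idx ON tbl (some_id);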
If you have a stable condition to identify the 10% (or less) of rows that stay, I would suggest a partial index on just those rows to get the best of both worlds:
    CREATE INDEX foo ON tbl (some_id) WHERE delete_flag = FALSE;

The DELETE is not going to modify the partial index at all, since none of the indexed rows are involved in the DELETE.
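The DELETE itself then runs as a plain sequential scan over the same placeholder table:

    -- Removes the ~90% of rows flagged for deletion;
    -- the partial index covers only delete_flag = FALSE rows.
    DELETE FROM tbl WHERE delete_flag;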
This assumes delete_flag is boolean. You have to include the same predicate in your queries (even if it seems logically redundant) to make sure Postgres can use the partial index.
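For example, a hypothetical read of the surviving rows repeats the predicate verbatim:

    SELECT *
    FROM   tbl
    WHERE  some_id = 123
    AND    delete_flag = FALSE;  -- matches the index predicate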