Question
I'm using the Kaminari gem to paginate a query on a large table (~1.5 million rows). Fetching the actual result pages is quite fast (~20 ms). However, the SELECT COUNT(*) WHERE ... query that Kaminari adds is excruciatingly slow and adds several seconds to the execution time.
Is there a way to approximate the number of results instead?
Answer 1:
Quick estimate for the whole table
For a very quick estimate for the whole table:
Your example hints at addresses. Say we have a table called adr in the schema public:
SELECT reltuples FROM pg_class WHERE oid = 'public.adr'::regclass;
More details in this related answer:
How do I speed up counting rows in a PostgreSQL table?
Count with condition(s)
For a count with a condition, Postgres can use indexes to make it faster. Postgres 9.2 improved this with index-only scans (often loosely called "covering indexes"), but certain requirements must be met to benefit from them. See the Postgres Wiki page on index-only scans for details.
For queries with conditions on city and state, this multicolumn index would help a lot if the conditions are selective (only a small percentage of the rows meet them):
CREATE INDEX adr_foo_idx ON adr (city, state);
If you have a small set of typical conditions, you might even use partial indexes:
CREATE INDEX adr_ny_ny_idx ON adr(adr_id)
WHERE city = 'New York'
AND state = 'NY';
... one for every set of (state, city)
Or a combination of both:
CREATE INDEX adr_ny_idx ON adr (city)
WHERE state = 'NY';
... one per state
Normalize
Of course, anything that makes your big table (and its indexes) smaller helps. Lookup tables for states and cities would go a long way toward cutting down redundant storage. The key word here is normalization.
Instead of:
CREATE TABLE adr (
adr_id serial PRIMARY KEY
,state text
,city text
...
);
SELECT count(*)
FROM adr
WHERE city = 'New York'
AND state = 'NY';
Normalize your database design and use proper indexes:
CREATE TABLE state (
state_id serial PRIMARY KEY
,state text UNIQUE
);
CREATE TABLE city (
city_id serial PRIMARY KEY
,state_id int REFERENCES state
,city text
,UNIQUE (state_id, city)
);
CREATE TABLE adr (
adr_id serial PRIMARY KEY
,city_id int REFERENCES city
...
);
CREATE INDEX adr_city_idx ON adr (city_id);
SELECT count(*)
FROM state s
JOIN city c USING (state_id)
JOIN adr a USING (city_id)
WHERE s.state = 'NY'
AND c.city = 'New York';
Table and indexes become smaller, and integer comparisons are faster than text comparisons. The count gets faster as a result.
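Below is a minimal sketch of moving from the flat table to the normalized one. It assumes the lookup tables are filled from the existing distinct values with INSERT ... SELECT DISTINCT. The sketch builds both layouts from the same rows and checks that the join query returns the same count as the original query.

Python's built-in sqlite3 module stands in for Postgres only so that the sketch runs without a server. SQLite's planner and storage differ from Postgres, so this checks correctness, not speed. The table adr_flat, the sample rows, and the helper functions are hypothetical.

```python
import sqlite3

# Hypothetical sample rows (state, city); sqlite3 is only a stand-in for Postgres.
ROWS = [
    ("NY", "New York"),
    ("NY", "New York"),
    ("NY", "Buffalo"),
    ("NJ", "Newark"),
    ("NY", "New York"),
]


def build(conn: sqlite3.Connection) -> None:
    cur = conn.cursor()
    # Flat table: state and city text repeated on every row.
    cur.execute("CREATE TABLE adr_flat (adr_id INTEGER PRIMARY KEY, state TEXT, city TEXT)")
    cur.executemany("INSERT INTO adr_flat (state, city) VALUES (?, ?)", ROWS)

    # Normalized layout mirroring the answer: lookup tables plus integer keys.
    cur.execute("CREATE TABLE state (state_id INTEGER PRIMARY KEY, state TEXT UNIQUE)")
    cur.execute("""CREATE TABLE city (
        city_id INTEGER PRIMARY KEY,
        state_id INT REFERENCES state,
        city TEXT,
        UNIQUE (state_id, city))""")
    cur.execute("CREATE TABLE adr (adr_id INTEGER PRIMARY KEY, city_id INT REFERENCES city)")
    cur.execute("CREATE INDEX adr_city_idx ON adr (city_id)")

    # Fill lookups from distinct values, then swap the text columns for city_id.
    cur.execute("INSERT INTO state (state) SELECT DISTINCT state FROM adr_flat")
    cur.execute("""INSERT INTO city (state_id, city)
        SELECT DISTINCT s.state_id, f.city
        FROM adr_flat f JOIN state s ON s.state = f.state""")
    cur.execute("""INSERT INTO adr (adr_id, city_id)
        SELECT f.adr_id, c.city_id
        FROM adr_flat f
        JOIN state s ON s.state = f.state
        JOIN city c ON c.state_id = s.state_id AND c.city = f.city""")
    conn.commit()


def count_flat(conn: sqlite3.Connection, state: str, city: str) -> int:
    # Original query against the denormalized table.
    return conn.execute(
        "SELECT count(*) FROM adr_flat WHERE city = ? AND state = ?", (city, state)
    ).fetchone()[0]


def count_normalized(conn: sqlite3.Connection, state: str, city: str) -> int:
    # Same question answered through the lookup tables.
    return conn.execute(
        """SELECT count(*)
           FROM state s
           JOIN city c USING (state_id)
           JOIN adr a USING (city_id)
           WHERE s.state = ? AND c.city = ?""",
        (state, city),
    ).fetchone()[0]


conn = sqlite3.connect(":memory:")
build(conn)
print(count_flat(conn, "NY", "New York"), count_normalized(conn, "NY", "New York"))  # 3 3
```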
Materialized view
On top of that, if performance is crucial, and since you do not need exact counts, you could use a materialized view holding counts for the relevant conditions. Refresh the view on events or at times of your choosing to keep the numbers up to date. See the PostgreSQL manual on CREATE MATERIALIZED VIEW for details. Materialized views require Postgres 9.3 or later, but you can easily implement the same thing manually in any version.
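For versions without materialized views, the manual approach amounts to a summary table that you rebuild when you choose. This sketch assumes counts per (state, city) are the relevant conditions. The table adr_count and the functions are hypothetical names, and sqlite3 again stands in for Postgres so the code runs standalone. In Postgres 9.3+, the summary table and refresh function would be replaced by CREATE MATERIALIZED VIEW and REFRESH MATERIALIZED VIEW, as noted in the comments.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.execute("CREATE TABLE adr (adr_id INTEGER PRIMARY KEY, state TEXT, city TEXT)")
    # Hand-rolled stand-in for a materialized view: one precomputed count per (state, city).
    # Postgres 9.3+ equivalent: CREATE MATERIALIZED VIEW adr_count AS
    #   SELECT state, city, count(*) AS ct FROM adr GROUP BY state, city;
    conn.execute("""CREATE TABLE adr_count (
        state TEXT, city TEXT, ct INTEGER NOT NULL,
        PRIMARY KEY (state, city))""")


def refresh_counts(conn: sqlite3.Connection) -> None:
    # Rebuild in one transaction (committed on exit) so readers never see a half-filled table.
    # Postgres 9.3+ equivalent: REFRESH MATERIALIZED VIEW adr_count;
    with conn:
        conn.execute("DELETE FROM adr_count")
        conn.execute("""INSERT INTO adr_count (state, city, ct)
            SELECT state, city, count(*) FROM adr GROUP BY state, city""")


def cached_count(conn: sqlite3.Connection, state: str, city: str) -> int:
    # Cheap primary-key lookup instead of counting rows in the big table.
    row = conn.execute(
        "SELECT ct FROM adr_count WHERE state = ? AND city = ?", (state, city)
    ).fetchone()
    return row[0] if row else 0


conn = sqlite3.connect(":memory:")
create_schema(conn)
with conn:
    conn.executemany(
        "INSERT INTO adr (state, city) VALUES (?, ?)",
        [("NY", "New York"), ("NY", "New York"), ("NY", "Buffalo")],
    )
refresh_counts(conn)
print(cached_count(conn, "NY", "New York"))  # 2

with conn:
    conn.execute("INSERT INTO adr (state, city) VALUES ('NY', 'New York')")
print(cached_count(conn, "NY", "New York"))  # still 2 until the next refresh

refresh_counts(conn)
print(cached_count(conn, "NY", "New York"))  # 3
```

The cached number goes stale between refreshes. That fits the question, since exact counts are not required; how often to refresh (after batch loads, on a schedule) is up to you.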
Source: https://stackoverflow.com/questions/21839330/kaminari-is-slow-with-count-on-a-huge-table-in-postgres