PostgreSQL GIN index slower than GIST for pg_trgm?

Posted by 。_饼干妹妹 on 2019-12-21 05:42:48

Question


Despite what all the documentation says, I'm finding GIN indexes to be significantly slower than GiST indexes for pg_trgm-related searches. This is on a table of 25 million rows with a relatively short text field (average length of 21 characters). Most of the rows of text are addresses of the form "123 Main st, City".

With the GiST index, a search like the following takes about 4 seconds:

select suggestion from search_suggestions where suggestion % 'seattle';

But with the GIN index the same query takes about 90 seconds and produces the following output when run with EXPLAIN ANALYZE:

Bitmap Heap Scan on search_suggestions  (cost=330.09..73514.15 rows=25043 width=22) (actual time=671.606..86318.553 rows=40482 loops=1)
  Recheck Cond: ((suggestion)::text % 'seattle'::text)
  Rows Removed by Index Recheck: 23214341
  Heap Blocks: exact=7625 lossy=223807
  ->  Bitmap Index Scan on tri_suggestions_idx  (cost=0.00..323.83 rows=25043 width=0) (actual time=669.841..669.841 rows=1358175 loops=1)
        Index Cond: ((suggestion)::text % 'seattle'::text)
Planning time: 1.420 ms
Execution time: 86327.246 ms

Note that over a million rows are being selected by the index, even though only 40k rows actually match. Any ideas why this is performing so poorly? This is on PostgreSQL 9.4.


Answer 1:


Some issues stand out:

First, consider upgrading to a current version of Postgres. At the time of writing that's pg 9.6 or pg 10 (currently beta). Since Pg 9.4 there have been multiple improvements for GIN indexes, the additional module pg_trgm and big data in general.

Next, you need much more RAM, in particular a higher work_mem setting. I can tell from this line in the EXPLAIN output:

Heap Blocks: exact=7625 lossy=223807

"lossy" in the details for a Bitmap Heap Scan (with your particular numbers) indicates a dramatic shortage of work_mem. Because the exact row addresses do not fit into your low work_mem setting, Postgres collects only block addresses in the bitmap index scan instead of row pointers. As a result, every row in those blocks has to be rechecked, and many more non-qualifying rows have to be filtered in the following Bitmap Heap Scan. This related answer has details:

  • “Recheck Cond:” line in query plans with a bitmap index scan

But don't set work_mem too high without considering the whole situation:

  • Optimize simple query using ORDER BY date and text
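As a minimal sketch of how to test this, work_mem can be raised for the current session only and the query re-run. The value '256MB' is an assumption chosen purely for illustration, not a recommendation; a suitable value depends on available RAM and the number of concurrent sessions.

```sql
-- Check the current setting first.
SHOW work_mem;

-- Raise work_mem for this session only.
-- '256MB' is an illustrative assumption, not a tuned value.
SET work_mem = '256MB';

-- Re-run the query and compare the "Heap Blocks" line:
-- the "lossy" count should shrink or disappear if the bitmap now fits.
EXPLAIN (ANALYZE, BUFFERS)
SELECT suggestion
FROM   search_suggestions
WHERE  suggestion % 'seattle';

-- Return to the server default afterwards.
RESET work_mem;
```

Setting it per session (or with SET LOCAL inside a transaction) keeps the change away from other connections, which is in line with the caution above about not raising work_mem globally without considering the whole situation.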

There may be other problems, like index or table bloat or more configuration bottlenecks. But if you fix just these two items, the query should be much faster already.

Also, do you really need to retrieve all 40k rows in the example? You probably want to add a small LIMIT to the query and make it a "nearest-neighbor" search - in which case a GiST index is the better choice after all, because that kind of query is supposed to be faster with a GiST index. Example:

  • Best index for similarity function
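A nearest-neighbor variant of the query from the question might look roughly like this. It is a sketch under assumptions: the index name search_suggestions_trgm_gist_idx and the LIMIT of 10 are hypothetical, and the table and column names are taken from the question.

```sql
-- Hypothetical index name; gist_trgm_ops is the GiST operator class from pg_trgm.
CREATE INDEX search_suggestions_trgm_gist_idx
    ON search_suggestions USING gist (suggestion gist_trgm_ops);

-- Return only the closest matches instead of all ~40k qualifying rows.
-- <-> is the pg_trgm distance operator (1 - similarity).
SELECT suggestion,
       similarity(suggestion, 'seattle') AS sim
FROM   search_suggestions
WHERE  suggestion % 'seattle'          -- keep only rows above the similarity threshold
ORDER  BY suggestion <-> 'seattle'     -- smallest distance first
LIMIT  10;                             -- illustrative value
```

The ORDER BY on the distance operator combined with a small LIMIT is what lets the GiST index return rows in distance order and stop early, rather than collecting every match first.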


Source: https://stackoverflow.com/questions/43008382/postgresql-gin-index-slower-than-gist-for-pg-trgm