Postgres `gin_trgm_ops` index not being used

℡╲_俬逩灬. 提交于 2019-12-11 15:56:53

问题


I'm trying to speed up some text matching in Postgres, using the pg_trgm extensions:

CREATE TABLE test3 (id bigint, key text, value text);

insert into test3 values (1, 'first 1', 'second 3');
insert into test3 values (2, 'first 1', 'second 2');
insert into test3 values (2, 'first 2', 'second 3');
insert into test3 values (3, 'first 1', 'second 2');
insert into test3 values (3, 'first 1', 'second 3');
insert into test3 values (4, 'first 2', 'second 3');
insert into test3 values (4, 'first 2', 'second 3');
insert into test3 values (4, 'first 1', 'second 2');
insert into test3 values (4, 'first 1', 'second 2');

-- repeat the above 1,000,000x times, to have more rows for benchmarking
insert into test3(id, key, value) select id, key, value from test3 cross join generate_series(1, 1000000);

Now I query this table with ILIKE:

select count(*) from test3 where key = 'first 1' and value ilike '%nd 3%';
Time: 918.265 ms

To see if indexing would speed this up, I added pg_trgm on both key and value columns:

CREATE extension if not exists pg_trgm;
CREATE INDEX test3_key_trgm_idx ON test3 USING gin (key gin_trgm_ops);
CREATE INDEX test3_value_trgm_idx ON test3 USING gin (value gin_trgm_ops);

But the query still takes the same time, and EXPLAIN ANALYZE shows the indexes are not being used at all:

explain analyze select count(*) from test3 where key = 'first 1' and value ilike '%nd 3%';
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=126905.14..126905.15 rows=1 width=8) (actual time=1017.666..1017.667 rows=1 loops=1)
   ->  Gather  (cost=126904.93..126905.14 rows=2 width=8) (actual time=1017.505..1018.778 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=125904.93..125904.94 rows=1 width=8) (actual time=1010.862..1010.862 rows=1 loops=3)
               ->  Parallel Seq Scan on test3  (cost=0.00..122427.06 rows=1391148 width=0) (actual time=0.041..973.550 rows=666667 loops=3)
                     Filter: ((value ~~* '%nd 3%'::text) AND (key = 'first 1'::text))
                     Rows Removed by Filter: 2333336
 Planning Time: 0.266 ms
 Execution Time: 1018.814 ms

Time: 1049.413 ms (00:01.049)

Note the sequential scan. What gives?


回答1:


Never mind, I found the issue.

The query planner was smarter than my toy test set; seeing as most rows match the query, it went for a sequential scan.

If I try with ilike '%nd 0%' instead, no rows match and EXPLAIN ANALYZE reports Bitmap Index Scan on test3_value_trgm_idx correctly.

So, normalizing the original JSONB in this manner works. But I'll also try to find and compare another way, using regular expressions over TEXT, to avoid having to create and maintain another table.



来源:https://stackoverflow.com/questions/56485455/postgres-gin-trgm-ops-index-not-being-used

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!