问题
I try to set up a GIN index but I do not think my index is used when I run the request, whether I use an operator or a function.
Environment
In our table we have a JSONB field (json_aip
) containing a Json that looks like that:
{
"properties": {
"pdi": {
"contextInformation": {
"tags": ["SOME_TAG"]
},
},
}
Table creation :
create table t_aip (
json_aip jsonb,
[...]
);
CREATE INDEX idx_aip_tags
ON t_aip
USING gin ((json_aip -> 'properties' -> 'pdi' -> 'contextInformation' -> 'tags'));
Operator query
We can't use the operator ?|
as we use JDBC. But rumors indicate I should see my index when I run that type of query.
EXPLAIN ANALYZE SELECT count(*)
FROM storage.t_aip
WHERE json_aip#>'{properties,pdi,contextInformation,tags}' ?| array['SOME_TAG']
Result:
Aggregate
(cost=27052.16..27052.17 rows=1 width=8) (actual time=488.085..488.087 rows=1 loops=1)
-> Seq Scan on t_aip (cost=0.00..27052.06 rows=42 width=0) (actual time=0.134..456.978 rows=16502 loops=1)
Filter: ((json_aip #> '{properties,pdi,contextInformation,tags}'::text[]) ?| '{SOME_TAG}'::text[])
Rows Removed by Filter: 17511
Planning time: 23.202 ms
Execution
time: 488.449 ms
Functional query
EXPLAIN ANALYZE SELECT count(*)
FROM storage.t_aip
WHERE jsonb_exists_any(
json_aip#>'{properties,pdi,contextInformation,tags}',
array['SOME_TAG']
)
Result:
QUERY PLAN
Aggregate (cost=27087.00..27087.01 rows=1 width=8) (actual time=369.931..369.933 rows=1 loops=1)
-> Seq Scan on t_aip (cost=0.00..27052.06 rows=13979 width=0) (actual time=0.173..350.437 rows=16502 loops=1)
Filter: jsonb_exists_any((json_aip #> '{properties,pdi,contextInformation,tags}'::text[]), '{SOME_TAG}'::text[])
Rows Removed by Filter: 17511
Planning time: 56.021 ms
Execution time: 370.252 ms
There is nothing about the index at all. Any help would be much appreciated !
I think my index is wrong because it considers that at the end of the path json_aip -> 'properties' -> 'pdi' -> 'contextInformation' -> 'tags'
it index a String whether that's an array. That's my opinion.
回答1:
There is a general rule that you have to use exactly the same expression both in an index and a query to use the index. With this index:
CREATE INDEX idx_aip_tags
ON t_aip
USING gin ((json_aip#>'{properties,pdi,contextInformation,tags}'));
the query will use the index
EXPLAIN ANALYZE
SELECT count(*)
FROM t_aip
WHERE json_aip#>'{properties,pdi,contextInformation,tags}' ?| array['SOME_TAG']
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=149.97..149.98 rows=1 width=0) (actual time=27.783..27.783 rows=1 loops=1)
-> Bitmap Heap Scan on t_aip (cost=20.31..149.87 rows=40 width=0) (actual time=1.504..25.726 rows=20000 loops=1)
Recheck Cond: ((json_aip #> '{properties,pdi,contextInformation,tags}'::text[]) ?| '{SOME_TAG}'::text[])
Heap Blocks: exact=345
-> Bitmap Index Scan on idx_aip_tags (cost=0.00..20.30 rows=40 width=0) (actual time=1.455..1.455 rows=20000 loops=1)
Index Cond: ((json_aip #> '{properties,pdi,contextInformation,tags}'::text[]) ?| '{SOME_TAG}'::text[])
Note that the GIN index supports also @>
operator:
SELECT count(*)
FROM t_aip
WHERE json_aip#>'{properties,pdi,contextInformation,tags}' @> '["SOME_TAG"]'
but be careful when searching for more than one tag:
SELECT count(*)
FROM t_aip
-- this gives objects containing both tags:
-- WHERE json_aip#>'{properties,pdi,contextInformation,tags}' @> '["SOME_TAG", "ANOTHER_TAG"]'
-- and this gives objects with any of two tags:
WHERE json_aip#>'{properties,pdi,contextInformation,tags}' @> ANY(ARRAY['["SOME_TAG"]', '["ANOTHER_TAG"]']::jsonb[])
回答2:
EDITED:
I thought the opposite, but in fact THERE IS a difference between operators (?|
) and functions (jsonb_exists_any
) about index usage, as the index is never used when the query uses (jsonb) functions.
You can get more about it here: https://dba.stackexchange.com/a/91007
That was the other question of this topic.
来源:https://stackoverflow.com/questions/52850061/indexing-a-jsonb-array-in-postgres