I try to search a solution but I didn't find anything for my case...
Here is the database declaration (simplified):
CREATE TABLE documents (
document_id int4 NOT NULL GENERATED BY DEFAULT AS IDENTITY,
data_block jsonb NULL
);
And this is an example of insert.
INSERT INTO documents (document_id, data_block)
VALUES(878979,
{"COMMONS": {"DATE": {"value": "2017-03-11"}},
"PAYABLE_INVOICE_LINES": [
{"AMOUNT": {"value": 52408.53}},
{"AMOUNT": {"value": 654.23}}
]});
INSERT INTO documents (document_id, data_block)
VALUES(977656,
{"COMMONS": {"DATE": {"value": "2018-03-11"}},
"PAYABLE_INVOICE_LINES": [
{"AMOUNT": {"value": 555.10}}
]});
I want to search all documents where one of the PAYABLE_INVOICE_LINES has a line with a value greater than 1000.00
My query is
select *
from documents d
cross join lateral jsonb_array_elements(d.data_block -> 'PAYABLE_INVOICE_LINES') as pil
where (pil->'AMOUNT'->>'value')::decimal >= 1000
But, as I want to limit to 50 documents, I have to group on the document_id and limit the result to 50.
With millions of documents, this query is very expensive... 10 seconds with 1 million.
Do you have some ideas to have better performance ?
Thanks
Instead of cross join lateral
use where exists
:
select *
from documents d
where exists (
select 1
from jsonb_array_elements(d.data_block -> 'PAYABLE_INVOICE_LINES') as pil
where (pil->'AMOUNT'->>'value')::decimal >= 1000)
limit 50;
Update
And yet another method, more complex but also much more efficient.
Create function that returns max value from your JSONB
data, like this:
create function fn_get_max_PAYABLE_INVOICE_LINES_value(JSONB) returns decimal language sql as $$
select max((pil->'AMOUNT'->>'value')::decimal)
from jsonb_array_elements($1 -> 'PAYABLE_INVOICE_LINES') as pil $$
Create index on this function:
create index idx_max_PAYABLE_INVOICE_LINES_value
on documents(fn_get_max_PAYABLE_INVOICE_LINES_value(data_block));
Use function in your query:
select *
from documents d
where fn_get_max_PAYABLE_INVOICE_LINES_value(data_block) > 1000
limit 50;
In this case the index will be used and query will be much faster on large amount of data.
PS: Usually limit
have sense in pair with order by
.
Grouping and limiting is easy enough:
select document_id
from documents d
cross join lateral
jsonb_array_elements(d.data_block -> 'PAYABLE_INVOICE_LINES') as pil
where (pil->'AMOUNT'->>'value')::decimal >= 1000
group by
document_id
limit 50
If you query this more often, you could store a list of documents and invoice lines in a separate table. When you're adding, modifying or deleting documents, you'd have to keep the separate table up to date too. But querying a regular table is much faster than querying JSON columns.
来源:https://stackoverflow.com/questions/49586710/postgres-jsonb-search-in-array-with-greater-operator-with-jsonb-array-elements