Postgres jsonb search in array with greater operator (with jsonb_array_elements)

雨燕双飞 提交于 2020-01-02 12:20:25

问题


I try to search a solution but I didn't find anything for my case...

Here is the database declaration (simplified):

CREATE TABLE documents (
    document_id int4 NOT NULL GENERATED BY DEFAULT AS IDENTITY,
    data_block jsonb NULL
);

And this is an example of insert.

INSERT INTO documents (document_id, data_block)
VALUES(878979, 
    {"COMMONS": {"DATE": {"value": "2017-03-11"}},
     "PAYABLE_INVOICE_LINES": [
         {"AMOUNT": {"value": 52408.53}}, 
         {"AMOUNT": {"value": 654.23}}
     ]});
INSERT INTO documents (document_id, data_block)
VALUES(977656, 
    {"COMMONS": {"DATE": {"value": "2018-03-11"}},
     "PAYABLE_INVOICE_LINES": [
         {"AMOUNT": {"value": 555.10}}
     ]});

I want to search all documents where one of the PAYABLE_INVOICE_LINES has a line with a value greater than 1000.00

My query is

select *
from documents d
cross join lateral jsonb_array_elements(d.data_block -> 'PAYABLE_INVOICE_LINES') as pil 
where (pil->'AMOUNT'->>'value')::decimal >= 1000

But, as I want to limit to 50 documents, I have to group on the document_id and limit the result to 50.

With millions of documents, this query is very expensive... 10 seconds with 1 million.

Do you have some ideas to have better performance ?

Thanks


回答1:


Instead of cross join lateral use where exists:

select *
from documents d
where exists (
  select 1
  from jsonb_array_elements(d.data_block -> 'PAYABLE_INVOICE_LINES') as pil
  where (pil->'AMOUNT'->>'value')::decimal >= 1000)
limit 50;

Update

And yet another method, more complex but also much more efficient.

Create function that returns max value from your JSONB data, like this:

create function fn_get_max_PAYABLE_INVOICE_LINES_value(JSONB) returns decimal language sql as $$
  select max((pil->'AMOUNT'->>'value')::decimal)
  from jsonb_array_elements($1 -> 'PAYABLE_INVOICE_LINES') as pil $$

Create index on this function:

create index idx_max_PAYABLE_INVOICE_LINES_value
  on documents(fn_get_max_PAYABLE_INVOICE_LINES_value(data_block));

Use function in your query:

select *
from documents d
where fn_get_max_PAYABLE_INVOICE_LINES_value(data_block) > 1000
limit 50;

In this case the index will be used and query will be much faster on large amount of data.

PS: Usually limit have sense in pair with order by.




回答2:


Grouping and limiting is easy enough:

select  document_id
from    documents d
cross join lateral 
        jsonb_array_elements(d.data_block -> 'PAYABLE_INVOICE_LINES') as pil 
where   (pil->'AMOUNT'->>'value')::decimal >= 1000
group by
        document_id
limit   50

If you query this more often, you could store a list of documents and invoice lines in a separate table. When you're adding, modifying or deleting documents, you'd have to keep the separate table up to date too. But querying a regular table is much faster than querying JSON columns.



来源:https://stackoverflow.com/questions/49586710/postgres-jsonb-search-in-array-with-greater-operator-with-jsonb-array-elements

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!