Unnesting a list of JSON objects in PostgreSQL

时光怂恿深爱的人放手 提交于 2019-12-13 02:16:54

问题


I have a TEXT column in my PostgreSQL (9.6) database containing a list of one or more dictionnaries, like those ones.

[{"line_total_excl_vat": "583.3300", "account": "", "subtitle": "", "product_id": 5532548, "price_per_unit": "583.3333", "line_total_incl_vat": "700.0000", "text": "PROD0008", "amount": "1.0000", "vat_rate": "20"}]

or

[{"line_total_excl_vat": "500.0000", "account": "", "subtitle": "", "product_id": "", "price_per_unit": "250.0000", "line_total_incl_vat": "600.0000", "text": "PROD003", "amount": "2.0000", "vat_rate": "20"}, {"line_total_excl_vat": "250.0000", "account": "", "subtitle": "", "product_id": 5532632, "price_per_unit": "250.0000", "line_total_incl_vat": "300.0000", "text": "PROD005", "amount": "1.0000", "vat_rate": "20"}]

I would like to retrieve each dictionnary from the column and parse them in different columns.

For this example:

id | customer | blurb
---+----------+------
 1 | Joe      | [{"line_total_excl_vat": "583.3300", "account": "", "subtitle": "", "product_id": 5532548, "price_per_unit": "583.3333", "line_total_incl_vat": "700.0000", "text": "PROD0008", "amount": "1.0000", "vat_rate": "20"}]
 2 | Sally    | [{"line_total_excl_vat": "500.0000", "account": "", "subtitle": "", "product_id": "", "price_per_unit": "250.0000", "line_total_incl_vat": "600.0000", "text": "PROD003", "amount": "2.0000", "vat_rate": "20"}, {"line_total_excl_vat": "250.0000", "account": "", "subtitle": "", "product_id": 5532632, "price_per_unit": "250.0000", "line_total_incl_vat": "300.0000", "text": "PROD005", "amount": "1.0000", "vat_rate": "20"}]

would become:

id | customer | line_total_excl_vat  | account |  product_id | ...
---+----------+----------------------+---------+------------
 1 | Joe      | 583.3300             |     null|      5532548  
 2 | Sally    | 500.0000             |     null|         null
 3 | Sally    | 250.0000             |     null|      5532632

回答1:


if you know beforehand what fields you want to extract, cast the text to json / jsonb & use json_to_recordset / jsonb_to_recordset. Note that this method requires the fields names / types to be explicitly be specified. Unspecified fields that are in the json dictionaries will not be extracted.

See official postgesql documentation on json-functions

self contained example:

WITH tbl (id, customer, dat) as ( values
     (1, 'Joe',
      '[{ "line_total_excl_vat": "583.3300"
        , "account": ""
        , "subtitle": ""
        , "product_id": 5532548
        , "price_per_unit": "583.3333"
        , "line_total_incl_vat": "700.0000"
        , "text": "PROD0008"
        , "amount": "1.0000"
        , "vat_rate": "20"}]')
    ,(2, 'Sally', 
      '[{ "line_total_excl_vat": "500.0000"
        , "account": ""
        , "subtitle": ""
        , "product_id": ""
        , "price_per_unit": "250.0000"
        , "line_total_incl_vat": "600.0000"
        , "text": "PROD003"
        , "amount": "2.0000"
        , "vat_rate": "20"}
      , { "line_total_excl_vat": "250.0000"
        , "account": ""
        , "subtitle": ""
        , "product_id": 5532632
        , "price_per_unit": "250.0000"
        , "line_total_incl_vat": "300.0000"
        , "text": "PROD005"
        , "amount": "1.0000"
        , "vat_rate": "20"}]')
)
SELECT id, customer, x.*
FROM tbl
   , json_to_recordset(dat::json) x 
        (  line_total_excl_vat numeric
         , acount text
         , subtitle text
         , product_id text
         , price_per_unit numeric
         , line_total_incl_vat numeric
         , "text" text
         , amount numeric
         , vat_rate numeric
        )

produces the following output:

id    customer    line_total_excl_vat    acount    subtitle    product_id    price_per_unit    line_total_incl_vat    text      amount    vat_rate
 1    Joe                      583.33                             5532548    583.3333                          700    PROD0008       1          20
 2    Sally                       500                                             250                          600    PROD003        2          20
 2    Sally                       250                             5532632         250                          300    PROD005        1          20

This format is often referred to as the wide format.

It is also possible to extract the data in a long format, which has the additional benefit that it keeps all the data without explicitly mentioning the field names. In this case, the query may be written as (the test data is elided for brevity)

SELECT id, customer, y.key, y.value, x.record_number
FROM tbl
   , lateral json_array_elements(dat::json) WITH ORDINALITY AS x (val, record_number)
   , lateral json_each_text(x.val) y

The with ordinality in the above statement adds a sequence number for each element in the unnested array, and is be used to disambiguate fields from different arrays for each customer.

This produced the output:

id  customer key                    value     record_number
1   Joe      line_total_excl_vat    583.3300  1
1   Joe      account                          1
1   Joe      subtitle                         1
1   Joe      product_id             5532548   1
1   Joe      price_per_unit         583.3333  1
1   Joe      line_total_incl_vat    700.0000  1
1   Joe      text                   PROD0008  1
1   Joe      amount                 1.0000    1
1   Joe      vat_rate               20        1
2   Sally    line_total_excl_vat    500.0000  1
2   Sally    account                          1
2   Sally    subtitle                         1
2   Sally    product_id                       1
2   Sally    price_per_unit         250.0000  1
2   Sally    line_total_incl_vat    600.0000  1
2   Sally    text                   PROD003   1
2   Sally    amount                 2.0000    1
2   Sally    vat_rate               20        1
2   Sally    line_total_excl_vat    250.0000  2
2   Sally    account                          2
2   Sally    subtitle                         2
2   Sally    product_id             5532632   2
2   Sally    price_per_unit         250.0000  2
2   Sally    line_total_incl_vat    300.0000  2
2   Sally    text                   PROD005   2
2   Sally    amount                 1.0000    2
2   Sally    vat_rate               20        2



回答2:


Tidying up the json field would help a little bit. And that's something it could be done before inserting data into the table.

However, following your example, the code below should work:

create table public.yourtable (id integer, name varchar, others varchar);
insert into public.yourtable select 1,'Joe','[{"line_total_excl_vat": "583.3300", "account": "", "subtitle": "", "product_id": 5532548, "price_per_unit": "583.3333", "line_total_incl_vat": "700.0000", "text": "PROD0008", "amount": "1.0000", "vat_rate": "20"}]';
insert into public.yourtable select 2,'Sally','[{"line_total_excl_vat": "500.0000", "account": "", "subtitle": "", "product_id": "", "price_per_unit": "250.0000", "line_total_incl_vat": "600.0000", "text": "PROD003", "amount": "2.0000", "vat_rate": "20"}, {"line_total_excl_vat": "250.0000", "account": "", "subtitle": "", "product_id": 5532632, "price_per_unit": "250.0000", "line_total_incl_vat": "300.0000", "text": "PROD005", "amount": "1.0000", "vat_rate": "20"}]';

with jsonb_table as (
select id, name,
('{'||regexp_replace(
unnest(string_to_array(others, '}, {')),
'\[|\]|\{|\}','','g')::varchar||'}')::jsonb as jsonb_data
from yourtable
)
select id,name, * from jsonb_table,
jsonb_to_record(jsonb_data) 
     as (line_total_excl_vat numeric,account varchar, subtitle varchar, product_id varchar, price_per_unit numeric, line_total_incl_vat numeric);

First we create the jsonb_table where we transform your dictionary field into a postgres jsonb field by:

1) converting the string into an array by splitting in the '}, {' character sequence

2) unnesting the array elements to rows

3) cleanning up '[]{}' characters and converting the string to jsonb

And then we make use of the jsonb_to_record function to convert the jsonb records into columns. There we have to specify as many fields as needed for the column definitions.



来源:https://stackoverflow.com/questions/51045754/unnesting-a-list-of-json-objects-in-postgresql

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!