De-normalize json object into flat objects

孤人 submitted on 2020-12-26 12:09:47

Question


I have a JSON object like this:

 {
        "id": 3590403096656,
        "title": "Romania Special Zip Hoodie Blue - Version 02 A5",
        "tags": [
            "1ST THE WORLD FOR YOU <3",
            "apparel"
        ],
        "props": [
            {
                "id": 28310659235920,
                "title": "S / romainia All Over Print Full Zip Hoodie for Men (Model H14)",
                "position": 1,
                "product_id": 3590403096656,
                "created_at": "2019-05-22T00:46:19+07:00",
                "updated_at": "2019-05-22T01:03:29+07:00"
            },
            {
                "id": 444444444444,
                "title": "number 2",
                "position": 1,
                "product_id": 3590403096656,
                "created_at": "2019-05-22T00:46:19+07:00",
                "updated_at": "2019-05-22T01:03:29+07:00"
            }
        ]
}

I want to flatten it so that the desired output looks like this:

{"id": 3590403096656,"title": "Romania Special Zip Hoodie Blue - Version 02 A5","tags": ["1ST THE WORLD FOR YOU <3","apparel"],"props.id": 28310659235920,"props.title": "S / romainia All Over Print Full Zip Hoodie for Men (Model H14)","props.position": 1,"props.product_id": 3590403096656,"props.created_at": "2019-05-22T00:46:19+07:00","props.updated_at": "2019-05-22T01:03:29+07:00"}
{"id": 3590403096656,"title": "Romania Special Zip Hoodie Blue - Version 02 A5","tags": ["1ST THE WORLD FOR YOU <3","apparel"],"props.id": 444444444444,"props.title": "number 2","props.position": 1,"props.product_id": 3590403096656,"props.created_at": "2019-05-22T00:46:19+07:00","props.updated_at": "2019-05-22T01:03:29+07:00"}

So far I have tried:

from pandas.io.json import json_normalize
json_normalize(sample_object)

Here sample_object contains one JSON object. I am looping through a large file of such objects, and I want to flatten each of them into the desired format.

json_normalize is not giving me the desired output: I want to keep tags as it is, but flatten props and repeat the parent object's fields for every props entry.


Answer 1:


Please try this:

import copy

obj =  {
        "id": 3590403096656,
        "title": "Romania Special Zip Hoodie Blue - Version 02 A5",
        "tags": [
            "1ST THE WORLD FOR YOU <3",
            "apparel",
        ],
        "props": [
            {
                "id": 28310659235920,
                "title": "S / romainia All Over Print Full Zip Hoodie for Men (Model H14)",
                "position": 1,
                "product_id": 3590403096656,
                "created_at": "2019-05-22T00:46:19+07:00",
                "updated_at": "2019-05-22T01:03:29+07:00"
            },
            {
                "id": 444444444444,
                "title": "number 2",
                "position": 1,
                "product_id": 3590403096656,
                "created_at": "2019-05-22T00:46:19+07:00",
                "updated_at": "2019-05-22T01:03:29+07:00"
            }
        ]
}

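# Remove "props" from obj; what is left is the part shared by every result.
props = obj.pop("props")

for p in props:
    # Deep copy so each result gets its own "tags" list.
    res = copy.deepcopy(obj)
    for k in p:
        res["props."+k] = p[k]
    print(res)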

Basically, it uses pop("props") to get obj without "props" (the common part that is reused in all result objects).

Then we iterate through props and create new objects that start as a copy of the base object, filling in "props.<key>" for every key of each prop.
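Since the question mentions looping over many such objects, the same idea can be wrapped in a function that leaves its input untouched. This is only a minimal sketch: the name denormalize, its key and sep parameters, and the small sample object are made up for illustration and are not part of the original answer.

```python
import copy
import json


def denormalize(obj, key="props", sep="."):
    """Yield one flat dict per entry of obj[key], repeating the parent fields."""
    # Everything except the nested list is shared by every output row.
    base = {k: v for k, v in obj.items() if k != key}
    for child in obj.get(key, []):
        # Deep copy so the rows do not share mutable values such as "tags".
        row = copy.deepcopy(base)
        for child_key, value in child.items():
            row[key + sep + child_key] = value
        yield row


# Hypothetical, shortened sample object with the same shape as in the question.
sample = {
    "id": 1,
    "title": "hoodie",
    "tags": ["apparel"],
    "props": [{"id": 10, "title": "S"}, {"id": 11, "title": "M"}],
}

rows = list(denormalize(sample))
for row in rows:
    print(json.dumps(row))  # one JSON object per line
```

Unlike pop("props"), this does not modify the original object. Note that an object whose "props" list is empty or missing produces no rows at all, which may or may not be what you want.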




Answer 2:


You want some json_normalize behavior, but with a custom twist. So use json_normalize or similar on a portion of the data, then combine it with the remainder of data.
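To make the "flatten one portion, then combine it with the remainder" idea concrete without depending on pandas, here is a rough standard-library sketch. flatten_dict is a hypothetical helper written for this illustration, not a pandas function, and it only approximates what a dictionary-flattening helper does: nested dicts become dotted keys, while lists are left as they are.

```python
def flatten_dict(d, prefix="", sep="."):
    """Flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in d.items():
        name = prefix + sep + key if prefix else key
        if isinstance(value, dict):
            # Recurse into nested dicts, extending the dotted prefix.
            flat.update(flatten_dict(value, name, sep))
        else:
            # Scalars and lists are kept unchanged.
            flat[name] = value
    return flat


# Hypothetical sample with one extra level of nesting inside each props entry.
record = {
    "id": 1,
    "tags": ["apparel"],
    "props": [{"id": 10, "size": {"w": 2}}, {"id": 11, "size": {"w": 3}}],
}

# The remainder: everything except the portion that gets flattened.
base = {k: v for k, v in record.items() if k != "props"}

# Combine the shared fields with each flattened props entry.
# {**base, ...} is a shallow copy, so all rows share the same "tags" list.
rows = [{**base, **flatten_dict({"props": p})} for p in record["props"]]
```

Wrapping each entry as {"props": p} before flattening is what produces the "props." prefix on the resulting keys.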

The code below prefers the "or similar" route, reaching deep into the pandas codebase to get the nested_to_record helper function, which flattens dictionaries. It's used to create individual rows that combine the base data (keys/values common across all properties) with the flattened data specific to each props entry. There is a commented-out line that does the equivalent thing without nested_to_record, but it somewhat inelegantly flattens into a DataFrame, then exports out to a dict.

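from collections import OrderedDict
import json
import pandas as pd

# nested_to_record is a private pandas helper, so its import path depends on
# the pandas version: newer releases keep it in pandas.io.json._normalize.
try:
    from pandas.io.json._normalize import nested_to_record
except ImportError:
    from pandas.io.json.normalize import nested_to_record

# rawjson is assumed to hold the JSON text of a single object
data = json.loads(rawjson)
props = data.pop('props')  # what remains in data is common to every row
rows = []
for prop in props:
    rowdict = OrderedDict(data)
    flattened_prop = nested_to_record({'props': prop})
    # flattened_prop = pd.json_normalize({'props': prop}).to_dict(orient='records')[0]
    rowdict.update(flattened_prop)
    rows.append(rowdict)

df = pd.DataFrame(rows)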

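Resulting in a DataFrame with one row per props entry, combining the base columns (id, title, tags) with the flattened props.* columns.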



Source: https://stackoverflow.com/questions/56716636/de-normalize-json-object-into-flat-objects
