How to parse deeply nested JSON to pandas dataframe?

◇◆丶佛笑我妖孽 posted on 2021-02-18 18:14:49

Question


Below is the code that parses the following nested JSON objects into the corresponding pandas DataFrames:

import pandas as pd

def flatten_json(nested_json):
    """
        Flatten json object with nested keys into a single level.
        Args:
            nested_json: A nested json object.
        Returns:
            A flat dict whose keys are the nested keys joined with '_'.
    """
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for a in x:
                flatten(x[a], name + a + '_')
        elif isinstance(x, list):
            # List items are keyed by their position in the list.
            for i, a in enumerate(x):
                flatten(a, name + str(i) + '_')
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out

# sample_object2 must already hold the parsed JSON (see the inputs below).
simplejson = False
if isinstance(sample_object2, list):
    dict_flattened = [flatten_json(d) for d in sample_object2]
elif isinstance(sample_object2, dict):
    # Note: only the first key is inspected, because every branch breaks out.
    while isinstance(sample_object2, dict) and not simplejson:
        for key in sample_object2.keys():
            nodekey = key
            if isinstance(sample_object2[nodekey], (dict, list)):
                dict_flattened = [flatten_json(d) for d in sample_object2[nodekey]]
                sample_object2 = sample_object2[nodekey]
                break
            else:
                dict_flattened = flatten_json(sample_object2)
                simplejson = True
                break
        break
else:
    print("Invalid json")

if simplejson:
    pdf = pd.DataFrame(dict_flattened, index=[0])
else:
    pdf = pd.DataFrame(dict_flattened)
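
To make the behaviour of flatten_json concrete, here is a small self-contained check. The function is repeated in condensed form so the snippet runs on its own, and the `sample` dict is made up purely for illustration. It shows that each list element is expanded into an index-suffixed key, so every list item ends up as its own column rather than its own row.

```python
def flatten_json(nested_json):
    """Flatten nested dicts/lists into a single-level dict with '_'-joined keys."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key in x:
                flatten(x[key], name + key + '_')
        elif isinstance(x, list):
            # The list position becomes part of the key name.
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            out[name[:-1]] = x  # drop the trailing '_'

    flatten(nested_json)
    return out


# Illustrative input, not taken from the question.
sample = {"a": 1, "b": {"c": [10, 20]}, "d": [{"e": "x"}, {"e": "y"}]}
flat = flatten_json(sample)
print(flat)
# {'a': 1, 'b_c_0': 10, 'b_c_1': 20, 'd_0_e': 'x', 'd_1_e': 'y'}
```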

Input 1:

sample_object2 = {
        "node":[
            {
                "item_1":"value_11",
                "item_2":"value_12",
                "item_3":"value_13",
                "item_4":["sub_value_14", "sub_value_15"],
                "item_5":{
                    "sub_item_1":"sub_item_value_11",
                    "sub_item_2":["sub_item_value_12", "sub_item_value_13"]
                }
            },
            {
                "item_1":"value_21",
                "item_2":"value_22",
                "item_4":["sub_value_24", "sub_value_25"],
                "item_5":{
                    "sub_item_1":"sub_item_value_21",
                    "sub_item_2":["sub_item_value_22", "sub_item_value_23"]
                }
            }
        ]
    }

Output 1:

+--------+--------+--------+------------+------------+-----------------+-------------------+-------------------+
|item_1  |item_2  |item_3  |item_4_0    |item_4_1    |item_5_sub_item_1|item_5_sub_item_2_0|item_5_sub_item_2_1|
+--------+--------+--------+------------+------------+-----------------+-------------------+-------------------+
|value_11|value_12|value_13|sub_value_14|sub_value_15|sub_item_value_11|sub_item_value_12  |sub_item_value_13  |
|value_21|value_22|nan     |sub_value_24|sub_value_25|sub_item_value_21|sub_item_value_22  |sub_item_value_23  |
+--------+--------+--------+------------+------------+-----------------+-------------------+-------------------+

Input 2:

sample_object2 = {
                "item_1":"value_11",
                "item_2":"value_12",
                "item_5":{
                    "sub_item_1":"sub_item_value_11",
                    "sub_item_2":["sub_item_value_12", "sub_item_value_13"]
                }
}

Output 2:

+--------+--------+-----------------+-------------------+-------------------+
|item_1  |item_2  |item_5_sub_item_1|item_5_sub_item_2_0|item_5_sub_item_2_1|
+--------+--------+-----------------+-------------------+-------------------+
|value_11|value_12|sub_item_value_11|sub_item_value_12  |sub_item_value_13  |
+--------+--------+-----------------+-------------------+-------------------+

Input 3:

sample_object2 = {
    "id": "0001",
    "type": "donut",
    "name": "Cake",
    "image":
        {
            "url": "images/0001.jpg",
            "width": 200,
            "height": 200
        },
    "thumbnail":
        {
            "url": "images/thumbnails/0001.jpg",
            "width": 32,
            "height": 32
        }
}

Output 3:

+----+-----+----+---------------+-----------+------------+--------------------------+---------------+----------------+
|id  |type |name|image_url      |image_width|image_height|thumbnail_url             |thumbnail_width|thumbnail_height|
+----+-----+----+---------------+-----------+------------+--------------------------+---------------+----------------+
|0001|donut|Cake|images/0001.jpg|200        |200         |images/thumbnails/0001.jpg|32             |32              |
+----+-----+----+---------------+-----------+------------+--------------------------+---------------+----------------+

It works as expected for the nested JSON objects above, but it does not produce the desired result for more deeply nested JSON.

Input:

sample_object2 = {
    "id": "0001",
    "type": "donut",
    "name": "Cake",
    "ppu": 0.55,
    "batters":
        {
            "batter":
                [
                    { "id": "1001", "type": "Regular" },
                    { "id": "1002", "type": "Chocolate" },
                    { "id": "1003", "type": "Blueberry" },
                    { "id": "1004", "type": "Devil's Food" }
                ]
        },
    "topping":
        [
            { "id": "5001", "type": "None" },
            { "id": "5002", "type": "Glazed" },
            { "id": "5005", "type": "Sugar" },
            { "id": "5007", "type": "Powdered Sugar" },
            { "id": "5006", "type": "Chocolate with Sprinkles" },
            { "id": "5003", "type": "Chocolate" },
            { "id": "5004", "type": "Maple" }
        ]
}

Expected Output:

+----+-----+----+----+-----------------+-------------------+----------+------------------------+
|id  |type |name|ppu |batters_batter_id|batters_batter_type|topping_id|topping_type            |
+----+-----+----+----+-----------------+-------------------+----------+------------------------+
|0001|donut|Cake|0.55|1001             |Regular            |5001      |None                    |
|nan |nan  |nan |nan |1002             |Chocolate          |5002      |Glazed                  |
|nan |nan  |nan |nan |1003             |Blueberry          |5005      |Sugar                   |
|nan |nan  |nan |nan |1004             |Devil's Food       |5007      |Powdered Sugar          |
|nan |nan  |nan |nan |nan              |nan                |5006      |Chocolate with Sprinkles|
|nan |nan  |nan |nan |nan              |nan                |5003      |Chocolate               |
|nan |nan  |nan |nan |nan              |nan                |5004      |Maple                   |
+----+-----+----+----+-----------------+-------------------+----------+------------------------+

But the actual output was:

+----+-----+----+----+-------------------+---------------------+-------------------+---------------------+-------------------+---------------------+-------------------+---------------------+------------+--------------+------------+--------------+------------+--------------+------------+--------------+------------+------------------------+------------+--------------+------------+--------------+
|id  |type |name|ppu |batters_batter_0_id|batters_batter_0_type|batters_batter_1_id|batters_batter_1_type|batters_batter_2_id|batters_batter_2_type|batters_batter_3_id|batters_batter_3_type|topping_0_id|topping_0_type|topping_1_id|topping_1_type|topping_2_id|topping_2_type|topping_3_id|topping_3_type|topping_4_id|topping_4_type          |topping_5_id|topping_5_type|topping_6_id|topping_6_type|
+----+-----+----+----+-------------------+---------------------+-------------------+---------------------+-------------------+---------------------+-------------------+---------------------+------------+--------------+------------+--------------+------------+--------------+------------+--------------+------------+------------------------+------------+--------------+------------+--------------+
|0001|donut|Cake|0.55|1001               |Regular              |1002               |Chocolate            |1003               |Blueberry            |1004               |Devil's Food         |5001        |None          |5002        |Glazed        |5005        |Sugar         |5007        |Powdered Sugar|5006        |Chocolate with Sprinkles|5003        |Chocolate     |5004        |Maple         |
+----+-----+----+----+-------------------+---------------------+-------------------+---------------------+-------------------+---------------------+-------------------+---------------------+------------+--------------+------------+--------------+------------+--------------+------------+--------------+------------+------------------------+------------+--------------+------------+--------------+

How can I write generic code that works for all kinds and levels of nested JSON? I tried tweaking the code above but could not get it to work. Any solution would be highly appreciated.
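
For reference, the expected layout above can be read as "one row of top-level scalars, plus one block of columns per list of records, all aligned by row position". The following is only a sketch of that reading, not a general solution: `records_to_columns` is a hypothetical helper name, it assumes every list worth expanding is a non-empty list of dicts, and it leaves lists of plain values untouched as single cells. It relies on `pd.json_normalize` (pandas 1.0 or later) and `pd.concat(..., axis=1)`.

```python
import pandas as pd


def records_to_columns(obj, sep="_"):
    """Hypothetical helper: scalars form one row, each list of dicts forms a column block."""
    scalars = {}
    blocks = []

    def walk(node, prefix):
        for key, value in node.items():
            name = prefix + key
            if isinstance(value, dict):
                walk(value, name + sep)
            elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
                # One row per list item; column names get the path as prefix.
                block = pd.json_normalize(value, sep=sep).add_prefix(name + sep)
                blocks.append(block)
            else:
                scalars[name] = value

    walk(obj, "")
    # Concatenating on axis=1 aligns the blocks on the row index and
    # fills the shorter blocks with NaN.
    return pd.concat([pd.DataFrame([scalars])] + blocks, axis=1)


donut = {
    "id": "0001",
    "type": "donut",
    "name": "Cake",
    "ppu": 0.55,
    "batters": {
        "batter": [
            {"id": "1001", "type": "Regular"},
            {"id": "1002", "type": "Chocolate"},
            {"id": "1003", "type": "Blueberry"},
            {"id": "1004", "type": "Devil's Food"},
        ]
    },
    "topping": [
        {"id": "5001", "type": "None"},
        {"id": "5002", "type": "Glazed"},
        {"id": "5005", "type": "Sugar"},
        {"id": "5007", "type": "Powdered Sugar"},
        {"id": "5006", "type": "Chocolate with Sprinkles"},
        {"id": "5003", "type": "Chocolate"},
        {"id": "5004", "type": "Maple"},
    ],
}

df = records_to_columns(donut)
print(df)
```

Because the blocks are aligned purely by position, row 0 of `batters_batter_*` and row 0 of `topping_*` share a row without being logically related; whether that is acceptable depends on how the table is used afterwards.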

Source: https://stackoverflow.com/questions/57698215/how-to-parse-deeply-nested-json-to-pandas-dataframe
