Question
Below is the code that parses the following nested JSON objects into corresponding pandas DataFrames:
import pandas as pd

def flatten_json(nested_json):
    """
    Flatten json object with nested keys into a single level.
    Args:
        nested_json: A nested json object.
    Returns:
        The flattened json object if successful, None otherwise.
    """
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out

simplejson = False
if(isinstance(sample_object2, list)):
    dict_flattened = [flatten_json(d) for d in sample_object2]
elif isinstance(sample_object2, dict):
    while isinstance(sample_object2, dict) & simplejson == False:
        for key in sample_object2.keys():
            nodekey = key
            if isinstance(sample_object2[nodekey], dict) | isinstance(sample_object2[nodekey], list):
                dict_flattened = [flatten_json(d) for d in sample_object2[nodekey]]
                sample_object2 = sample_object2[nodekey]
                break
            else:
                dict_flattened = flatten_json(sample_object2)
                simplejson = True
                break
        break
else:
    print("Invalid json")

if simplejson:
    pdf = pd.DataFrame(dict_flattened, index=[0])
else:
    pdf = pd.DataFrame(dict_flattened)
Input 1:
sample_object2 = {
    "node":[
        {
            "item_1":"value_11",
            "item_2":"value_12",
            "item_3":"value_13",
            "item_4":["sub_value_14", "sub_value_15"],
            "item_5":{
                "sub_item_1":"sub_item_value_11",
                "sub_item_2":["sub_item_value_12", "sub_item_value_13"]
            }
        },
        {
            "item_1":"value_21",
            "item_2":"value_22",
            "item_4":["sub_value_24", "sub_value_25"],
            "item_5":{
                "sub_item_1":"sub_item_value_21",
                "sub_item_2":["sub_item_value_22", "sub_item_value_23"]
            }
        }
    ]
}
Output 1:
+--------+--------+--------+------------+------------+-----------------+-------------------+-------------------+
|item_1  |item_2  |item_3  |item_4_0    |item_4_1    |item_5_sub_item_1|item_5_sub_item_2_0|item_5_sub_item_2_1|
+--------+--------+--------+------------+------------+-----------------+-------------------+-------------------+
|value_11|value_12|value_13|sub_value_14|sub_value_15|sub_item_value_11|sub_item_value_12  |sub_item_value_13  |
|value_21|value_22|nan     |sub_value_24|sub_value_25|sub_item_value_21|sub_item_value_22  |sub_item_value_23  |
+--------+--------+--------+------------+------------+-----------------+-------------------+-------------------+
Input 2:
sample_object2 = {
    "item_1":"value_11",
    "item_2":"value_12",
    "item_5":{
        "sub_item_1":"sub_item_value_11",
        "sub_item_2":["sub_item_value_12", "sub_item_value_13"]
    }
}
Output 2:
+--------+--------+-----------------+-------------------+-------------------+
|item_1  |item_2  |item_5_sub_item_1|item_5_sub_item_2_0|item_5_sub_item_2_1|
+--------+--------+-----------------+-------------------+-------------------+
|value_11|value_12|sub_item_value_11|sub_item_value_12  |sub_item_value_13  |
+--------+--------+-----------------+-------------------+-------------------+
Input 3:
sample_object2 = {
    "id": "0001",
    "type": "donut",
    "name": "Cake",
    "image":
        {
            "url": "images/0001.jpg",
            "width": 200,
            "height": 200
        },
    "thumbnail":
        {
            "url": "images/thumbnails/0001.jpg",
            "width": 32,
            "height": 32
        }
}
Output 3:
+----+-----+----+---------------+-----------+------------+--------------------------+---------------+----------------+
|id  |type |name|image_url      |image_width|image_height|thumbnail_url             |thumbnail_width|thumbnail_height|
+----+-----+----+---------------+-----------+------------+--------------------------+---------------+----------------+
|0001|donut|Cake|images/0001.jpg|200        |200         |images/thumbnails/0001.jpg|32             |32              |
+----+-----+----+---------------+-----------+------------+--------------------------+---------------+----------------+
It works as expected for the nested JSON inputs above, but it doesn't work for deeply nested JSON.
Input:
sample_object2 = {
    "id": "0001",
    "type": "donut",
    "name": "Cake",
    "ppu": 0.55,
    "batters":
        {
            "batter":
                [
                    { "id": "1001", "type": "Regular" },
                    { "id": "1002", "type": "Chocolate" },
                    { "id": "1003", "type": "Blueberry" },
                    { "id": "1004", "type": "Devil's Food" }
                ]
        },
    "topping":
        [
            { "id": "5001", "type": "None" },
            { "id": "5002", "type": "Glazed" },
            { "id": "5005", "type": "Sugar" },
            { "id": "5007", "type": "Powdered Sugar" },
            { "id": "5006", "type": "Chocolate with Sprinkles" },
            { "id": "5003", "type": "Chocolate" },
            { "id": "5004", "type": "Maple" }
        ]
}
Expected Output:
+----+-----+----+----+-----------------+-------------------+----------+------------------------+
|id  |type |name|ppu |batters_batter_id|batters_batter_type|topping_id|topping_type            |
+----+-----+----+----+-----------------+-------------------+----------+------------------------+
|0001|donut|Cake|0.55|1001             |Regular            |5001      |None                    |
|nan |nan  |nan |nan |1002             |Chocolate          |5002      |Glazed                  |
|nan |nan  |nan |nan |1003             |Blueberry          |5005      |Sugar                   |
|nan |nan  |nan |nan |1004             |Devil's Food       |5007      |Powdered Sugar          |
|nan |nan  |nan |nan |nan              |nan                |5006      |Chocolate with Sprinkles|
|nan |nan  |nan |nan |nan              |nan                |5003      |Chocolate               |
|nan |nan  |nan |nan |nan              |nan                |5004      |Maple                   |
+----+-----+----+----+-----------------+-------------------+----------+------------------------+
But Output was:
+----+-----+----+----+-------------------+---------------------+-------------------+---------------------+-------------------+---------------------+-------------------+---------------------+------------+--------------+------------+--------------+------------+--------------+------------+--------------+------------+------------------------+------------+--------------+------------+--------------+
|id  |type |name|ppu |batters_batter_0_id|batters_batter_0_type|batters_batter_1_id|batters_batter_1_type|batters_batter_2_id|batters_batter_2_type|batters_batter_3_id|batters_batter_3_type|topping_0_id|topping_0_type|topping_1_id|topping_1_type|topping_2_id|topping_2_type|topping_3_id|topping_3_type|topping_4_id|topping_4_type          |topping_5_id|topping_5_type|topping_6_id|topping_6_type|
+----+-----+----+----+-------------------+---------------------+-------------------+---------------------+-------------------+---------------------+-------------------+---------------------+------------+--------------+------------+--------------+------------+--------------+------------+--------------+------------+------------------------+------------+--------------+------------+--------------+
|0001|donut|Cake|0.55|1001               |Regular              |1002               |Chocolate            |1003               |Blueberry            |1004               |Devil's Food         |5001        |None          |5002        |Glazed        |5005        |Sugar         |5007        |Powdered Sugar|5006        |Chocolate with Sprinkles|5003        |Chocolate     |5004        |Maple         |
+----+-----+----+----+-------------------+---------------------+-------------------+---------------------+-------------------+---------------------+-------------------+---------------------+------------+--------------+------------+--------------+------------+--------------+------------+--------------+------------+------------------------+------------+--------------+------------+--------------+
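The single wide row appears to come from the list branch of flatten(): the element index is appended to the key, so each list element produces new column names instead of a new row. A condensed restatement of the function above (using isinstance and enumerate, otherwise the same logic) run on a trimmed, made-up input shows the effect:

```python
def flatten_json(nested_json):
    """Condensed restatement of the flatten_json function above."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key, value in x.items():
                flatten(value, name + key + '_')
        elif isinstance(x, list):
            # The list index is folded into the key, so every element
            # becomes its own set of columns rather than its own row.
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out


# Trimmed input for illustration only (not one of the inputs above).
small = {"id": "0001", "topping": [{"id": "5001"}, {"id": "5002"}]}
flat = flatten_json(small)
print(flat)
# {'id': '0001', 'topping_0_id': '5001', 'topping_1_id': '5002'}
```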
How can I write generic code that works for all kinds and levels of nested JSON? I tried tweaking the above code but couldn't get it to work. Any solution would be highly appreciated.
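One possible direction, offered as a rough sketch rather than a definitive answer: treat every list of dicts as a block of rows, collect everything else as prefixed single-value columns, and place the blocks side by side with pd.concat(axis=1) so that shorter blocks are padded with NaN. The names json_to_frame, walk, scalars and blocks are hypothetical, and the sample is a trimmed version of the deeply nested input. It assumes that only lists whose elements are all dicts should become rows.

```python
import pandas as pd


def flatten_json(nested_json):
    """Same flattening logic as in the question (list index goes in the key)."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key, value in x.items():
                flatten(value, name + key + '_')
        elif isinstance(x, list):
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out


def json_to_frame(obj):
    """Hypothetical helper: lists of dicts become rows, the rest columns."""
    scalars = {}  # prefixed key -> single value, placed in the first row
    blocks = []   # one DataFrame per list of dicts that was found

    def walk(x, prefix=''):
        if isinstance(x, dict):
            for key, value in x.items():
                walk(value, prefix + key + '_')
        elif isinstance(x, list) and x and all(isinstance(i, dict) for i in x):
            # One row per element, columns prefixed with the path so far.
            rows = pd.DataFrame([flatten_json(d) for d in x])
            blocks.append(rows.add_prefix(prefix))
        elif isinstance(x, list):
            # Lists of plain values keep the indexed-column behaviour.
            for i, item in enumerate(x):
                walk(item, prefix + str(i) + '_')
        else:
            scalars[prefix[:-1]] = x

    walk(obj)
    frames = ([pd.DataFrame([scalars])] if scalars else []) + blocks
    if not frames:
        return pd.DataFrame()
    # Frames are aligned on the row index; shorter ones are padded with NaN.
    return pd.concat(frames, axis=1)


# Trimmed version of the deeply nested input, for illustration only.
sample = {
    "id": "0001",
    "batters": {"batter": [{"id": "1001", "type": "Regular"},
                           {"id": "1002", "type": "Chocolate"}]},
    "topping": [{"id": "5001", "type": "None"},
                {"id": "5002", "type": "Glazed"},
                {"id": "5005", "type": "Sugar"}],
}
pdf = json_to_frame(sample)
print(pdf)
```

On this sample the result has three rows and the columns id, batters_batter_id, batters_batter_type, topping_id and topping_type. Two limitations of the sketch: lists of dicts nested inside another list of dicts still end up as indexed columns, and for Input 1 the columns would carry a node_ prefix, unlike Output 1.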
Source: https://stackoverflow.com/questions/57698215/how-to-parse-deeply-nested-json-to-pandas-dataframe