Question
I have a JSON object like this:
{
    "id": 3590403096656,
    "title": "Romania Special Zip Hoodie Blue - Version 02 A5",
    "tags": [
        "1ST THE WORLD FOR YOU <3",
        "apparel"
    ],
    "props": [
        {
            "id": 28310659235920,
            "title": "S / romainia All Over Print Full Zip Hoodie for Men (Model H14)",
            "position": 1,
            "product_id": 3590403096656,
            "created_at": "2019-05-22T00:46:19+07:00",
            "updated_at": "2019-05-22T01:03:29+07:00"
        },
        {
            "id": 444444444444,
            "title": "number 2",
            "position": 1,
            "product_id": 3590403096656,
            "created_at": "2019-05-22T00:46:19+07:00",
            "updated_at": "2019-05-22T01:03:29+07:00"
        }
    ]
}
I want to flatten it so that the desired output looks like this:
{"id": 3590403096656,"title": "Romania Special Zip Hoodie Blue - Version 02 A5","tags": ["1ST THE WORLD FOR YOU <3","apparel"],"props.id": 28310659235920,"props.title": "S / romainia All Over Print Full Zip Hoodie for Men (Model H14)","props.position": 1,"props.product_id": 3590403096656,"props.created_at": "2019-05-22T00:46:19+07:00", "props.updated_at": "2019-05-22T01:03:29+07:00"}
{"id": 3590403096656,"title": "Romania Special Zip Hoodie Blue - Version 02 A5","tags": ["1ST THE WORLD FOR YOU <3","apparel"],"props.id": 444444444444,"props.title": "number 2","props.position": 1,"props.product_id": 3590403096656,"props.created_at": "2019-05-22T00:46:19+07:00","props.updated_at": "2019-05-22T01:03:29+07:00"}
So far I have tried:
from pandas.io.json import json_normalize
json_normalize(sample_object)
where sample_object contains the JSON object. I am looping through a large file of such objects, which I want to flatten into the desired format. json_normalize is not giving me the desired output: I want to keep tags as it is, but flatten props and repeat the parent object's info.
Answer 1:
Please try this:
import copy

obj = {
    "id": 3590403096656,
    "title": "Romania Special Zip Hoodie Blue - Version 02 A5",
    "tags": [
        "1ST THE WORLD FOR YOU <3",
        "apparel",
    ],
    "props": [
        {
            "id": 28310659235920,
            "title": "S / romainia All Over Print Full Zip Hoodie for Men (Model H14)",
            "position": 1,
            "product_id": 3590403096656,
            "created_at": "2019-05-22T00:46:19+07:00",
            "updated_at": "2019-05-22T01:03:29+07:00"
        },
        {
            "id": 444444444444,
            "title": "number 2",
            "position": 1,
            "product_id": 3590403096656,
            "created_at": "2019-05-22T00:46:19+07:00",
            "updated_at": "2019-05-22T01:03:29+07:00"
        }
    ]
}

# Remove "props" from obj; what is left is the part shared by every output row.
props = obj.pop("props")
for p in props:
    # Copy the shared part so each row gets its own "tags" list.
    res = copy.deepcopy(obj)
    for k in p:
        res["props." + k] = p[k]
    print(res)
Basically, it uses pop("props") to get obj without "props" (the common part shared by all result objects). Then we iterate through props and, for each entry, create a new object that is a copy of the base object and fill in a "props.key" entry for every key of that entry.
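Since the question mentions looping over many such objects, the same idea can be wrapped in a function that leaves its input untouched. This is only a minimal sketch: the name denormalize, its key parameter, and the small sample dict are made up for illustration and are not part of the answer above.

```python
import copy
import json


def denormalize(obj, key="props"):
    """Yield one flat dict per entry of obj[key], repeating the parent fields."""
    # Build the shared part without mutating the caller's object.
    base = {k: v for k, v in obj.items() if k != key}
    for entry in obj.get(key, []):
        row = copy.deepcopy(base)
        for k, v in entry.items():
            row[f"{key}.{k}"] = v
        yield row


# Hypothetical sample, shaped like the object in the question.
sample = {
    "id": 1,
    "title": "parent",
    "tags": ["a", "b"],
    "props": [{"id": 10, "position": 1}, {"id": 20, "position": 2}],
}

rows = list(denormalize(sample))
for row in rows:
    print(json.dumps(row))
```

Because the function is a generator, it can be called once per object while reading a large file, without holding all rows in memory at once.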
Answer 2:
You want some json_normalize behavior, but with a custom twist. So use json_normalize or similar on a portion of the data, then combine it with the remainder of data.
The code below prefers the "or similar" route, reaching deep into the pandas codebase to get the nested_to_record helper function, which flattens dictionaries. It's used to create individual rows that combine the base data (keys/values common across all properties) with the flattened data specific to each props entry. There is a commented-out line that does the equivalent thing without nested_to_record, but it somewhat inelegantly flattens into a DataFrame, then exports out to a dict.
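from collections import OrderedDict
import json
import pandas as pd
# nested_to_record is a private pandas helper; in newer pandas releases the
# module may be pandas.io.json._normalize rather than pandas.io.json.normalize.
from pandas.io.json.normalize import nested_to_record

# rawjson is assumed to hold the JSON text of one object like the one in the question.
data = json.loads(rawjson)
props = data.pop('props')  # what remains in data is common to every row
rows = []
for prop in props:
    rowdict = OrderedDict(data)
    flattened_prop = nested_to_record({'props': prop})
    # Alternative without nested_to_record (requires json_normalize to be imported):
    # flattened_prop = json_normalize({'props': prop}).to_dict(orient='records')[0]
    rowdict.update(flattened_prop)
    rows.append(rowdict)
df = pd.DataFrame(rows)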
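This results in a DataFrame with one row per props entry: the id, title and tags columns repeat the parent values, and the props.* columns hold the flattened entry.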
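If depending on a private pandas helper is a concern, the flattening step can also be sketched in plain Python. The flatten function below is a hypothetical stand-in written for illustration, not the pandas implementation; it only handles nested dicts and leaves lists as they are. The sample data is likewise invented.

```python
def flatten(d, prefix="", sep="."):
    """Flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for k, v in d.items():
        name = f"{prefix}{sep}{k}" if prefix else str(k)
        if isinstance(v, dict):
            # Recurse into nested dicts, extending the dotted prefix.
            flat.update(flatten(v, name, sep))
        else:
            # Scalars and lists are kept unchanged.
            flat[name] = v
    return flat


# Hypothetical sample with one level of extra nesting inside a props entry.
data = {
    "id": 1,
    "tags": ["a"],
    "props": [{"id": 10, "size": {"w": 1, "h": 2}}, {"id": 20}],
}

# Everything except "props" is common to all rows.
base = {k: v for k, v in data.items() if k != "props"}
rows = [{**base, **flatten({"props": prop})} for prop in data["props"]]
print(rows)
```

The resulting list of dicts can be passed to pd.DataFrame(rows) in the same way as in the code above. Note that {**base, ...} makes a shallow copy, so every row shares the same tags list object.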
Source: https://stackoverflow.com/questions/56716636/de-normalize-json-object-into-flat-objects