Question
I have a DataFrame with the following columns and no duplicates:
['region', 'type', 'name', 'value']
that can be seen as a hierarchy as follows
grouped = df.groupby(['region','type', 'name'])
I would like to serialize this hierarchy as a JSON object.
If anyone is interested, the motivation behind this is to eventually put together a visualization like this one, which requires a JSON file.
To do so, I need to convert grouped into the following:
new_data['children'][i]['name'] = region
new_data['children'][i]['children'][j]['name'] = type
new_data['children'][i]['children'][j]['children'][k]['name'] = name
new_data['children'][i]['children'][j]['children'][k]['size'] = value
...
where region, type, and name correspond to different levels of the hierarchy (indexed by i, j, and k).
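For concreteness, the target structure could look like the sketch below. The region, type, and name values are made up for illustration; only the keys name, children, and size come from the description above.

```python
import json

# Hypothetical sample values; only the keys 'name', 'children' and 'size'
# are taken from the question.
new_data = {
    "children": [
        {
            "name": "north",  # region (i = 0)
            "children": [
                {
                    "name": "fruit",  # type (j = 0)
                    "children": [
                        {"name": "apple", "size": 3},  # name / value (k = 0)
                        {"name": "pear", "size": 5},   # name / value (k = 1)
                    ],
                },
            ],
        },
    ],
}

# Serialize the nested dict to a JSON string.
serialized = json.dumps(new_data, indent=2)
print(serialized)
```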
Is there an easy way in Pandas/Python to do this?
Answer 1:
Something along these lines might get you there.
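import json
from collections import defaultdict

tree = lambda: defaultdict(tree)  # a recursive defaultdict
d = tree()
# assumes the columns are ordered region, type, name, value;
# type_ avoids shadowing the built-in type
for _, (region, type_, name, value) in df.iterrows():
    d['children'][region]['name'] = region
    ...
json.dumps(d)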
A vectorized solution would be better, and maybe something that takes advantage of the speed of groupby, but I can't think of such a solution.
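A groupby-based variant might look like the sketch below. It is not vectorized: it recurses through one groupby per hierarchy level and builds the children lists described in the question. The sample rows and the helper name build_tree are assumptions made for illustration, and the group keys are assumed to be plain strings so that json.dumps can serialize them.

```python
import json

import pandas as pd

# Hypothetical sample data with the four columns from the question.
df = pd.DataFrame(
    [
        ("north", "fruit", "apple", 3),
        ("north", "fruit", "pear", 5),
        ("north", "veg", "kale", 2),
        ("south", "fruit", "fig", 7),
    ],
    columns=["region", "type", "name", "value"],
)


def build_tree(frame, levels, leaf_name="name", leaf_value="value"):
    """Return a list of child nodes, grouping by each column in `levels` in turn."""
    if not levels:
        # Leaf level: one node per row; tolist() yields plain Python scalars,
        # which json.dumps can serialize.
        return [
            {"name": n, "size": v}
            for n, v in zip(frame[leaf_name].tolist(), frame[leaf_value].tolist())
        ]
    head, rest = levels[0], levels[1:]
    # sort=False keeps groups in order of first appearance.
    return [
        {"name": key, "children": build_tree(sub, rest, leaf_name, leaf_value)}
        for key, sub in frame.groupby(head, sort=False)
    ]


new_data = {"children": build_tree(df, ["region", "type"])}
print(json.dumps(new_data, indent=2))
```

Adding or removing a level only changes the list passed as levels; the leaf columns are picked with leaf_name and leaf_value.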
Also take a look at df.groupby(...).groups, which returns a dict.
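As a small illustration of what groups contains, the sketch below prints each group key together with the row labels that belong to it. The sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical sample data with the four columns from the question.
df = pd.DataFrame(
    [
        ("north", "fruit", "apple", 3),
        ("north", "fruit", "pear", 5),
        ("south", "fruit", "fig", 7),
    ],
    columns=["region", "type", "name", "value"],
)

# Maps each (region, type, name) tuple to the index labels of its rows.
groups = df.groupby(["region", "type", "name"]).groups

for key, row_labels in groups.items():
    print(key, list(row_labels))
```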
See also this answer.
Answer 2:
Here's another script to take a pandas df and output a flare.json file: https://github.com/andrewheekin/csv2flare.json
Source: https://stackoverflow.com/questions/23531145/pandas-to-d3-serializing-dataframes-to-json