How to generate n-level hierarchical JSON from pandas DataFrame?

小鲜肉 2020-12-05 08:23

Is there an efficient way to create hierarchical JSON (n levels deep) from a pandas DataFrame, where the parent values are the keys rather than the variable labels? I.e.:

{"2017-12-31":

2 Answers
  • 2020-12-05 08:56

    You can use itertuples to build a nested dict and then dump it to JSON. For this to work, you first need to convert the date Timestamps to strings, because json.dumps only accepts str, int, float, bool or None as object keys.

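    # df3 is the question's DataFrame (not shown in full); assumed to have a Date index
    # and three column levels (Job Role, Department, Team)
    df4 = df3.stack(level=[0, 1, 2]).reset_index()
    # JSON object keys must be strings, so format the Timestamps
    df4['Date'] = df4['Date'].dt.strftime('%Y-%m-%d')
    df4 = df4.set_index(['Date', 'Job Role', 'Department', 'Team']) \
        .sort_index()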
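    The answer does not show df3. The sketch below builds a stand-in under an assumption inferred from the printed output: a DatetimeIndex named Date and a three-level column MultiIndex named Job Role, Department and Team. The random numbers are placeholder data, not the OP's. It also shows why the strftime step is needed: json.dumps raises TypeError on Timestamp keys.

```python
import json

import numpy as np
import pandas as pd

# Hypothetical stand-in for df3: structure inferred from the answer's output,
# values are random placeholders
dates = pd.DatetimeIndex(["2017-12-31", "2018-01-31"], name="Date")
columns = pd.MultiIndex.from_product(
    [["Junior", "Senior"], ["Electronics", "Household"], ["A", "B", "C"]],
    names=["Job Role", "Department", "Team"],
)
rng = np.random.default_rng(0)
df3 = pd.DataFrame(rng.normal(size=(len(dates), len(columns))), index=dates, columns=columns)

# Timestamps are not valid JSON object keys, hence the strftime step below
try:
    json.dumps({df3.index[0]: 0.0})
except TypeError as exc:
    print("json.dumps rejected the Timestamp key:", exc)

# Same reshaping as the answer: one row per (Date, Job Role, Department, Team)
df4 = df3.stack(level=[0, 1, 2]).reset_index()
df4["Date"] = df4["Date"].dt.strftime("%Y-%m-%d")
df4 = df4.set_index(["Date", "Job Role", "Department", "Team"]).sort_index()
print(df4.head())
```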
    

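    Create the nested dict. Because nested_dict is defined recursively, every missing key creates another defaultdict, so any depth works: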

    import collections

    def nested_dict():
        # every missing key creates another nested defaultdict
        return collections.defaultdict(nested_dict)
    result = nested_dict()
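    A short sketch of how the recursive factory behaves: reading a missing key creates another nested_dict, so a chain of lookups several levels deep can be assigned in one statement. The keys are borrowed from the answer's output; the values are placeholders.

```python
import collections
import json


def nested_dict():
    # Each missing key produces another defaultdict built by this same factory
    return collections.defaultdict(nested_dict)


result = nested_dict()
# Intermediate levels are created on the fly by the lookups
result["2017-12-31"]["Junior"]["Electronics"]["A"]["sales"] = 1.5
result["2017-12-31"]["Senior"]["Household"]["B"]["sales"] = -0.25

# defaultdict is a dict subclass, so json.dumps serializes it like a plain dict
print(json.dumps(result))
```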
    

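    Use itertuples to populate it. After reset_index the value column is named 0, which is not a valid identifier, so itertuples exposes it as the positional field _1: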

    for row in df4.itertuples():
        # row.Index is the (Date, Job Role, Department, Team) tuple
        result[row.Index[0]][row.Index[1]][row.Index[2]][row.Index[3]]['sales'] = row._1
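    The loop above hard-codes four levels through row.Index[0] to row.Index[3]. Since the question asks for n levels, here is a sketch that walks however many levels the index has, reusing the same recursive defaultdict. series_to_nested and its leaf_key parameter are hypothetical names, and the input is assumed to be a Series with the hierarchy in its index (for the answer's data, the value column df4[0]).

```python
import collections
import json

import pandas as pd


def nested_dict():
    # Same recursive defaultdict as the answer
    return collections.defaultdict(nested_dict)


def series_to_nested(series, leaf_key="sales"):
    """Hypothetical helper: nest a Series by every level of its index."""
    result = nested_dict()
    for keys, value in series.items():
        # A single-level index yields scalar keys; normalize to a tuple
        if not isinstance(keys, tuple):
            keys = (keys,)
        node = result
        for key in keys:
            node = node[key]  # missing keys create the next level
        node[leaf_key] = value
    return result


# Made-up sample with four index levels, like the answer's df4
index = pd.MultiIndex.from_tuples(
    [
        ("2017-12-31", "Junior", "Electronics", "A"),
        ("2017-12-31", "Junior", "Electronics", "B"),
    ],
    names=["Date", "Job Role", "Department", "Team"],
)
sales = pd.Series([0.5, -1.0], index=index)
nested = series_to_nested(sales)
print(json.dumps(nested))
```

    Applied to the answer's df4, series_to_nested(df4[0]) would replace the itertuples loop without any per-level indexing.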
    

    Then use the json module to dump it:

    import json
    json.dumps(result)
    

    '{"2017-12-31": {"Junior": {"Electronics": {"A": {"sales": -0.3947134370101142}, "B": {"sales": -0.9873530754403204}, "C": {"sales": -1.1182598058984508}}, "Household": {"A": {"sales": -1.1211850078098677}, "B": {"sales": 2.0330914483907847}, "C": {"sales": 3.94762379718749}}}, "Senior": {"Electronics": {"A": {"sales": 1.4528493451404196}, "B": {"sales": -2.3277322345261005}, "C": {"sales": -2.8040263791743922}}, "Household": {"A": {"sales": 3.0972591929279663}, "B": {"sales": 9.884565742502392}, "C": {"sales": 2.9359830722457576}}}}, "2018-01-31": {"Junior": {"Electronics": {"A": {"sales": -1.3580300149125217}, "B": {"sales": 1.414665000013205}, "C": {"sales": -1.432795129108244}}, "Household": {"A": {"sales": 2.7783259569115346}, "B": {"sales": 2.717700275321333}, "C": {"sales": 1.4358377416259644}}}, "Senior": {"Electronics": {"A": {"sales": 2.8981726774941485}, "B": {"sales": 12.022897003654117}, "C": {"sales": 0.01776855733076088}}, "Household": {"A": {"sales": -3.342163776613092}, "B": {"sales": -5.283208386572307}, "C": {"sales": 2.942580121975619}}}}}'
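    A different technique, not the one either answer uses, is to recurse with groupby on the outermost index level and drop that level before descending. It returns plain dicts rather than defaultdicts. This is a sketch under the same assumption of a Series with a hierarchical index; to_nested is a hypothetical helper name and the sample data is made up.

```python
import json

import pandas as pd


def to_nested(series, leaf_key="sales"):
    """Hypothetical helper: recursive groupby on the outermost index level."""
    if series.index.nlevels == 1:
        # Innermost level: each label maps to {leaf_key: value}
        return {key: {leaf_key: value} for key, value in series.items()}
    return {
        # Drop the level just grouped on, then recurse into the remaining levels
        key: to_nested(group.droplevel(0), leaf_key)
        for key, group in series.groupby(level=0)
    }


# Made-up sample with three index levels
index = pd.MultiIndex.from_tuples(
    [
        ("2017-12-31", "Junior", "A"),
        ("2017-12-31", "Senior", "B"),
        ("2018-01-31", "Junior", "A"),
    ],
    names=["Date", "Job Role", "Team"],
)
sales = pd.Series([1.0, 2.0, 3.0], index=index)
print(json.dumps(to_nested(sales)))
```

    For large frames the recursion builds many small intermediate Series, so it is likely slower than the single-pass defaultdict loop; the trade-off is that no mutable accumulator is needed.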

  • 2020-12-05 08:56

    I ran into this and was confused by the complexity of the OP's setup. Here is a minimal example and solution (based on the answer provided by @Maarten Fabré).

    import collections
    import pandas as pd
    
    # build init DF
    x = ['a', 'a']
    y = ['b', 'c']
    z = [['d'], ['e', 'f']]
    df = pd.DataFrame(list(zip(x, y, z)), columns=['x', 'y', 'z'])
    
    #    x  y       z
    # 0  a  b     [d]
    # 1  a  c  [e, f]
    

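    Set the x and y columns as the index. set_index with a list of columns already produces a MultiIndex, so the reindex step only rebuilds the same index explicitly from the tuples:

    # set the index from the x and y columns (this is already a MultiIndex)
    df = df.set_index(['x', 'y'])
    
    # optional: rebuild the same MultiIndex explicitly from the (x, y) tuples
    df = df.reindex(pd.MultiIndex.from_tuples(zip(x, y)))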
    
    #           z
    # a b     [d]
    #   c  [e, f]
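    A quick check on the same toy data: set_index(['x', 'y']) on its own already yields a MultiIndex, and rebuilding it from the (x, y) tuples selects the same labels and values.

```python
import pandas as pd

x = ["a", "a"]
y = ["b", "c"]
z = [["d"], ["e", "f"]]
df = pd.DataFrame(list(zip(x, y, z)), columns=["x", "y", "z"])

# set_index with a list of columns already returns a MultiIndex
flat = df.set_index(["x", "y"])
print(type(flat.index).__name__)  # MultiIndex
print(flat.index.tolist())        # [('a', 'b'), ('a', 'c')]

# Rebuilding the index explicitly from the tuples selects the same rows
rebuilt = flat.reindex(pd.MultiIndex.from_tuples(zip(x, y)))
print(rebuilt.index.tolist() == flat.index.tolist())  # True
print(rebuilt["z"].tolist())                          # [['d'], ['e', 'f']]
```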
    

    Then initialize a nested dictionary and fill it in item by item:

    nested_dict = collections.defaultdict(dict)
    
    # Series.iteritems() was removed in pandas 2.0; items() yields the same (key, value) pairs
    for keys, value in df['z'].items():
        nested_dict[keys[0]][keys[1]] = value
    
    # defaultdict(dict, {'a': {'b': ['d'], 'c': ['e', 'f']}})
    

    At this point you can dump it to JSON with json.dumps(nested_dict), since defaultdict is a subclass of dict.
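    collections.defaultdict(dict) only auto-creates the first level, so the loop has to spell out keys[0] and keys[1]; a third level would raise KeyError. A sketch that handles any depth with plain dicts and dict.setdefault (a swapped-in technique, not the answer's), ending with the JSON dump:

```python
import json

import pandas as pd

# Same toy data as the answer, indexed on (x, y)
x = ["a", "a"]
y = ["b", "c"]
z = [["d"], ["e", "f"]]
df = pd.DataFrame(list(zip(x, y, z)), columns=["x", "y", "z"]).set_index(["x", "y"])

nested = {}
for keys, value in df["z"].items():
    # A single-level index yields scalar keys; normalize to a tuple
    if not isinstance(keys, tuple):
        keys = (keys,)
    # Walk (creating as needed) one plain dict per level except the last
    node = nested
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value

print(json.dumps(nested))  # {"a": {"b": ["d"], "c": ["e", "f"]}}
```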
