Create a Pandas DataFrame from deeply nested JSON

一向 2020-12-03 08:26

I'm trying to create a single Pandas DataFrame object from a deeply nested JSON string.

The JSON schema is:

    {"intervals": [
    {
    pivots: "Jane Smit

1 answer, posted 2020-12-03 08:46

    I think organizing your data in a way that yields repeating column names is only going to create headaches for you later on down the road. A better approach IMHO is to create a column for each of pivots, interval_id, and p_value. This will make it extremely easy to query your data after loading it into pandas.
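As a sketch of that column-per-field layout without shelling out to jq, pandas' json_normalize can flatten the nested records directly. This assumes, as the jq rule in this answer implies, that each element of intervals has a pivots string and a series list whose entries carry interval_id and p_value. The sample data is hypothetical, since the question's JSON is truncated.

```python
import json

import pandas as pd

# Hypothetical, well-formed sample: the question's real JSON is truncated, so this
# assumes each interval has a "pivots" label and a "series" list of records.
json_data = """
{"intervals": [
  {"pivots": "Jane Smith",
   "series": [{"interval_id": 0, "p_value": 1.0},
              {"interval_id": 1, "p_value": 1.116279e-08}]},
  {"pivots": "Bob Smith",
   "series": [{"interval_id": 0, "p_value": 2.867501e-06}]}
]}
"""

data = json.loads(json_data)

# record_path: emit one row per entry in each interval's "series" list.
# meta: copy the parent interval's "pivots" value onto each of those rows.
res = pd.json_normalize(data["intervals"], record_path="series", meta=["pivots"])
print(res)
```

Each row keeps the pivots value of the interval its series entry came from, giving the interval_id, p_value, and pivots columns described above (pd.json_normalize is available as a top-level function from pandas 1.0).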

    Also, your JSON has some errors in it. I ran it through this to find the errors.
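The link to the tool used here was not preserved. As one stand-in (not necessarily what the answerer used), Python's standard json module reports where parsing first fails. For example, the schema in the question uses an unquoted key (pivots:), which strict JSON rejects. find_json_error is a hypothetical helper name used only for this sketch.

```python
import json


def find_json_error(text):
    """Return (line, column, message) for the first JSON syntax error, or None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # lineno/colno are 1-based and point at the first character the parser rejected
        return err.lineno, err.colno, err.msg
    return None


# Hypothetical fragment written in the question's style, with an unquoted key
print(find_json_error('{"intervals": [{pivots: "Jane Smith"}]}'))
```

Fixing one error and re-running repeats the process until the function returns None.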

    jq, a command-line JSON processor, helps here. This snippet calls it through the sh package:

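    import json

    import pandas as pd
    import sh  # third-party package for calling shell commands

    jq = sh.jq.bake('-M')  # disable colorizing
    json_data = "from above"  # placeholder: the (corrected) JSON string from the question
    # jq emits one object per combination of the values each [] path yields,
    # so this rule can produce repeated rows (see drop_duplicates further down).
    rule = """[{pivots: .intervals[].pivots,
                interval_id: .intervals[].series[].interval_id,
                p_value: .intervals[].series[].p_value}]"""
    out = jq(rule, _in=json_data).stdout  # jq's JSON output, as bytes
    res = pd.DataFrame(json.loads(out))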
    

    This will yield output similar to

        interval_id       p_value      pivots
    32            2  2.867501e-06  Jane Smith
    33            2  1.000000e+00  Jane Smith
    34            2  1.116279e-08  Jane Smith
    35            2  2.867501e-06  Jane Smith
    36            0  1.000000e+00   Bob Smith
    37            0  1.116279e-08   Bob Smith
    38            0  2.867501e-06   Bob Smith
    39            0  1.000000e+00   Bob Smith
    40            0  1.116279e-08   Bob Smith
    41            0  2.867501e-06   Bob Smith
    42            1  1.000000e+00   Bob Smith
    43            1  1.116279e-08   Bob Smith
    

    Adapted from this comment
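A likely source of the repeated rows is how jq builds objects: when several values in one object constructor are generators such as .intervals[].series[].p_value, jq emits one object for every combination of their outputs, so pivots, interval_id, and p_value are no longer tied to the interval they came from. This plain-Python sketch on hypothetical data contrasts that combination behaviour with keeping each series entry attached to its own interval.

```python
from itertools import product

# Hypothetical two-interval input in the shape the jq rule expects.
intervals = [
    {"pivots": "Jane Smith", "series": [{"interval_id": 0, "p_value": 1.0},
                                        {"interval_id": 1, "p_value": 0.5}]},
    {"pivots": "Bob Smith", "series": [{"interval_id": 0, "p_value": 0.25}]},
]

all_pivots = [iv["pivots"] for iv in intervals]
all_series = [s for iv in intervals for s in iv["series"]]

# Every pivots value combined with every interval_id and every p_value,
# regardless of which interval each one came from.
combined = [
    {"pivots": p, "interval_id": i, "p_value": v}
    for p, i, v in product(all_pivots,
                           [s["interval_id"] for s in all_series],
                           [s["p_value"] for s in all_series])
]

# Each series entry stays attached to the interval it belongs to.
paired = [
    {"pivots": iv["pivots"], **s}
    for iv in intervals
    for s in iv["series"]
]

print(len(combined), len(paired))  # 18 3
```

In jq terms, binding each interval before descending into its series, e.g. [.intervals[] | .pivots as $p | .series[] | {pivots: $p, interval_id, p_value}], should give the paired shape; this rule is untested against the question's data.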

    Of course, you can always call res.drop_duplicates() to remove the duplicate rows. This gives

    In [175]: res.drop_duplicates()
    Out[175]:
        interval_id       p_value      pivots
    0             0  1.000000e+00  Jane Smith
    1             0  1.116279e-08  Jane Smith
    2             0  2.867501e-06  Jane Smith
    6             1  1.000000e+00  Jane Smith
    7             1  1.116279e-08  Jane Smith
    8             1  2.867501e-06  Jane Smith
    12            2  1.000000e+00  Jane Smith
    13            2  1.116279e-08  Jane Smith
    14            2  2.867501e-06  Jane Smith
    36            0  1.000000e+00   Bob Smith
    37            0  1.116279e-08   Bob Smith
    38            0  2.867501e-06   Bob Smith
    42            1  1.000000e+00   Bob Smith
    43            1  1.116279e-08   Bob Smith
    44            1  2.867501e-06   Bob Smith
    48            2  1.000000e+00   Bob Smith
    49            2  1.116279e-08   Bob Smith
    50            2  2.867501e-06   Bob Smith
    
    [18 rows x 3 columns]
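The gaps in that index (0, 1, 2, 6, ...) are the original row labels surviving deduplication. A small sketch on hypothetical data: passing ignore_index=True to drop_duplicates (available since pandas 1.0) renumbers the result, and the one-column-per-field layout keeps filtering to a single query expression.

```python
import pandas as pd

# Hypothetical long-format frame with one repeated row, in the answer's column layout.
res = pd.DataFrame({
    "interval_id": [0, 0, 1, 0],
    "p_value": [1.0, 1.0, 1.116279e-08, 2.867501e-06],
    "pivots": ["Jane Smith", "Jane Smith", "Jane Smith", "Bob Smith"],
})

# Drop exact duplicate rows and renumber the survivors 0..n-1.
deduped = res.drop_duplicates(ignore_index=True)

# With one column per field, filtering is a single boolean expression.
jane_first = deduped.query("pivots == 'Jane Smith' and interval_id == 0")
print(deduped)
print(jane_first)
```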
    