Multi-dimensional/Nested DataFrame/Dataset/Panel in Pandas

星月不相逢 2021-01-14 11:42

I would like to store some multidimensional data in a pandas DataFrame or Panel so that I can retrieve, for example:

  1. All the times for Run
2 Answers
  • 2021-01-14 12:19

    I think you can use a MultiIndex and then select data with slicers:

    import pandas as pd
    
    df = pd.DataFrame({'Time': {
        ('Runner A', 'Male', 35, 'Race A', 2014): '2:47:34',
        ('Runner C', 'Female', 32, 'Race B', 1998): '1:29:43',
        ('Runner B', 'Male', 29, 'Race A', 2015): '3:05:56',
        ('Runner A', 'Male', 35, 'Race A', 2013): '2:50:12',
        ('Runner A', 'Male', 35, 'Race B', 2013): '1:32:07',
        ('Runner A', 'Male', 35, 'Race A', 2015): '2:35:09',
    }})
    print (df)
                                       Time
    Runner A Male   35 Race A 2013  2:50:12
                              2014  2:47:34
                              2015  2:35:09
                       Race B 2013  1:32:07
    Runner B Male   29 Race A 2015  3:05:56
    Runner C Female 32 Race B 1998  1:29:43
    
    # the index has to be fully lexsorted before slicing with IndexSlice
    df.sort_index(inplace=True)
    print (df)
                                       Time
    Runner A Male   35 Race A 2013  2:50:12
                              2014  2:47:34
                              2015  2:35:09
                       Race B 2013  1:32:07
    Runner B Male   29 Race A 2015  3:05:56
    Runner C Female 32 Race B 1998  1:29:43
    
    idx = pd.IndexSlice
    print (df.loc[idx['Runner A',:,:,'Race A',:],:])
                                     Time
    Runner A Male 35 Race A 2013  2:50:12
                            2014  2:47:34
                            2015  2:35:09
    
    print (df.loc[idx[:,:,:,'Race A',2015],:])
                                     Time
    Runner A Male 35 Race A 2015  2:35:09
    Runner B Male 29 Race A 2015  3:05:56
    
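As an alternative to positional slicers, the same two selections can be sketched with `DataFrame.xs` on named levels. The level names (`Runner`, `Gender`, `Age`, `Race`, `Year`) are assumptions added for illustration; the index in the answer above is unnamed.

```python
import pandas as pd

# Same data as in the answer above.
times = {
    ('Runner A', 'Male', 35, 'Race A', 2014): '2:47:34',
    ('Runner C', 'Female', 32, 'Race B', 1998): '1:29:43',
    ('Runner B', 'Male', 29, 'Race A', 2015): '3:05:56',
    ('Runner A', 'Male', 35, 'Race A', 2013): '2:50:12',
    ('Runner A', 'Male', 35, 'Race B', 2013): '1:32:07',
    ('Runner A', 'Male', 35, 'Race A', 2015): '2:35:09',
}
df = pd.DataFrame({'Time': times}).sort_index()

# Assumed level names; they are not part of the original answer.
df.index.names = ['Runner', 'Gender', 'Age', 'Race', 'Year']

# All of Runner A's times in Race A.
runner_a_race_a = df.xs(('Runner A', 'Race A'), level=['Runner', 'Race'])
print(runner_a_race_a)

# Everyone who ran Race A in 2015.
race_a_2015 = df.xs(('Race A', 2015), level=['Race', 'Year'])
print(race_a_2015)
```

By default `xs` drops the levels it selects on, so the results are indexed only by the remaining levels; pass `drop_level=False` to keep the full index.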
  • 2021-01-14 12:19

    Simple approach:

    import pandas as pd
    
    runners = pd.DataFrame(
        [
            ['Bob',   'Male', 1980],
            ['Tom',   'Male', 1986],
            ['Amy', 'Female', 1966],
        ],
        columns=['Name', 'Gender', 'BirthYear']
    )
    
    races = pd.DataFrame(
        [
            ['A', 2015, 'Bob', '2:35:09'],
            ['A', 2014, 'Bob', '2:47:34'],
            ['A', 2013, 'Bob', '2:50:12'],
            ['B', 2013, 'Bob', '1:32:07'],
            ['A', 2015, 'Tom', '3:05:56'],
            ['B', 1998, 'Amy', '1:29:43'],
        ],
        columns=['Race', 'Year', 'Name', 'Time']
    )
    
    
    print(races.loc[(races.Name == 'Bob') & (races.Race == 'A')][['Time']])
    print()
    print(races.loc[(races.Year == 2015) & (races.Race == 'A')][['Name', 'Time']])
    
          Time
    0  2:35:09
    1  2:47:34
    2  2:50:12
    
      Name     Time
    0  Bob  2:35:09
    4  Tom  3:05:56
    
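The two lookups above can also be written with a single `.loc` call (row mask and column list together) or with `DataFrame.query`. This is a minimal sketch of those equivalent forms; the variable names `bob_in_a` and `race_a_2015` are made up for illustration.

```python
import pandas as pd

races = pd.DataFrame(
    [
        ['A', 2015, 'Bob', '2:35:09'],
        ['A', 2014, 'Bob', '2:47:34'],
        ['A', 2013, 'Bob', '2:50:12'],
        ['B', 2013, 'Bob', '1:32:07'],
        ['A', 2015, 'Tom', '3:05:56'],
        ['B', 1998, 'Amy', '1:29:43'],
    ],
    columns=['Race', 'Year', 'Name', 'Time']
)

# Row mask and column selection in one .loc call.
bob_in_a = races.loc[(races['Name'] == 'Bob') & (races['Race'] == 'A'), ['Time']]
print(bob_in_a)

# The same kind of filter expressed as a query string.
race_a_2015 = races.query("Race == 'A' and Year == 2015")[['Name', 'Time']]
print(race_a_2015)
```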

    To get all the data back in one frame, merge the two tables:

    df = races.merge(runners)
    

    To get each runner's age at race time:

    df['Age'] = df.Year - df.BirthYear
    print(df)
    
      Race  Year Name     Time  Gender  BirthYear  Age
    0    A  2015  Bob  2:35:09    Male       1980   35
    1    A  2014  Bob  2:47:34    Male       1980   34
    2    A  2013  Bob  2:50:12    Male       1980   33
    3    B  2013  Bob  1:32:07    Male       1980   33
    4    A  2015  Tom  3:05:56    Male       1986   29
    5    B  1998  Amy  1:29:43  Female       1966   32
    
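The merge above relies on pandas picking the shared `Name` column as the join key. A slightly more defensive sketch, assuming each runner name is meant to be unique in `runners`, names the key explicitly and asks pandas to check that assumption via `validate`:

```python
import pandas as pd

runners = pd.DataFrame(
    [
        ['Bob', 'Male', 1980],
        ['Tom', 'Male', 1986],
        ['Amy', 'Female', 1966],
    ],
    columns=['Name', 'Gender', 'BirthYear']
)

races = pd.DataFrame(
    [
        ['A', 2015, 'Bob', '2:35:09'],
        ['A', 2014, 'Bob', '2:47:34'],
        ['A', 2013, 'Bob', '2:50:12'],
        ['B', 2013, 'Bob', '1:32:07'],
        ['A', 2015, 'Tom', '3:05:56'],
        ['B', 1998, 'Amy', '1:29:43'],
    ],
    columns=['Race', 'Year', 'Name', 'Time']
)

# Join on 'Name' explicitly; validate='many_to_one' raises a MergeError
# if any name appears more than once in `runners`.
df = races.merge(runners, on='Name', how='left', validate='many_to_one')

# Approximate age at race time (calendar-year difference only).
df['Age'] = df['Year'] - df['BirthYear']
print(df)
```

Using `how='left'` keeps every race row in its original order even if a runner were missing from `runners`; in that case the runner columns would be filled with NaN rather than the row being dropped.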