MultiIndex Group By in Pandas Data Frame

野的像风 2021-02-03 13:56

I have a data set that contains countries and statistics on economic indicators by year, organized like so:

Country  Metric           2011   2012   2013  2014
USA     GDP               7      4     0      2
USA     Pop.              2      3     0      3
GB      GDP               8      7     0      7
GB      Pop.              2      6     0      0
FR      GDP               5      0     0      1
FR      Pop.              1      1     0      5

2 Answers
  • 2021-02-03 14:22

    Is this what you are looking for?

    # Keep the original frame intact; bind the GroupBy object to its own name.
    grouped = df.groupby('Metric')
    grouped.get_group('GDP')
    
       Country Metric  2011    2012    2013    2014
    0    USA     GDP     7      4       0       2
    2    GB      GDP     8      7       0       7
    4    FR      GDP     5      0       0       1
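
    For a single metric, the same rows can also be picked with a plain boolean filter, without building a GroupBy object first. The sketch below is self-contained; the frame is rebuilt from the example values in the question, and the names `via_group` and `via_mask` are only illustrative.

```python
import pandas as pd

# Toy frame mirroring the question's layout (values copied from the example data).
df = pd.DataFrame(
    {
        "Country": ["USA", "USA", "GB", "GB", "FR", "FR"],
        "Metric": ["GDP", "Pop.", "GDP", "Pop.", "GDP", "Pop."],
        "2011": [7, 2, 8, 2, 5, 1],
        "2012": [4, 3, 7, 6, 0, 1],
        "2013": [0, 0, 0, 0, 0, 0],
        "2014": [2, 3, 7, 0, 1, 5],
    }
)

# Option 1: group by the Metric column, then pull out one group.
grouped = df.groupby("Metric")
via_group = grouped.get_group("GDP")

# Option 2: filter rows directly with a boolean mask.
via_mask = df[df["Metric"] == "GDP"]
```

    Both selections keep the original row labels (0, 2, 4). The groupby route probably only pays off when more than one group is needed.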
    
  • 2021-02-03 14:27

    In this case, you don't actually need a groupby. You also don't have a MultiIndex. You can make one like this:

    import pandas
    from io import StringIO
    
    datastring = StringIO("""\
    Country  Metric           2011   2012   2013  2014
    USA     GDP               7      4     0      2
    USA     Pop.              2      3     0      3
    GB      GDP               8      7     0      7
    GB      Pop.              2      6     0      0
    FR      GDP               5      0     0      1
    FR      Pop.              1      1     0      5
    """)
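    # Raw string avoids an invalid-escape warning; a regex separator needs the python engine.
    data = pandas.read_table(datastring, sep=r'\s\s+', engine='python')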
    data.set_index(['Country', 'Metric'], inplace=True)
    

    Then data looks like this:

                    2011  2012  2013  2014
    Country Metric                        
    USA     GDP        7     4     0     2
            Pop.       2     3     0     3
    GB      GDP        8     7     0     7
            Pop.       2     6     0     0
    FR      GDP        5     0     0     1
            Pop.       1     1     0     5
    

    Now to get the GDPs, you can take a cross-section of the dataframe via the xs method:

    data.xs('GDP', level='Metric')
    
             2011  2012  2013  2014
    Country                        
    USA         7     4     0     2
    GB          8     7     0     7
    FR          5     0     0     1
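
    By default `xs` drops the level you select on, which is why `Metric` no longer appears in the result above. If you would rather keep it, a minimal sketch of two options follows: `drop_level=False`, or a boolean mask built from `get_level_values`. The frame is rebuilt here so the snippet runs on its own; the variable names are only illustrative.

```python
import pandas as pd

# Same toy numbers as above, rebuilt so the snippet is self-contained.
data = pd.DataFrame(
    {
        "Country": ["USA", "USA", "GB", "GB", "FR", "FR"],
        "Metric": ["GDP", "Pop.", "GDP", "Pop.", "GDP", "Pop."],
        "2011": [7, 2, 8, 2, 5, 1],
        "2012": [4, 3, 7, 6, 0, 1],
        "2013": [0, 0, 0, 0, 0, 0],
        "2014": [2, 3, 7, 0, 1, 5],
    }
).set_index(["Country", "Metric"])

# Default behaviour: the Metric level is dropped, leaving Country as the index.
gdp = data.xs("GDP", level="Metric")

# Keep both index levels in the result.
gdp_keep = data.xs("GDP", level="Metric", drop_level=False)

# Same rows via a boolean mask on the Metric level; both levels are kept too.
gdp_mask = data[data.index.get_level_values("Metric") == "GDP"]
```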
    

    It's so easy because your data are already pivoted/unstacked. If they weren't, and instead looked like this:

    data.columns.names = ['Year']
    data = data.stack()
    data
    
    Country  Metric  Year
    USA      GDP     2011    7
                     2012    4
                     2013    0
                     2014    2
             Pop.    2011    2
                     2012    3
                     2013    0
                     2014    3
    GB       GDP     2011    8
                     2012    7
                     2013    0
                     2014    7
             Pop.    2011    2
                     2012    6
                     2013    0
                     2014    0
    FR       GDP     2011    5
                     2012    0
                     2013    0
                     2014    1
             Pop.    2011    1
                     2012    1
                     2013    0
                     2014    5
    

    You could then use groupby to tell you something about the world as a whole:

    data.groupby(level=['Metric', 'Year']).sum()
    
    Metric  Year
    GDP     2011    20
            2012    11
            2013     0
            2014    10
    Pop.    2011     5
            2012    10
            2013     0
            2014     8
    

    Or get real fancy:

    data.groupby(level=['Metric', 'Year']).sum().unstack(level='Metric')
    
    Metric  GDP  Pop.
    Year             
    2011     20     5
    2012     11    10
    2013      0     0
    2014     10     8
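
    To try the whole stack, groupby, unstack chain in one go, here is a self-contained sketch. It assumes the year labels are strings (as they would be when read from a header row); the names `wide`, `long`, `totals`, `table` and `shortcut` are only illustrative. It also shows that, for this particular data, grouping the wide frame by `Metric` and transposing gives the same numbers.

```python
import pandas as pd

# Wide (unstacked) frame with a Country/Metric MultiIndex, as built earlier.
wide = pd.DataFrame(
    {
        "Country": ["USA", "USA", "GB", "GB", "FR", "FR"],
        "Metric": ["GDP", "Pop.", "GDP", "Pop.", "GDP", "Pop."],
        "2011": [7, 2, 8, 2, 5, 1],
        "2012": [4, 3, 7, 6, 0, 1],
        "2013": [0, 0, 0, 0, 0, 0],
        "2014": [2, 3, 7, 0, 1, 5],
    }
).set_index(["Country", "Metric"])
wide.columns.name = "Year"

# Long form: a Series indexed by Country, Metric and Year.
long = wide.stack()

# Sum over countries for every (Metric, Year) pair.
totals = long.groupby(level=["Metric", "Year"]).sum()

# Move Metric back into the columns: one row per year, one column per metric.
table = totals.unstack(level="Metric")

# Alternative without stacking: group the wide frame by Metric, then transpose.
shortcut = wide.groupby(level="Metric").sum().T
```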
    