selecting from multi-index pandas

庸人自扰 2020-12-02 05:19

I have a multi-index data frame with columns 'A' and 'B'.

Is there a way to select rows by filtering on one column of the multi-index without resetting the index?

6 answers
  • 2020-12-02 05:29

    You can use DataFrame.xs() (the session below assumes numpy is imported as np and DataFrame is imported from pandas):

    In [36]: df = DataFrame(np.random.randn(10, 4))
    
    In [37]: df.columns = [np.random.choice(['a', 'b'], size=4).tolist(), np.random.choice(['c', 'd'], size=4)]
    
    In [38]: df.columns.names = ['A', 'B']
    
    In [39]: df
    Out[39]:
    A      b             a
    B      d      d      d      d
    0 -1.406  0.548 -0.635  0.576
    1 -0.212 -0.583  1.012 -1.377
    2  0.951 -0.349 -0.477 -1.230
    3  0.451 -0.168  0.949  0.545
    4 -0.362 -0.855  1.676 -2.881
    5  1.283  1.027  0.085 -1.282
    6  0.583 -1.406  0.327 -0.146
    7 -0.518 -0.480  0.139  0.851
    8 -0.030 -0.630 -1.534  0.534
    9  0.246 -1.558 -1.885 -1.543
    
    In [40]: df.xs('a', level='A', axis=1)
    Out[40]:
    B      d      d
    0 -0.635  0.576
    1  1.012 -1.377
    2 -0.477 -1.230
    3  0.949  0.545
    4  1.676 -2.881
    5  0.085 -1.282
    6  0.327 -0.146
    7  0.139  0.851
    8 -1.534  0.534
    9 -1.885 -1.543
    

    If you want to keep the A level (the drop_level keyword argument is only available starting from v0.13.0):

    In [42]: df.xs('a', level='A', axis=1, drop_level=False)
    Out[42]:
    A      a
    B      d      d
    0 -0.635  0.576
    1  1.012 -1.377
    2 -0.477 -1.230
    3  0.949  0.545
    4  1.676 -2.881
    5  0.085 -1.282
    6  0.327 -0.146
    7  0.139  0.851
    8 -1.534  0.534
    9 -1.885 -1.543
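
    Because the session above uses random labels and values, here is a deterministic sketch of the same xs calls. It assumes a small hypothetical frame whose column levels are also named A and B; drop_level needs pandas 0.13 or later:

```python
import numpy as np
import pandas as pd

# Hypothetical, fixed column labels: level 'A' in {a, b}, level 'B' in {c, d}.
columns = pd.MultiIndex.from_product([["a", "b"], ["c", "d"]], names=["A", "B"])
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=columns)

# All columns whose 'A' label is 'a'; the 'A' level is dropped from the result.
only_a = df.xs("a", level="A", axis=1)

# Same selection, but drop_level=False keeps the 'A' level.
only_a_kept = df.xs("a", level="A", axis=1, drop_level=False)

print(only_a)
print(only_a_kept)
```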
    
  • 2020-12-02 05:31

    You can also use query, which in my opinion is very readable and straightforward to use:

    import pandas as pd
    
    df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [10, 20, 50, 80], 'C': [6, 7, 8, 9]})
    df = df.set_index(['A', 'B'])
    
          C
    A B    
    1 10  6
    2 20  7
    3 50  8
    4 80  9
    

    For what you had in mind, you can simply do:

    df.query('A == 1')
    
          C
    A B    
    1 10  6
    

    You can also build more complex queries with the and operator:

    df.query('A >= 1 and B >= 50')
    
          C
    A B    
    3 50  8
    4 80  9
    

    and with or:

    df.query('A == 1 or B >= 50')
    
          C
    A B    
    1 10  6
    3 50  8
    4 80  9
    

    You can also mix index levels and regular columns in the same query, e.g.

    df.query('A == 1 or C >= 8')
    

    will return

          C
    A B    
    1 10  6
    3 50  8
    4 80  9
    

    To use local variables inside the query, prefix them with @:

    b_threshold = 20
    c_threshold = 8
    
    df.query('B >= @b_threshold and C <= @c_threshold')
    
          C
    A B    
    2 20  7
    3 50  8
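
    For comparison, query is roughly shorthand for a boolean mask. A minimal sketch, reusing this answer's data, that checks the mixed index-level/column query against a mask built by hand with get_level_values:

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3, 4], "B": [10, 20, 50, 80], "C": [6, 7, 8, 9]})
df = df.set_index(["A", "B"])

# query() resolves 'A' to the index level and 'C' to the column.
via_query = df.query("A == 1 or C >= 8")

# Equivalent mask: index-level comparison OR column comparison.
mask = (df.index.get_level_values("A") == 1) | (df["C"] >= 8)
via_mask = df[mask]

print(via_query.equals(via_mask))  # True
```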
    
  • 2020-12-02 05:34

    You can use DataFrame.loc:

    >>> df.loc[1]
    

    Example

    >>> print(df)
           result
    A B C        
    1 1 1       6
        2       9
      2 1       8
        2      11
    2 1 1       7
        2      10
      2 1       9
        2      12
    
    >>> print(df.loc[1])
         result
    B C        
    1 1       6
      2       9
    2 1       8
      2      11
    
    >>> print(df.loc[2, 1])
       result
    C        
    1       7
    2      10
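
    The frame printed above can be rebuilt to try this out. The sketch assumes it was built with MultiIndex.from_product (values copied from the printed output) and also shows two ways to select on an inner level such as B, which a bare partial label like df.loc[1] does not reach:

```python
import pandas as pd

# Three-level index A/B/C; 'result' values copied from the printed frame.
index = pd.MultiIndex.from_product([[1, 2], [1, 2], [1, 2]], names=["A", "B", "C"])
df = pd.DataFrame({"result": [6, 9, 8, 11, 7, 10, 9, 12]}, index=index)

# Partial labels on the outer levels drop those levels from the result.
first = df.loc[1]        # remaining levels: B, C
second = df.xs((2, 1))   # A == 2 and B == 1; remaining level: C

# Inner level only: xs with level=, or a slice(None) tuple inside .loc.
b_is_2 = df.xs(2, level="B")               # drops level B
b_is_2_loc = df.loc[(slice(None), 2), :]   # keeps all three levels
```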
    
  • 2020-12-02 05:38

    Understanding how to access a multi-indexed pandas DataFrame can help you with all kinds of tasks like this.

    Copy and paste this into your code to generate the example:

    import numpy as np
    import pandas as pd

    # hierarchical indices and columns
    index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]],
                                       names=['year', 'visit'])
    columns = pd.MultiIndex.from_product([['Bob', 'Guido', 'Sue'], ['HR', 'Temp']],
                                         names=['subject', 'type'])
    
    # mock some data
    data = np.round(np.random.randn(4, 6), 1)
    data[:, ::2] *= 10
    data += 37
    
    # create the DataFrame
    health_data = pd.DataFrame(data, index=index, columns=columns)
    health_data
    

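    This gives a DataFrame with a two-level row index (year, visit) and a two-level column index (subject, type). The values below come from one random run.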

    Standard access by column

    health_data['Bob']
    type          HR  Temp
    year visit
    2013 1      22.0  38.6
         2      52.0  38.3
    2014 1      30.0  38.9
         2      31.0  37.3
    
    
    health_data['Bob']['HR']
    year  visit
    2013  1        22.0
          2        52.0
    2014  1        30.0
          2        31.0
    Name: HR, dtype: float64
    
    # filtering by column/subcolumn - your case:
    health_data['Bob']['HR']==22
    year  visit
    2013  1         True
          2        False
    2014  1        False
          2        False
    
    health_data['Bob']['HR'][2013]    
    visit
    1    22.0
    2    52.0
    Name: HR, dtype: float64
    
    health_data['Bob']['HR'][2013][1]
    22.0
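
    Chained lookups like health_data['Bob']['HR'] work, but a single tuple key selects the same column in one step, and the resulting boolean Series can then filter the rows directly. A sketch with fixed values so the result is reproducible (the first two rows and a few cells are taken from the output here; the remaining values are made up):

```python
import pandas as pd

index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]], names=["year", "visit"])
columns = pd.MultiIndex.from_product([["Bob", "Guido", "Sue"], ["HR", "Temp"]],
                                     names=["subject", "type"])
# Illustrative fixed data instead of random numbers (partly hypothetical).
data = [[22.0, 38.6, 40.0, 38.9, 53.0, 37.5],
        [52.0, 38.3, 42.0, 34.6, 30.0, 37.7],
        [30.0, 38.9, 52.0, 37.1, 45.0, 37.4],
        [31.0, 37.3, 35.0, 36.9, 39.0, 36.2]]
health_data = pd.DataFrame(data, index=index, columns=columns)

# One tuple key selects the (subject, type) column.
bob_hr = health_data[("Bob", "HR")]

# The boolean Series filters rows; the row MultiIndex stays intact.
rows = health_data[bob_hr > 30]
print(rows)
```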
    

    Access by row

    health_data.loc[2013]
    subject   Bob       Guido         Sue
    type       HR  Temp    HR  Temp    HR  Temp
    visit
    1        22.0  38.6  40.0  38.9  53.0  37.5
    2        52.0  38.3  42.0  34.6  30.0  37.7
    
    health_data.loc[2013,1] 
    subject  type
    Bob      HR      22.0
             Temp    38.6
    Guido    HR      40.0
             Temp    38.9
    Sue      HR      53.0
             Temp    37.5
    Name: (2013, 1), dtype: float64
    
    health_data.loc[2013,1]['Bob']
    type
    HR      22.0
    Temp    38.6
    Name: (2013, 1), dtype: float64
    
    health_data.loc[2013,1]['Bob']['HR']
    22.0
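
    The chained calls above can also be collapsed into one .loc call with a row tuple and a column tuple, and xs reaches an inner row level such as visit. A sketch using placeholder integers instead of the random data:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]], names=["year", "visit"])
columns = pd.MultiIndex.from_product([["Bob", "Guido", "Sue"], ["HR", "Temp"]],
                                     names=["subject", "type"])
# Placeholder values 0..23, row by row.
health_data = pd.DataFrame(np.arange(24).reshape(4, 6), index=index, columns=columns)

# A single cell, addressed by full row and column tuples in one .loc call.
bob_hr_2013_1 = health_data.loc[(2013, 1), ("Bob", "HR")]

# Every first visit, whatever the year: cross-section on the inner 'visit' level.
first_visits = health_data.xs(1, level="visit")
print(first_visits)
```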
    

    Slicing multi-index

    idx = pd.IndexSlice
    health_data.loc[idx[:, 1], idx[:, 'HR']]
    subject      Bob Guido   Sue
    type          HR    HR    HR
    year visit
    2013 1      22.0  40.0  53.0
    2014 1      30.0  52.0  45.0
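
    pd.IndexSlice is a convenience for writing slice objects, so the same selection can be spelled with explicit slice(None) tuples. A sketch on placeholder data (pandas recommends a sorted MultiIndex for this kind of slicing, which from_product produces):

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([[2013, 2014], [1, 2]], names=["year", "visit"])
columns = pd.MultiIndex.from_product([["Bob", "Guido", "Sue"], ["HR", "Temp"]],
                                     names=["subject", "type"])
health_data = pd.DataFrame(np.arange(24).reshape(4, 6), index=index, columns=columns)

idx = pd.IndexSlice
via_indexslice = health_data.loc[idx[:, 1], idx[:, "HR"]]

# Equivalent: slice(None) means "every label on this level".
via_slices = health_data.loc[(slice(None), 1), (slice(None), "HR")]
print(via_slices)
```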
    
  • 2020-12-02 05:45

    Another option is to combine boolean masks built with get_level_values:

    filter1 = df.index.get_level_values('A') == 1
    filter2 = df.index.get_level_values('B') == 4
    
    df.iloc[filter1 & filter2]
    Out[11]:
         0
    A B
    1 4  1
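
    A self-contained version, assuming the small frame shown in the next answer (levels A and B, one column named 0). It also shows isin, which builds a level mask matching several labels at once:

```python
import pandas as pd

df = pd.DataFrame(
    {0: [1, 2, 3]},
    index=pd.MultiIndex.from_arrays([[1, 2, 3], [4, 5, 6]], names=["A", "B"]),
)

filter1 = df.index.get_level_values("A") == 1
filter2 = df.index.get_level_values("B") == 4

# Plain [] accepts a boolean array as well as .iloc does.
both = df[filter1 & filter2]

# isin() matches any of several labels on one level.
a_in = df[df.index.get_level_values("A").isin([1, 3])]
print(a_in)
```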
    
  • 2020-12-02 05:49

    One way is to use the get_level_values Index method:

    In [11]: df
    Out[11]:
         0
    A B
    1 4  1
    2 5  2
    3 6  3
    
    In [12]: df.iloc[df.index.get_level_values('A') == 1]
    Out[12]:
         0
    A B
    1 4  1
    

    Starting with 0.13, you can also use xs with the drop_level argument:

    df.xs(1, level='A', drop_level=False) # axis=1 if columns
    

    Note: if this were a column MultiIndex rather than a row index, you could use the same technique:

    In [21]: df1 = df.T
    
    In [22]: df1.iloc[:, df1.columns.get_level_values('A') == 1]
    Out[22]:
    A  1
    B  4
    0  1
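
    The same idea on the column axis, sketched on that transposed frame: a get_level_values mask passed to .loc for the columns, and the xs equivalent with axis=1:

```python
import pandas as pd

df = pd.DataFrame(
    {0: [1, 2, 3]},
    index=pd.MultiIndex.from_arrays([[1, 2, 3], [4, 5, 6]], names=["A", "B"]),
)
df1 = df.T  # levels A and B now label the columns

# Boolean mask over the column MultiIndex, applied on axis 1 via .loc.
mask = df1.columns.get_level_values("A") == 1
by_mask = df1.loc[:, mask]

# xs on the column axis, keeping the A level (pandas >= 0.13).
by_xs = df1.xs(1, level="A", axis=1, drop_level=False)
print(by_mask)
```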
    