Pandas: Why are double brackets needed to select column after boolean indexing

南笙 2021-01-29 20:08

For a df table like below,

   A B C D
0  0 1 1 1
1  2 3 5 7
3  3 1 2 8

why are the double brackets needed for selecting specific columns after boolean indexing?

4 answers
  • 2021-01-29 20:53

    For pandas objects (Series, DataFrame), the indexing operator [] only accepts

    1. a column name or a list of column names, to select column(s)
    2. a slice or a Boolean array, to select row(s)

    In either case it refers to only one dimension of the dataframe.

    In df[[colname(s)]], the inner brackets create a list and the outer brackets are the indexing operator, so you must use double brackets to select two or more columns. With a single column name, one pair of brackets returns a Series, while double brackets return a DataFrame.
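A minimal sketch of the Series versus DataFrame distinction described above. The frame is rebuilt from the table in the question; the variable names (single, double, several) are only for illustration.

```python
import pandas as pd

# Rebuild the table from the question (note the row labels 0, 1, 3).
df = pd.DataFrame(
    {"A": [0, 2, 3], "B": [1, 3, 1], "C": [1, 5, 2], "D": [1, 7, 8]},
    index=[0, 1, 3],
)

single = df["A"]          # one column name -> Series
double = df[["A"]]        # list holding one name -> DataFrame with one column
several = df[["A", "C"]]  # list holding two names -> DataFrame with two columns

print(type(single).__name__)
print(type(double).__name__)
print(several.shape)
```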

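    Also, df.loc[df['A'] < 3, ['A','C']] is better than the chained selection: it selects rows and columns in a single indexing call, which avoids the question of whether the intermediate result is a copy or a view of the dataframe. (Older code uses df.ix for this; .ix was deprecated in pandas 0.20 and removed in pandas 1.0.)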
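As a rough illustration of the single-call form, the sketch below compares the chained selection with .loc on the table from the question. The names mask, chained and one_step are made up for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [0, 2, 3], "B": [1, 3, 1], "C": [1, 5, 2], "D": [1, 7, 8]},
    index=[0, 1, 3],
)

mask = df["A"] < 3                   # Boolean Series: True for rows 0 and 1

chained = df[mask][["A", "C"]]       # two separate indexing calls
one_step = df.loc[mask, ["A", "C"]]  # rows and columns in one call

print(one_step)
print(chained.equals(one_step))      # both forms select the same data
```

For plain selection the two forms give the same result; the difference matters mainly when you go on to assign to the selection.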

    Please refer to the pandas documentation for details.

  • 2021-01-29 20:57

    Adding to the previous responses, you could also use the df.iloc accessor if you need to select by integer position rather than by label. It also makes the code more reproducible, which is nice.
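A possible positional version of the same selection, sketched on the table from the question. It assumes you want the columns 'A' and 'C' looked up by position; row_mask, col_pos and the other names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [0, 2, 3], "B": [1, 3, 1], "C": [1, 5, 2], "D": [1, 7, 8]},
    index=[0, 1, 3],
)

# .iloc expects a plain Boolean array rather than a Boolean Series.
row_mask = (df["A"] < 3).to_numpy()

# Integer positions of the wanted columns: 'A' is 0, 'C' is 2.
col_pos = df.columns.get_indexer(["A", "C"])

by_position = df.iloc[row_mask, col_pos]
hard_coded = df.iloc[[0, 1], [0, 2]]  # same rows and columns, written out

print(by_position)
```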

  • 2021-01-29 21:02

    Because the inner brackets are just Python's literal syntax for a list.

    The outer brackets are the indexing operation of the pandas DataFrame object.

    In this use case the inner ['A', 'B'] defines the list of columns, which is passed as a single argument to the indexing operation denoted by the outer brackets.
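To make the "list passed as a single argument" point concrete, here is a small sketch using the table from the question. The variable cols is hypothetical; calling __getitem__ directly is shown only to expose what the outer brackets do, not as something you would normally write.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [0, 2, 3], "B": [1, 3, 1], "C": [1, 5, 2], "D": [1, 7, 8]},
    index=[0, 1, 3],
)

cols = ["A", "C"]                # inner brackets: an ordinary Python list

via_name = df[cols]              # outer brackets: the indexing operation
literal = df[["A", "C"]]         # the same thing with the list written inline
explicit = df.__getitem__(cols)  # what the outer brackets call under the hood

print(via_name.equals(literal), explicit.equals(literal))
```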

  • 2021-01-29 21:07

    Because without the inner brackets, 'A','C' is passed as the single tuple key ('A', 'C'), and you have no column with that label, so pandas raises a KeyError. You have to pass a list to sub-select columns from the df.

    So

    df[df['A'] < 3]['A','C']
    

    raises

    KeyError: ('A', 'C')

    which is different from

    In [261]:
    df[df['A'] < 3][['A','C']]
    
    Out[261]:
       A  C
    0  0  1
    1  2  5
    

    This is no different from trying:

    df['A','C']
    

    which is why you need double square brackets:

    df[['A','C']]
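A short sketch of why the single-bracket form fails, using the table from the question: Python turns 'A', 'C' into a tuple before pandas ever sees it. The names key and raised are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [0, 2, 3], "B": [1, 3, 1], "C": [1, 5, 2], "D": [1, 7, 8]},
    index=[0, 1, 3],
)

key = "A", "C"             # no brackets: Python builds a tuple
print(type(key).__name__)

try:
    df["A", "C"]           # same as df[("A", "C")]: one column labelled ('A', 'C')
    raised = False
except KeyError as err:
    raised = True
    print("KeyError:", err)

ok = df[["A", "C"]]        # a list asks for two separate columns
```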
    

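    Note that the modern way is to use .loc (the .ix accessor used in older code was deprecated in pandas 0.20 and removed in pandas 1.0):

    In [264]:
    df.loc[df['A'] < 3,['A','C']]
    
    Out[264]:
       A  C
    0  0  1
    1  2  5
    

    This does the row and column selection in one indexing call, so a later assignment through it acts on the original dataframe rather than potentially on a temporary copy.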
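The practical consequence shows up on assignment. Below is a minimal sketch on the table from the question, assuming you want to overwrite column 'C' for the rows where A < 3; the value 0 is arbitrary. The chained form df[df['A'] < 3]['C'] = 0 is not guaranteed to change df, so only the .loc form is shown.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [0, 2, 3], "B": [1, 3, 1], "C": [1, 5, 2], "D": [1, 7, 8]},
    index=[0, 1, 3],
)

# One indexing call: rows where A < 3, column 'C', assigned in place.
df.loc[df["A"] < 3, "C"] = 0

print(df["C"].tolist())  # [0, 0, 2]
```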
