Dynamically accessing a pandas dataframe column

前端 未结 2 1814
日久生厌
日久生厌 2020-12-21 06:27

Consider this simple example

import pandas as pd

df = pd.DataFrame({\'one\' : [1,2,3],
                   \'two\' : [1,0,0]})

df 
Out[9]: 
   one  two
0            


        
相关标签:
2条回答
  • 2020-12-21 06:46

    You pass a string as the second argument. In effect, you're trying to do something like:

    df.'two'
    

    Which is invalid syntax. If you're trying to dynamically access a column, you'll need to use the index notation, [...] because the dot/attribute accessor notation doesn't work for dynamic access.


    Dynamic access on its own is possible. For example, you can use getattr (but I don't recommend this, it's an antipattern):

    In [674]: df
    Out[674]: 
       one  two
    0    1    1
    1    2    0
    2    3    0
    
    In [675]: getattr(df, 'one')
    Out[675]: 
    0    1
    1    2
    2    3
    Name: one, dtype: int64
    

    Dynamically selecting by attribute from a groupby call can be done, something like:

    In [677]: getattr(df.groupby('one'), mycol).sum() 
    Out[677]: 
    one
    1    1
    2    0
    3    0
    Name: two, dtype: int64
    

    But don't do it. It is a horrid anti pattern, and much more unreadable than df.groupby('one')[mycol].sum().

    0 讨论(0)
  • 2020-12-21 06:49

    I think you need [] for select column by column name what is general solution for selecting columns, because select by attributes have many exceptions:

    • You can use this access only if the index element is a valid python identifier, e.g. s.1 is not allowed. See here for an explanation of valid identifiers.
    • The attribute will not be available if it conflicts with an existing method name, e.g. s.min is not allowed.
    • Similarly, the attribute will not be available if it conflicts with any of the following list: index, major_axis, minor_axis, items, labels.
    • In any of these cases, standard indexing will still work, e.g. s['1'], s['min'], and s['index'] will access the corresponding element or column.
    def megabug(df,mycol):
        return df.groupby('one')[mycol].sum()
    
    print (megabug(df, 'two'))
    
    one
    1    1
    2    0
    3    0
    Name: two, dtype: int64
    
    0 讨论(0)
提交回复
热议问题