Boolean indexing that can produce a view of a large pandas DataFrame?

暖寄归人 2021-02-04 08:52

Got a large dataframe that I want to take slices of (according to multiple boolean criteria), and then modify the entries in those slices in order to change the original datafra

3 answers
  • 2021-02-04 09:26

    Even though df.loc[idx] may be a copy of a portion of df, assignment to df.loc[idx] modifies df itself. (This is also true of df.iloc, and was true of df.ix before df.ix was removed in pandas 1.0.)

    For example,

    import pandas as pd
    import numpy as np
    df = pd.DataFrame({'A':[9,10]*6,
                       'B':range(23,35),
                       'C':range(-6,6)})
    
    print(df)
    #      A   B  C
    # 0    9  23 -6
    # 1   10  24 -5
    # 2    9  25 -4
    # 3   10  26 -3
    # 4    9  27 -2
    # 5   10  28 -1
    # 6    9  29  0
    # 7   10  30  1
    # 8    9  31  2
    # 9   10  32  3
    # 10   9  33  4
    # 11  10  34  5
    

    Here is our boolean index:

    idx = (df['C']!=0) & (df['A']==10) & (df['B']<30)
    

    We can modify those rows of df where idx is True by assigning to df.loc[idx, ...]. For example,

    df.loc[idx, 'A'] += df.loc[idx, 'B'] * df.loc[idx, 'C']
    print(df)
    

    yields

          A   B  C
    0     9  23 -6
    1  -110  24 -5
    2     9  25 -4
    3   -68  26 -3
    4     9  27 -2
    5   -18  28 -1
    6     9  29  0
    7    10  30  1
    8     9  31  2
    9    10  32  3
    10    9  33  4
    11   10  34  5
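
The difference between selecting with a boolean index and assigning through one can be checked directly. The following is a minimal sketch that reuses the df and idx from the example above; the names subset, unchanged and changed are only illustrative and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': [9, 10] * 6,
                   'B': range(23, 35),
                   'C': range(-6, 6)})
idx = (df['C'] != 0) & (df['A'] == 10) & (df['B'] < 30)

# Selecting with a boolean index gives a new object; .copy() makes that explicit.
subset = df.loc[idx].copy()
subset['A'] = 0                          # modifies only the copy
unchanged = df.loc[idx, 'A'].tolist()    # df still holds the original values

# Assigning through df.loc[idx, ...] writes into df itself.
df.loc[idx, 'A'] = 0
changed = df.loc[idx, 'A'].tolist()

print(unchanged, changed)
```

Editing the selected subset leaves df alone, while the assignment through df.loc changes the matching rows (1, 3 and 5) in place.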
    
  • Building off of unutbu's example you could also use the boolean index on df.index like so:

    In [11]: df.loc[df.index[idx]] = 999
    
    In [12]: df
    Out[12]:
          A    B    C
    0     9   23   -6
    1   999  999  999
    2     9   25   -4
    3   999  999  999
    4     9   27   -2
    5   999  999  999
    6     9   29    0
    7    10   30    1
    8     9   31    2
    9    10   32    3
    10    9   33    4
    11   10   34    5
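
As a rough sketch of the same idea on a current pandas version, where df.ix is no longer available, the assignment can go through df.loc either with the labels picked out of df.index or with the boolean index itself. The helper make_df and the names by_labels and by_mask are hypothetical, introduced here only to compare the two spellings.

```python
import pandas as pd

def make_df():
    # Same data as in the first answer's example.
    return pd.DataFrame({'A': [9, 10] * 6,
                         'B': range(23, 35),
                         'C': range(-6, 6)})

by_labels = make_df()
idx = (by_labels['C'] != 0) & (by_labels['A'] == 10) & (by_labels['B'] < 30)

# Select the row labels where idx is True, then assign to whole rows.
by_labels.loc[by_labels.index[idx]] = 999

# Passing the boolean index straight to .loc selects the same rows.
by_mask = make_df()
by_mask.loc[idx] = 999

print(by_labels.equals(by_mask))
```

Both spellings set every column of rows 1, 3 and 5 to 999 and leave the other rows untouched.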
    
  • 2021-02-04 09:34

    The pandas docs have a section on Returning a view versus a copy:

    The rules about when a view on the data is returned are entirely dependent on NumPy. Whenever an array of labels or a boolean vector are involved in the indexing operation, the result will be a copy. With single label / scalar indexing and slicing, e.g. df.ix[3:6] or df.ix[:, 'A'], a view will be returned.
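
One way to test the boolean-vector half of that statement is to ask NumPy whether the selected data shares memory with the original. This is a small sketch, not taken from the pandas docs; the names mask, picked and shares are illustrative. It checks only the boolean case, because whether a plain slice shares memory can depend on the pandas version and its copy-on-write settings.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [9, 10] * 6,
                   'B': range(23, 35)})

mask = df['A'] == 10     # boolean vector
picked = df[mask]        # boolean indexing builds a new object

# False means the selected column does not share a buffer with df's column.
shares = np.shares_memory(picked['A'].to_numpy(), df['A'].to_numpy())
print(shares)
```

Because the selection holds its own data, changes to df have to go through an assignment such as df.loc[mask, 'A'] = ... rather than through picked.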
