Pandas: Slice Dataframe by Datetime (that may not exist) and Return View

日久生厌 2021-01-19 17:52

I have a large DataFrame that I would like to slice, so that calculations performed on the slice update the values in the original. In addit

1 answer
  • 2021-01-19 18:45

    One way is to use loc with a boolean mask: wrap each condition in parentheses and combine them with the bitwise operator &. The bitwise operator is required because you are comparing an array of values rather than a single value, and the parentheses are required because & binds more tightly than the comparison operators. The mask then selects the rows for loc, and the 'C' column can be set like so:

    In [15]:
    
    import datetime as dt
    start = dt.datetime.strptime('2014-01-02 14:07:00', '%Y-%m-%d %H:%M:%S')
    end = dt.datetime.strptime('2014-01-02 14:22:00', '%Y-%m-%d %H:%M:%S')
    df.loc[(df.index > start) & (df.index < end), 'C'] = 100
    df
    Out[15]:
                                A         B    C
    TIME                                        
    2014-01-02 14:00:00 -1.172285  1.706200  NaN
    2014-01-02 14:05:00  0.039511 -0.320798  NaN
    2014-01-02 14:10:00 -0.192179 -0.539397  100
    2014-01-02 14:15:00 -0.475917 -0.280055  100
    2014-01-02 14:20:00  0.163376  1.124602  100
    2014-01-02 14:25:00 -2.477812  0.656750  NaN
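
    The transcript above assumes df already exists. A minimal self-contained sketch of the same mask approach follows; the frame is a hypothetical stand-in (random A and B values, six rows at 5-minute spacing), not the asker's data, and pd.Timestamp is used in place of strptime purely for brevity.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's frame: six rows at 5-minute spacing.
index = pd.date_range("2014-01-02 14:00:00", periods=6, freq="5min", name="TIME")
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {"A": rng.standard_normal(6), "B": rng.standard_normal(6), "C": np.nan},
    index=index,
)

# Neither bound is a label in the index.
start = pd.Timestamp("2014-01-02 14:07:00")
end = pd.Timestamp("2014-01-02 14:22:00")

# Element-wise comparisons combined with &; each comparison is parenthesised.
mask = (df.index > start) & (df.index < end)

# Assign through loc on the original frame, so no intermediate copy is involved.
df.loc[mask, "C"] = 100

print(df)
```

    Because the mask is built from comparisons rather than label lookups, it does not matter that 14:07 and 14:22 are absent from the index.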
    

    Here is each method you tried and why it didn't work:

    sdf = df[start:end]  # KeyError if start or end is missing from the index and the index is not sorted; on a sorted DatetimeIndex, recent pandas versions accept missing bounds
    sdf = df[start < df.index < end]  # ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). The chained comparison needs a single bool, but each comparison yields an array of values
    sdf = df.ix[start:end]  # same KeyError as the first example; .ix was removed in pandas 1.0, so use .loc
    sdf = df.loc[start:end]  # same KeyError as the first example
    sdf = df.truncate(before=start, after=end, copy=False)  # generates the correct result, but operations on it raise SettingWithCopyWarning, as you've found
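
    Two of these cases can be checked directly. The sketch below uses a hypothetical six-row frame with a sorted index (an assumption, since the asker's index is not shown): the chained comparison fails with ValueError, while label slicing with bounds that are not in the index succeeds when the index is sorted.

```python
import pandas as pd

# Hypothetical frame with a sorted DatetimeIndex.
index = pd.date_range("2014-01-02 14:00:00", periods=6, freq="5min", name="TIME")
df = pd.DataFrame({"A": range(6)}, index=index)
start = pd.Timestamp("2014-01-02 14:07:00")
end = pd.Timestamp("2014-01-02 14:22:00")

# The chained form expands to (start < df.index) and (df.index < end);
# `and` needs one bool per operand, which a multi-element array cannot supply.
try:
    df[start < df.index < end]
    chained_failed = False
except ValueError:
    chained_failed = True

# On a sorted index, loc slicing accepts bounds that are not labels.
sliced = df.loc[start:end]

print(chained_failed)
print(sliced)
```

    Note that label slices include both endpoints when they are present, whereas the mask with > and < excludes them.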
    

    EDIT

    You can also store the mask in sdf and pass it to loc to set your 'C' column:

    In [7]:
    
    import datetime as dt
    start = dt.datetime.strptime('2014-01-02 14:07:00', '%Y-%m-%d %H:%M:%S')
    end = dt.datetime.strptime('2014-01-02 14:22:00', '%Y-%m-%d %H:%M:%S')
    sdf = (df.index > start) & (df.index < end)
    df.loc[sdf,'C'] = 100
    df
    Out[7]:
                                A         B    C
    TIME                                        
    2014-01-02 14:00:00 -1.172285  1.706200  NaN
    2014-01-02 14:05:00  0.039511 -0.320798  NaN
    2014-01-02 14:10:00 -0.192179 -0.539397  100
    2014-01-02 14:15:00 -0.475917 -0.280055  100
    2014-01-02 14:20:00  0.163376  1.124602  100
    2014-01-02 14:25:00 -2.477812  0.656750  NaN
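
    The same stored mask also covers the case where the new values are computed from the selected rows instead of being a constant. A sketch, assuming a hypothetical calculation (C = A + B) and made-up data:

```python
import numpy as np
import pandas as pd

# Hypothetical data chosen so the result is easy to verify by eye.
index = pd.date_range("2014-01-02 14:00:00", periods=6, freq="5min", name="TIME")
df = pd.DataFrame(
    {"A": np.arange(6.0), "B": np.arange(6.0) * 10, "C": np.nan},
    index=index,
)
start = pd.Timestamp("2014-01-02 14:07:00")
end = pd.Timestamp("2014-01-02 14:22:00")

# Boolean mask for rows strictly between start and end.
sdf = (df.index > start) & (df.index < end)

# Read the subset, compute on it, and write the result back through the same mask.
# The right-hand side is aligned on the index, so only the masked rows change.
df.loc[sdf, "C"] = df.loc[sdf, "A"] + df.loc[sdf, "B"]

print(df)
```

    Keeping every read and write on df itself avoids relying on whether an intermediate slice is a view or a copy.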
    