Pandas: Slice Dataframe by Datetime (that may not exist) and Return View

I have a large DataFrame that I would like to slice so that calculations performed on the slice update the values in the original. In addit

1 Answer

情书的邮戳

2021-01-19 18:45

One way is to use loc with a boolean mask: wrap each condition in parentheses and combine them with the bitwise operator &. The bitwise operator is required because you are comparing an array of values rather than a single value, and the parentheses are required because & binds more tightly than the comparison operators. We can then use this mask to select rows by label with loc and set the 'C' column like so:

In [15]:

import datetime as dt
start = dt.datetime.strptime('2014-01-02 14:07:00', '%Y-%m-%d %H:%M:%S')
end = dt.datetime.strptime('2014-01-02 14:22:00', '%Y-%m-%d %H:%M:%S')
df.loc[(df.index > start) & (df.index < end), 'C'] = 100
df
Out[15]:
                            A         B    C
TIME                                        
2014-01-02 14:00:00 -1.172285  1.706200  NaN
2014-01-02 14:05:00  0.039511 -0.320798  NaN
2014-01-02 14:10:00 -0.192179 -0.539397  100
2014-01-02 14:15:00 -0.475917 -0.280055  100
2014-01-02 14:20:00  0.163376  1.124602  100
2014-01-02 14:25:00 -2.477812  0.656750  NaN
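
The snippet above assumes df already exists. As a minimal self-contained sketch, the frame below is a hypothetical stand-in for the one in the question (5-minute timestamps, random values in A and B, an empty C column); the random seed and the variable name mask are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame: 5-minute steps, random A/B, empty C.
idx = pd.date_range("2014-01-02 14:00:00", periods=6, freq="5min", name="TIME")
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {"A": rng.standard_normal(6), "B": rng.standard_normal(6), "C": np.nan},
    index=idx,
)

# Neither endpoint exists in the index.
start = pd.Timestamp("2014-01-02 14:07:00")
end = pd.Timestamp("2014-01-02 14:22:00")

# Element-wise comparisons give boolean arrays; & combines them element-wise.
mask = (df.index > start) & (df.index < end)

# Assigning through loc writes into df itself, not into a temporary copy.
df.loc[mask, "C"] = 100
print(df)
```

Because the mask is built from comparisons rather than label lookups, it does not matter whether start and end are actual index values.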

Here is each method you tried and why it didn't work:

sdf = df[start:end]  # can raise KeyError when start and end are not present in the index
sdf = df[start < df.index < end]  # raises ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all(). The chained comparison needs a single boolean, but you are comparing arrays of values, not scalars
sdf = df.ix[start:end]  # raises KeyError, same as the first example (.ix was later deprecated and removed in pandas 1.0)
sdf = df.loc[start:end]  # raises KeyError, same as the first example
sdf = df.truncate(before=start, after=end, copy=False)  # generates the correct result, but operations on it raise SettingWithCopyWarning, as you've found
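
To make the ValueError case concrete, here is a small sketch of what Python does with the chained comparison. The index values mirror the example data; the names idx, chained_error and mask are chosen only for illustration.

```python
import pandas as pd

idx = pd.date_range("2014-01-02 14:00:00", periods=6, freq="5min")
start = pd.Timestamp("2014-01-02 14:07:00")
end = pd.Timestamp("2014-01-02 14:22:00")

# Python evaluates `start < idx < end` as `(start < idx) and (idx < end)`.
# `and` needs one True/False from the first boolean array, which is ambiguous.
try:
    start < idx < end
    chained_error = None
except ValueError as exc:
    chained_error = exc

print(type(chained_error).__name__)

# The element-wise equivalent: parenthesise each comparison and combine with &.
mask = (idx > start) & (idx < end)
print(mask)
```

The parentheses matter as well: without them, & would be applied before the comparisons because of operator precedence.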

EDIT

You can assign the mask to sdf and use it with loc to set your 'C' column:

In [7]:

import datetime as dt
start = dt.datetime.strptime('2014-01-02 14:07:00', '%Y-%m-%d %H:%M:%S')
end = dt.datetime.strptime('2014-01-02 14:22:00', '%Y-%m-%d %H:%M:%S')
sdf = (df.index > start) & (df.index < end)
df.loc[sdf,'C'] = 100
df
Out[7]:
                            A         B    C
TIME                                        
2014-01-02 14:00:00 -1.172285  1.706200  NaN
2014-01-02 14:05:00  0.039511 -0.320798  NaN
2014-01-02 14:10:00 -0.192179 -0.539397  100
2014-01-02 14:15:00 -0.475917 -0.280055  100
2014-01-02 14:20:00  0.163376  1.124602  100
2014-01-02 14:25:00 -2.477812  0.656750  NaN
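
The same mask can also drive a calculation instead of a constant assignment. The following is a sketch under assumed data: the values in A and B and the formula A + B are hypothetical and only show the pattern of reading and writing through the same mask.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with easy-to-check values.
idx = pd.date_range("2014-01-02 14:00:00", periods=6, freq="5min", name="TIME")
df = pd.DataFrame(
    {"A": np.arange(6.0), "B": np.arange(6.0) * 10, "C": np.nan},
    index=idx,
)

start = pd.Timestamp("2014-01-02 14:07:00")
end = pd.Timestamp("2014-01-02 14:22:00")
sdf = (df.index > start) & (df.index < end)

# Read the selected rows, compute, and write the result back into df.
df.loc[sdf, "C"] = df.loc[sdf, "A"] + df.loc[sdf, "B"]

# By contrast, a frame extracted with the mask is a separate object:
# changes made to it do not reach df.
subset = df.loc[sdf].copy()
subset["C"] = -1

print(df)
```

Keeping sdf as a mask rather than as a sliced frame means every update goes through df.loc, so the original is always the object being modified.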
