Question
I have a time series with some missing entries that looks like this:
date  value
-----------
2000      5
2001     10
2003      8
2004     72
2005     12
2007     13
I would like to create a "previous_value" column, but I only want it to hold a value when the previous row is the immediately preceding year. So I want it to look like this:
date  value  previous_value
---------------------------
2000      5             nan
2001     10               5
2003      8             nan
2004     72               8
2005     12              72
2007     13             nan
However, applying pandas' shift() directly to the 'value' column would give 'previous_value' = 10 for 'date' = 2003 and 'previous_value' = 12 for 'date' = 2007, because shift() moves values by row position rather than by year.
What's the most elegant way to deal with this in pandas? (I'm not sure if it's as easy as setting the 'freq' attribute).
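For reference, the 'freq' idea can be sketched as follows. This is a minimal sketch, assuming the integer years can be treated as year-start timestamps; the names naive, s and prev are illustrative only. When freq is passed to shift(), pandas moves the index instead of the values, so reindexing back onto the original dates leaves NaN wherever the preceding year is absent.

```python
import pandas as pd

df = pd.DataFrame({'date': [2000, 2001, 2003, 2004, 2005, 2007],
                   'value': [5, 10, 8, 72, 12, 13]})

# Plain shift() moves values by row position, ignoring the gaps in 'date'.
naive = df['value'].shift()

# Assumption: treat each integer year as a year-start timestamp.
s = df.set_index(pd.to_datetime(df['date'].astype(str), format='%Y'))['value']

# With freq, shift() moves the index forward one year instead of moving the
# values; reindexing onto the original dates leaves NaN where a year is missing.
prev = s.shift(freq=pd.DateOffset(years=1)).reindex(s.index)
df['previous_value'] = prev.to_numpy()
print(df)
```

Note that this version does not depend on the rows being sorted by date, since the lookup is by timestamp rather than by position.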
Answer 1:
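In [588]: import pandas as pd
     ...: df = pd.DataFrame({'date': [2000, 2001, 2003, 2004, 2005, 2007],
     ...:                    'value': [5, 10, 8, 72, 12, 13]})
In [589]: # Keep the shifted value only where this row's date is one year after the
     ...: # previous row's; index alignment on assignment fills the other rows with NaN.
     ...: df['previous_value'] = df['value'].shift()[df['date'] == df['date'].shift() + 1]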
In [590]: df
Out[590]:
   date  value  previous_value
0  2000      5             NaN
1  2001     10               5
2  2003      8             NaN
3  2004     72               8
4  2005     12              72
5  2007     13             NaN
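The same consecutive-year test can also be written with diff() and where(), which keeps the result the same length as the frame instead of relying on index alignment during assignment. This is a sketch of an equivalent variant, not the answerer's code; like the original, it assumes the rows are sorted by date.

```python
import pandas as pd

df = pd.DataFrame({'date': [2000, 2001, 2003, 2004, 2005, 2007],
                   'value': [5, 10, 8, 72, 12, 13]})

# True only where this row's date is exactly one year after the previous row's.
consecutive = df['date'].diff().eq(1)

# Keep the shifted value on consecutive rows; where() puts NaN everywhere else.
df['previous_value'] = df['value'].shift().where(consecutive)
print(df)
```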
Also see the question "Using shift() with unevenly spaced data" for a time-series approach using resample().
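The idea behind a resample()-style approach is to fill the gaps so the series sits on a regular annual grid, after which a positional shift() is also a one-year shift. Because the dates here are plain integers rather than timestamps, this minimal sketch uses reindex() over a range of years as a stand-in for resample(); it assumes the dates are unique.

```python
import pandas as pd

df = pd.DataFrame({'date': [2000, 2001, 2003, 2004, 2005, 2007],
                   'value': [5, 10, 8, 72, 12, 13]})

s = df.set_index('date')['value']

# Give every year from first to last a row (NaN for the missing 2002 and 2006).
full = s.reindex(range(s.index.min(), s.index.max() + 1))

# On a gap-free annual grid, a positional shift() equals a one-year shift.
prev = full.shift()

# Look up the shifted values for the original dates only.
df['previous_value'] = df['date'].map(prev)
print(df)
```

Unlike the positional approaches, the final map() looks values up by year, so it also works if the rows are not sorted.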
Source: https://stackoverflow.com/questions/28997367/pandas-shift-time-series-with-missing-values