pandas shift time series with missing values

问题

I have a times series with some missing entries, that looks like this:

date     value
---------------
2000       5
2001      10
2003      8
2004      72
2005      12
2007      13

I would like to do create a column for the "previous_value". But I only want it to show values for consecutive years. So I want it to look like this:

date     value    previous_value
-------------------------------
2000       5        nan
2001      10         5
2003      8         nan
2004      72         8
2005      12        72
2007      13        nan

However just applying pandas shift function directly to the column 'value' would give 'previous_value' = 10 for 'time' = 2003, and 'previous_value' = 12 for 'time' = 2007.

What's the most elegant way to deal with this in pandas? (I'm not sure if it's as easy as setting the 'freq' attribute).

回答1:

In [588]: df = pd.DataFrame({ 'date':[2000,2001,2003,2004,2005,2007],
                              'value':[5,10,8,72,12,13] })

In [589]: df['previous_value'] = df.value.shift()[ df.date == df.date.shift() + 1 ]

In [590]: df
Out[590]: 
   date  value  previous_value
0  2000      5             NaN
1  2001     10               5
2  2003      8             NaN
3  2004     72               8
4  2005     12              72
5  2007     13             NaN

Also see here for a time series approach using resample(): Using shift() with unevenly spaced data

来源：https://stackoverflow.com/questions/28997367/pandas-shift-time-series-with-missing-values

标签

python

pandas

time-series

shift

易学教程内所有资源均来自网络或用户发布的内容，如有违反法律规定的内容欢迎反馈！
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!