Pandas Time Series DataFrame Missing Values

Submitted by [亡魂溺海] on 2021-02-10 06:08:39

Question


I have a dataset of Total Sales from 2008 to 2015, with an entry for each day, so I have created a pandas DataFrame with a DatetimeIndex and a column for sales. It looks like this:
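A minimal sketch of that setup, for readers who want to reproduce it. The column name `Sales` and the random placeholder values are assumptions; the real figures are not part of the question.

```python
import numpy as np
import pandas as pd

# One row per calendar day, 2008-01-01 through 2015-12-31.
dates = pd.date_range("2008-01-01", "2015-12-31", freq="D", name="Date")

# Hypothetical placeholder sales figures standing in for the real data.
rng = np.random.default_rng(0)
sales = pd.DataFrame({"Sales": rng.uniform(100.0, 200.0, size=len(dates))},
                     index=dates)

print(sales.head())
```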

The problem is that I am missing data for most of 2010. These missing values are currently stored as 0.0, so when I plot the DataFrame I get:
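Because 0.0 here means "no data" rather than "no sales", one option is to turn those placeholders into NaN, which pandas treats as missing and matplotlib leaves as a gap when plotting. This sketch assumes, as the question implies, that a genuine sales day is never exactly 0.0; the flat sales values and the exact 2010 date range are hypothetical.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2008-01-01", "2015-12-31", freq="D")
sales = pd.DataFrame({"Sales": 150.0}, index=dates)  # hypothetical flat sales
# Simulate the gap: most of 2010 recorded as 0.0 (hypothetical range).
sales.loc["2010-02-01":"2010-11-30", "Sales"] = 0.0

# Treat the 0.0 placeholders as missing values instead of real observations.
sales["Sales"] = sales["Sales"].replace(0.0, np.nan)

# Count missing days per calendar year; only 2010 should be non-zero.
missing_per_year = sales["Sales"].isna().groupby(sales.index.year).sum()
print(missing_per_year)
```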

I want to try to forecast values for 2016, possibly using an ARIMA model, so the first step I took was to decompose this time series:
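The decomposition step can be illustrated with a from-scratch classical additive decomposition: the trend is a centred moving average over one cycle, and the seasonal component is the average detrended value at each position in the cycle. This is a sketch, not necessarily what the asker ran (statsmodels offers `seasonal_decompose` for a similar moving-average approach). The function name `classical_additive_decompose`, the weekly period of 7 and the synthetic data are assumptions for illustration; for yearly seasonality in daily data the period would be 365.

```python
import numpy as np
import pandas as pd

def classical_additive_decompose(series: pd.Series, period: int) -> pd.DataFrame:
    """Split series into trend + seasonal + resid (hypothetical helper).

    Only odd periods are handled, so a plain centred rolling mean works.
    """
    if period % 2 == 0:
        raise ValueError("this sketch only handles odd periods")
    # Trend: centred moving average over one full cycle. Any window that
    # contains NaN yields NaN, so missing data produces no trend estimate.
    trend = series.rolling(window=period, center=True).mean()
    detrended = series - trend
    # Seasonal: mean detrended value at each position in the cycle (NaN skipped).
    position = np.arange(len(series)) % period
    seasonal_means = detrended.groupby(position).mean()
    seasonal_means -= seasonal_means.mean()  # centre the seasonal effect on zero
    seasonal = pd.Series(seasonal_means.to_numpy()[position], index=series.index)
    resid = series - trend - seasonal
    return pd.DataFrame({"observed": series, "trend": trend,
                         "seasonal": seasonal, "resid": resid})

# Usage on a hypothetical series: linear trend plus a fixed weekly pattern.
idx = pd.date_range("2008-01-01", periods=70, freq="D")
t = np.arange(len(idx))
weekly = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0])
y = pd.Series(0.1 * t + weekly[t % 7], index=idx)
parts = classical_additive_decompose(y, period=7)
print(parts.dropna().head())
```

With a noise-free trend and pattern, the residual is essentially zero wherever the moving average is defined. A run of 0.0 values, by contrast, would be averaged into the trend and pull it down.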

Obviously, if I leave 2010 in the DataFrame, any prediction will be skewed by the apparent, albeit erroneous, drop in sales.
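The size of that skew is easy to see in a simple statistic such as the annual mean. In this sketch the constant daily value of 150 and the zeroed date range are hypothetical; it compares treating 0.0 as real sales with treating it as missing, since `mean()` skips NaN by default.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("2008-01-01", "2015-12-31", freq="D")
# Hypothetical data: constant daily sales, but most of 2010 stored as 0.0.
with_zeros = pd.Series(150.0, index=dates)
with_zeros.loc["2010-02-01":"2010-11-30"] = 0.0

# The same data with the placeholders marked as missing.
with_nan = with_zeros.replace(0.0, np.nan)

# Annual means: the zeros drag 2010 far below the other years,
# while the NaN version averages only the days that were recorded.
comparison = pd.DataFrame({
    "zeros_as_sales": with_zeros.groupby(with_zeros.index.year).mean(),
    "zeros_as_missing": with_nan.groupby(with_nan.index.year).mean(),
})
print(comparison)
```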

What is the recommended approach in this situation? I think I should just drop 2010 altogether, but then I don't know whether my time series is still valid going straight from 2009 to 2011. I don't want to fill in the missing values, because I don't believe I can do so accurately.
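One way to check whether the series is still evenly spaced after dropping 2010 is to inspect the gaps between consecutive timestamps. ARIMA-type models assume equally spaced observations, so on the dropped series they would treat 2009-12-31 and 2011-01-01 as neighbouring points. A sketch with hypothetical constant values:

```python
import pandas as pd

dates = pd.date_range("2008-01-01", "2015-12-31", freq="D")
sales = pd.DataFrame({"Sales": 150.0}, index=dates)  # hypothetical values

# Drop every row from 2010 using a boolean mask on each timestamp's year.
dropped = sales[sales.index.year != 2010]

# Spacing between consecutive timestamps: all 1 day except one 366-day jump.
steps = dropped.index.to_series().diff().dropna()
print(steps.value_counts())

# infer_freq returns None because the index is no longer regularly spaced.
print(pd.infer_freq(dropped.index))
```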

However, if I just delete 2010, the plot 'fills in' 2010, which doesn't help me:
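The "filling in" happens because a line plot joins consecutive rows, so the last 2009 point is connected straight to the first 2011 point. A minimal sketch, with hypothetical values, that restores the 2010 dates as NaN rows using `DataFrame.asfreq`, so that a plot shows a break instead:

```python
import pandas as pd

dates = pd.date_range("2008-01-01", "2015-12-31", freq="D")
sales = pd.DataFrame({"Sales": 150.0}, index=dates)  # hypothetical values

dropped = sales[sales.index.year != 2010]

# asfreq("D") reindexes to one row per day between the first and last
# timestamps; the reinstated 2010 rows are filled with NaN.
regular = dropped.asfreq("D")

print(len(dropped), len(regular))
print(regular.loc["2010", "Sales"].isna().all())
# regular.plot() would now leave 2010 blank, because matplotlib does not
# draw line segments through NaN values.
```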

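# .loc selects the 2010 rows by partial date string (recent pandas reads sales['2010'] as a column lookup)
sales = sales.drop(sales.loc['2010'].index)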

Source: https://stackoverflow.com/questions/38119601/pandas-time-series-dataframe-missing-values
