Time-series data missing time values and data values

Submitted by 你离开我真会死。 on 2019-12-25 06:30:41

Question


Here is a sample of my time-series dataset:

ymd      rf
19820103  3
19820104  9
19820118  4
19820119  2
19820122  0
19820218  5

The dataset is supposed to be a daily time series. More specifically, ymd should run continuously from 19820101 through 19820228. However, as the sample shows, the dataset is not continuous: it has no rows for days such as 19820101 and 19820102. For the dates with no data, I'd like to insert the missing days and set rf to 0.

What would be the best way to write a script that automates this? I need to do the same for daily time-series datasets covering 1979 through 2016.


Answer 1:


Let's assume your data is in a data frame named "mydata". Then you could do the following:

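#Convert ymd (e.g. 19820103) to a Date so that seq() steps through real calendar days
mydata$ymd <- as.Date(as.character(mydata$ymd), format="%Y%m%d")

#Create full ymd with all the needed dates
ymd.full <- data.frame(ymd=seq(min(mydata$ymd), max(mydata$ymd), by="day"))

#Merge both datasets, keeping every day in ymd.full (missing days get rf = NA)
mydata <- merge(ymd.full, mydata, all.x=TRUE)

#Replace NAs with 0
mydata[is.na(mydata)] <- 0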
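Building the sequence from min() and max() of the data means that days before the first record (19820101 and 19820102 in the sample) or after the last one are never added. To cover a fixed calendar range, such as a whole year or 1979 through 2016, the same merge idea can be wrapped in a function that takes explicit start and end dates. This is a minimal base-R sketch: the function name `fill_missing_days` and its `from`/`to` arguments are hypothetical, and it assumes `ymd` is stored as a yyyymmdd number and `rf` is the only value column.

```r
# Hypothetical helper, not part of any package.
# Assumes `df` has a yyyymmdd numeric column `ymd` (e.g. 19820103) and a numeric column `rf`.
fill_missing_days <- function(df, from, to) {
  # Parse yyyymmdd numbers into Date objects so seq() walks real calendar days
  to_date <- function(x) as.Date(as.character(x), format = "%Y%m%d")
  all_days <- seq(to_date(from), to_date(to), by = "day")

  # Full calendar as yyyymmdd integers, matching the format of df$ymd
  full <- data.frame(ymd = as.integer(format(all_days, "%Y%m%d")))

  # Left join: every calendar day is kept; days absent from df get rf = NA.
  # Rows of df outside [from, to] are dropped.
  out <- merge(full, df, by = "ymd", all.x = TRUE)

  # Days with no record get rf = 0, as requested in the question
  out$rf[is.na(out$rf)] <- 0
  out
}

# The sample from the question
mydata <- data.frame(
  ymd = c(19820103, 19820104, 19820118, 19820119, 19820122, 19820218),
  rf  = c(3, 9, 4, 2, 0, 5)
)

filled <- fill_missing_days(mydata, 19820101, 19820228)
head(filled)
```

For the full record, the call would be `fill_missing_days(mydata, 19790101, 20161231)`. Keeping ymd in its original yyyymmdd form on output means any downstream code that expects that format keeps working.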



Answer 2:


This solution is similar to @Gaurav Bansal's, but uses dplyr:

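#Full sequence of ymd values between the first and last observation
ymd.full <- data.frame(ymd=seq(min(mydata$ymd), max(mydata$ymd)))
#Keep every row of ymd.full; values missing from mydata get rf = NA
newdata  <- dplyr::left_join(ymd.full, mydata, by="ymd")
#Replace NAs with 0
newdata[is.na(newdata)] <- 0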

I'm wondering, though, how ymd translates to a date: treated as a plain integer, seq() also produces values such as 19820132 that are not real days. And since I suppose you want to do time-series analysis, are leap days accounted for in your set?
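The date concern can be checked directly in base R. The sketch below uses illustrative ymd values and assumes ymd is stored as a yyyymmdd number: an integer sequence simply counts up by 1, while a Date sequence follows the calendar and includes 29 February only in leap years such as 1984.

```r
# Treated as plain integers, yyyymmdd values are counted up by 1,
# so the sequence contains values such as 19820132 that are not real days
int_seq <- seq(19820130, 19820202)
length(int_seq)  # 73 values for what is really a 4-day span

# Parsing them shows which values are real dates (as.Date() gives NA for the rest)
parsed <- as.Date(as.character(int_seq), format = "%Y%m%d")
int_seq[!is.na(parsed)]

# Calendar-aware sequences: 1984 is a leap year, 1982 is not
days_1982 <- seq(as.Date("1982-02-27"), as.Date("1982-03-01"), by = "day")
days_1984 <- seq(as.Date("1984-02-27"), as.Date("1984-03-01"), by = "day")
format(days_1982, "%Y%m%d")  # "19820227" "19820228" "19820301"
format(days_1984, "%Y%m%d")  # "19840227" "19840228" "19840229" "19840301"
```

Converting ymd to Date before calling seq() therefore handles both month ends and leap days without any special-casing.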



Source: https://stackoverflow.com/questions/38438140/time-series-data-missing-time-values-and-data-values
