Time series Data Missing Time values and Data values

Question

I have the following time-series dataset sample here:

ymd      rf
19820103  3
19820104  9
19820118  4
19820119  2
19820122  0
19820218  5

The dataset is supposed to be a continuous daily time series. More specifically, ymd is supposed to range continuously from 19820101 through 19820230. However, as the sample above shows, the dataset has gaps: it does not contain days such as "19820101" and "19820102". For the dates on which no data are available, I'd like to add the missing days and enter a "0" value for rf.

What would be the best way to write a script that automates this? I'll have to do this for daily time-series datasets from 1979 through 2016.

Answer 1:

Let's assume your data is in a data frame named "mydata". Then you could do the following:

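#Parse ymd as dates; seq() on the raw numbers would also generate invalid values such as 19820132
mydata$ymd <- as.Date(as.character(mydata$ymd), format="%Y%m%d")

#Create full ymd with all the needed dates
ymd.full <- data.frame(ymd=seq(min(mydata$ymd), max(mydata$ymd), by="day"))

#Merge both datasets, keeping every date in ymd.full
mydata <- merge(ymd.full, mydata, by="ymd", all.x=TRUE)

#Replace NAs with 0
mydata$rf[is.na(mydata$rf)] <- 0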
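
If the series has to cover a fixed period regardless of the first and last observed days (the question asks for days such as 19820101 that precede the first record), the same merge idea can be sketched with explicit endpoints. This is only a minimal sketch: the helper name fill_missing_days and its start and end arguments are hypothetical, and it assumes the raw data, with ymd stored as integers like 19820103 and a numeric rf column.

```r
# Hypothetical helper: pad a daily series to a fixed date range, filling gaps with 0.
# Assumes `mydata` has an integer column `ymd` (YYYYMMDD) and a numeric column `rf`.
fill_missing_days <- function(mydata, start, end) {
  # Build every calendar day in the range, then format back to integer YYYYMMDD
  all_days <- seq(as.Date(start), as.Date(end), by = "day")
  ymd.full <- data.frame(ymd = as.integer(format(all_days, "%Y%m%d")))

  # Keep every day from ymd.full; days absent from mydata get NA for rf
  out <- merge(ymd.full, mydata, by = "ymd", all.x = TRUE)

  # Replace the NAs introduced by the merge with 0
  out$rf[is.na(out$rf)] <- 0
  out
}

# Sample rows from the question
mydata <- data.frame(
  ymd = c(19820103L, 19820104L, 19820118L, 19820119L, 19820122L, 19820218L),
  rf  = c(3, 9, 4, 2, 0, 5)
)

filled <- fill_missing_days(mydata, "1982-01-01", "1982-02-28")
head(filled)
```

For the full record, the same call could be made once with start "1979-01-01" and end "2016-12-31", provided all years are in one data frame.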

Answer 2:

This solution is similar to @Gaurav Bansal's, but uses dplyr:

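#Parse ymd as dates so that the sequence only contains real calendar days
mydata$ymd <- as.Date(as.character(mydata$ymd), format="%Y%m%d")
ymd.full <- data.frame(ymd=seq(min(mydata$ymd), max(mydata$ymd), by="day"))
newdata  <- dplyr::left_join(ymd.full, mydata, by="ymd")
newdata$rf[is.na(newdata$rf)] <- 0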

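One caveat: ymd is stored as a number, so it needs to be parsed as a date (for example with as.Date() and the format "%Y%m%d") before building the sequence; otherwise seq() also produces values that are not calendar dates, such as 19820132. Since I suppose you want to do time series analysis, it is also worth checking whether leap days are accounted for in your set.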
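
As a small illustration of the date question, the sketch below parses integer ymd values into Date objects and builds a daily sequence across the end of February in a leap year. The two sample values are made up for the example and are not taken from the asker's data.

```r
# Parse integer YYYYMMDD values into Date objects (sample values are illustrative)
ymd <- c(19840228L, 19840301L)
dates <- as.Date(as.character(ymd), format = "%Y%m%d")

# A Date sequence steps through real calendar days, so 29 February 1984 is included
days <- seq(min(dates), max(dates), by = "day")
format(days, "%Y%m%d")

# A value that is not a calendar date, such as 30 February, parses to NA,
# which makes bad input easy to spot
bad <- as.Date("19820230", format = "%Y%m%d")
is.na(bad)
```

Working with Date objects this way means leap days come for free, and an end date like the 19820230 mentioned in the question is flagged instead of silently accepted.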

Source: https://stackoverflow.com/questions/38438140/time-series-data-missing-time-values-and-data-values

Tags

time-series

missing-data