r - insert row for missing monthly data and interpolate

Submitted by 夙愿已清 on 2019-12-08 05:45:32

Question


I have a data frame like the one below with 5000+ rows. I am trying to insert a row wherever a month is missing (e.g. month 6 below) and then use linear interpolation to calculate the 'TWS' value. Ideally DecimDate would be filled in appropriately too, but I can sort that out afterwards if not! The data frame covers months 1 to 12 for 10 years (2003-2012), and this block repeats for multiple grid squares.

I have found lots of other similar questions, but none dealing with a repeating 1-12 monthly sequence.

 > head(ts.data,20)
    GridNo GridIndex  Lon  Lat DecimDate Year Month        TWS
 1    GR72        72 35.5 -4.5  2003.000 2003    01 14.2566781
 2    GR72        72 35.5 -4.5  2003.083 2003    02  5.0413706
 3    GR72        72 35.5 -4.5  2003.167 2003    03  3.8192721
 4    GR72        72 35.5 -4.5  2003.250 2003    04  5.8706026
 5    GR72        72 35.5 -4.5  2003.333 2003    05  7.8461188
 6    GR72        72 35.5 -4.5  2003.500 2003    07  2.3821844
 7    GR72        72 35.5 -4.5  2003.583 2003    08  0.1995629
 8    GR72        72 35.5 -4.5  2003.667 2003    09 -1.8353604
 9    GR72        72 35.5 -4.5  2003.750 2003    10 -2.0410653
 10   GR72        72 35.5 -4.5  2003.833 2003    11 -1.4029813
 11   GR72        72 35.5 -4.5  2003.917 2003    12 -0.2206872
 12   GR72        72 35.5 -4.5  2004.000 2004    01 -0.5090872
 13   GR72        72 35.5 -4.5  2004.083 2004    02 -0.4887118
 14   GR72        72 35.5 -4.5  2004.167 2004    03 -0.7725966
 15   GR72        72 35.5 -4.5  2004.250 2004    04  4.1831581
 16   GR72        72 35.5 -4.5  2004.333 2004    05  2.5651040
 17   GR72        72 35.5 -4.5  2004.417 2004    06 -2.2511409
 18   GR72        72 35.5 -4.5  2004.500 2004    07 -1.6484375
 19   GR72        72 35.5 -4.5  2004.583 2004    08 -4.6508982
 20   GR72        72 35.5 -4.5  2004.667 2004    09 -5.0053745

Any help appreciated!


Answer 1:


Using the data.table and zoo packages, you can expand your data set and interpolate, as long as the missing values are not at the start or end of the series (there is nothing on one side to interpolate from).
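The caveat about the ends comes from how zoo::na.approx treats leading and trailing gaps. This small sketch on a made-up vector shows that interior NAs are filled while the end NAs survive with na.rm = FALSE; passing rule = 2 (forwarded to approx()) instead copies the nearest observed value outward, which is constant extrapolation and may not suit TWS data.

```r
library(zoo)

x <- c(NA, 2, NA, 4, NA)   # made-up values: gaps at both ends and one in the middle

# Interior gap is interpolated; leading/trailing NAs are kept (length unchanged)
a <- na.approx(x, na.rm = FALSE)

# rule = 2 is passed through to approx(): ends take the nearest observed value
b <- na.approx(x, na.rm = FALSE, rule = 2)
```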

Expand the data set:

library(data.table)
library(zoo)
# One row per month 1-12 within each year; months with no data become all-NA rows
res <- setDT(ts.data)[, .SD[match(1:12, Month)], by = Year]
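One thing to check before the expansion step: the zero-padded "01" in the question's printout suggests Month may be stored as text (or a factor) rather than as a number. In that case match(1:12, Month) coerces 1:12 to "1", "2", ... and almost nothing matches. The sketch below uses a hypothetical month_chr vector to show the difference; converting with as.integer() first restores the intended lookup.

```r
month_chr <- c("01", "02", "04", "12")   # hypothetical zero-padded text months

# match() coerces 1:12 to character "1".."12", which never equals "01".."09"
m_bad <- match(1:12, month_chr)

# Converting the months to integer first gives the intended positions
m_good <- match(1:12, as.integer(month_chr))
```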

Then interpolate whichever columns you want:

# Linear interpolation by row position; leading/trailing NAs are kept (na.rm = FALSE)
cols <- c("Month", "DecimDate", "TWS")
res[, (cols) := lapply(.SD, na.approx, na.rm = FALSE), .SDcols = cols]

res
#     Year GridNo GridIndex  Lon  Lat DecimDate Month        TWS
#  1: 2003   GR72        72 35.5 -4.5  2003.000     1 14.2566781
#  2: 2003   GR72        72 35.5 -4.5  2003.083     2  5.0413706
#  3: 2003   GR72        72 35.5 -4.5  2003.167     3  3.8192721
#  4: 2003   GR72        72 35.5 -4.5  2003.250     4  5.8706026
#  5: 2003   GR72        72 35.5 -4.5  2003.333     5  7.8461188
#  6: 2003     NA        NA   NA   NA  2003.417     6  5.1141516
#  7: 2003   GR72        72 35.5 -4.5  2003.500     7  2.3821844
#  8: 2003   GR72        72 35.5 -4.5  2003.583     8  0.1995629
#  9: 2003   GR72        72 35.5 -4.5  2003.667     9 -1.8353604
# 10: 2003   GR72        72 35.5 -4.5  2003.750    10 -2.0410653
# 11: 2003   GR72        72 35.5 -4.5  2003.833    11 -1.4029813
# 12: 2003   GR72        72 35.5 -4.5  2003.917    12 -0.2206872
# 13: 2004   GR72        72 35.5 -4.5  2004.000     1 -0.5090872
# 14: 2004   GR72        72 35.5 -4.5  2004.083     2 -0.4887118
# 15: 2004   GR72        72 35.5 -4.5  2004.167     3 -0.7725966
# 16: 2004   GR72        72 35.5 -4.5  2004.250     4  4.1831581
# 17: 2004   GR72        72 35.5 -4.5  2004.333     5  2.5651040
# 18: 2004   GR72        72 35.5 -4.5  2004.417     6 -2.2511409
# 19: 2004   GR72        72 35.5 -4.5  2004.500     7 -1.6484375
# 20: 2004   GR72        72 35.5 -4.5  2004.583     8 -4.6508982
# 21: 2004   GR72        72 35.5 -4.5  2004.667     9 -5.0053745
# 22: 2004     NA        NA   NA   NA        NA    NA         NA
# 23: 2004     NA        NA   NA   NA        NA    NA         NA
# 24: 2004     NA        NA   NA   NA        NA    NA         NA
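The code above groups by Year only and interpolates each column over the whole table, so with several grid squares stacked together, values could be interpolated across the boundary between two grids. A minimal sketch of a per-grid variant, assuming Month is an integer and every grid square covers the same years (as the question describes), builds the full grid x year x month table with CJ(), joins the data onto it, and interpolates within each GridNo. The toy data table dt is hypothetical, and the static columns (GridIndex, Lon, Lat) are omitted for brevity. DecimDate in the question appears to follow Year + (Month - 1) / 12, so it is recomputed rather than interpolated, which also fills it at the ends of the series.

```r
library(data.table)
library(zoo)

# Hypothetical toy data: two grid squares, one year, one month missing in each
m1 <- setdiff(1:12, 6L)
m2 <- setdiff(1:12, 3L)
dt <- data.table(
  GridNo = rep(c("GR72", "GR73"), each = 11),
  Year   = 2003L,
  Month  = c(m1, m2),
  TWS    = c(m1 * 1, m2 * 10)
)

# Every grid x year x month combination (assumes all grids share the same years)
full <- CJ(GridNo = unique(dt$GridNo), Year = unique(dt$Year), Month = 1:12)

# Join the data onto the full table: missing months become rows with TWS = NA
res <- dt[full, on = .(GridNo, Year, Month)]
setorder(res, GridNo, Year, Month)

# Interpolate within each grid square only, so values never bleed across grids
res[, TWS := na.approx(TWS, na.rm = FALSE), by = GridNo]

# Recompute the decimal date from Year and Month
res[, DecimDate := Year + (Month - 1) / 12]
```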



Answer 2:


I would first convert your Year and Month columns into actual Date values (here taking the first day of every month):

dates <- as.Date(paste(ts.data$Year, ts.data$Month, 1, sep="-"))

Do the same for the missing target month(s) (here just one, but this works with several):

target <- as.Date("2003-06-01")

And do the approximation:

approx(dates, ts.data$TWS, target)
$x
[1] "2003-06-01"

$y
[1] 5.069365
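This value differs slightly from the 5.1141516 in Answer 1 because approx() on Date values weights by elapsed days rather than by row position: 1 May to 1 June 2003 is 31 days and 1 May to 1 July 2003 is 61 days, so the interpolation works out as

$$
\mathrm{TWS}_{\text{2003-06}} = 7.8461188 + \frac{31}{61}\,(2.3821844 - 7.8461188) \approx 5.069365
$$

whereas treating the rows as equally spaced gives the midpoint (7.8461188 + 2.3821844) / 2 = 5.1141516.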

So in the context of your dataframe (here simplified):

# Simplified data: 2003 has no month 6
ts.data <- data.frame(
  Year  = c(rep(2003, 11), rep(2004, 9)),
  Month = c((1:12)[-6], 1:9),
  TWS   = c(14.2566781, 5.0413706, 3.8192721, 5.8706026, 7.8461188, 2.3821844, 0.1995629,
            -1.8353604, -2.0410653, -1.4029813, -0.2206872, -0.5090872, -0.4887118,
            -0.7725966, 4.1831581, 2.5651040, -2.2511409, -1.6484375, -4.6508982, -5.0053745))
# First-of-month Dates for the observed rows and for the missing month
dates  <- as.Date(paste(ts.data$Year, ts.data$Month, 1, sep = "-"))
target <- as.Date("2003-06-01")
# Append the interpolated row, then restore chronological order
ts.data <- rbind(ts.data,
                 data.frame(Year = 2003, Month = 6,
                            TWS = approx(dates, ts.data$TWS, target)$y))
ts.data <- ts.data[order(ts.data$Year, ts.data$Month), ]
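The snippet above fills a single, hard-coded month. A minimal base-R sketch of the same Date-based approx() idea that finds every missing month automatically, one grid square at a time, is below. The toy ts.data and the helper fill_grid() are hypothetical; it assumes Month is an integer and keeps approx()'s default rule = 1, so months outside a grid's observed range stay NA rather than being extrapolated. DecimDate is recomputed as Year + (Month - 1) / 12, which is what the question's values appear to follow.

```r
# Hypothetical toy data: two grid squares, each missing one month in 2003
ts.data <- data.frame(
  GridNo = rep(c("GR72", "GR73"), each = 11),
  Year   = 2003,
  Month  = c(setdiff(1:12, 6), setdiff(1:12, 3)),
  TWS    = c(setdiff(1:12, 6), setdiff(1:12, 3) * 10),
  stringsAsFactors = FALSE
)

# Hypothetical helper: complete one grid square's months and interpolate TWS in time
fill_grid <- function(d) {
  full <- expand.grid(Month = 1:12, Year = sort(unique(d$Year)))
  full$GridNo <- d$GridNo[1]
  obs_dates  <- as.Date(paste(d$Year, d$Month, 1, sep = "-"))
  full_dates <- as.Date(paste(full$Year, full$Month, 1, sep = "-"))
  # Linear interpolation weighted by elapsed days between first-of-month dates
  full$TWS <- approx(obs_dates, d$TWS, xout = full_dates)$y
  full$DecimDate <- full$Year + (full$Month - 1) / 12
  full[, c("GridNo", "Year", "Month", "DecimDate", "TWS")]
}

# Apply per grid square and stack the results
res <- do.call(rbind, lapply(split(ts.data, ts.data$GridNo), fill_grid))
rownames(res) <- NULL
```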


Source: https://stackoverflow.com/questions/31383601/r-insert-row-for-missing-monthly-data-and-interpolate
