Question
I have a data frame, shown below, with 5000+ rows. I am trying to insert a row wherever a month is missing (e.g. month 6 of 2003 below) and then use linear interpolation to calculate the 'TWS' value. Ideally the decimal date would be filled in appropriately too, but I can sort this out afterwards if not! The data frame covers months 1:12 for 10 years (2003-2012), and this repeats for multiple grid squares.
I have found lots of other similar questions, but none relating to a repeating 1:12 monthly sequence.
> head(ts.data,20)
GridNo GridIndex Lon Lat DecimDate Year Month TWS
1 GR72 72 35.5 -4.5 2003.000 2003 01 14.2566781
2 GR72 72 35.5 -4.5 2003.083 2003 02 5.0413706
3 GR72 72 35.5 -4.5 2003.167 2003 03 3.8192721
4 GR72 72 35.5 -4.5 2003.250 2003 04 5.8706026
5 GR72 72 35.5 -4.5 2003.333 2003 05 7.8461188
6 GR72 72 35.5 -4.5 2003.500 2003 07 2.3821844
7 GR72 72 35.5 -4.5 2003.583 2003 08 0.1995629
8 GR72 72 35.5 -4.5 2003.667 2003 09 -1.8353604
9 GR72 72 35.5 -4.5 2003.750 2003 10 -2.0410653
10 GR72 72 35.5 -4.5 2003.833 2003 11 -1.4029813
11 GR72 72 35.5 -4.5 2003.917 2003 12 -0.2206872
12 GR72 72 35.5 -4.5 2004.000 2004 01 -0.5090872
13 GR72 72 35.5 -4.5 2004.083 2004 02 -0.4887118
14 GR72 72 35.5 -4.5 2004.167 2004 03 -0.7725966
15 GR72 72 35.5 -4.5 2004.250 2004 04 4.1831581
16 GR72 72 35.5 -4.5 2004.333 2004 05 2.5651040
17 GR72 72 35.5 -4.5 2004.417 2004 06 -2.2511409
18 GR72 72 35.5 -4.5 2004.500 2004 07 -1.6484375
19 GR72 72 35.5 -4.5 2004.583 2004 08 -4.6508982
20 GR72 72 35.5 -4.5 2004.667 2004 09 -5.0053745
Any help appreciated!
Answer 1:
Using the data.table and zoo packages you can easily expand your data set and interpolate, as long as you don't have NAs at both sides of the year.
Expand the data set
library(data.table)
library(zoo)
res <- setDT(df)[, .SD[match(1:12, Month)], by = Year]
Interpolate on whatever column you want
cols <- c("Month", "DecimDate", "TWS")
res[, (cols) := lapply(.SD, na.approx, na.rm = FALSE), .SDcols = cols]
res
# Year GridNo GridIndex Lon Lat DecimDate Month TWS
# 1: 2003 GR72 72 35.5 -4.5 2003.000 1 14.2566781
# 2: 2003 GR72 72 35.5 -4.5 2003.083 2 5.0413706
# 3: 2003 GR72 72 35.5 -4.5 2003.167 3 3.8192721
# 4: 2003 GR72 72 35.5 -4.5 2003.250 4 5.8706026
# 5: 2003 GR72 72 35.5 -4.5 2003.333 5 7.8461188
# 6: 2003 NA NA NA NA 2003.417 6 5.1141516
# 7: 2003 GR72 72 35.5 -4.5 2003.500 7 2.3821844
# 8: 2003 GR72 72 35.5 -4.5 2003.583 8 0.1995629
# 9: 2003 GR72 72 35.5 -4.5 2003.667 9 -1.8353604
# 10: 2003 GR72 72 35.5 -4.5 2003.750 10 -2.0410653
# 11: 2003 GR72 72 35.5 -4.5 2003.833 11 -1.4029813
# 12: 2003 GR72 72 35.5 -4.5 2003.917 12 -0.2206872
# 13: 2004 GR72 72 35.5 -4.5 2004.000 1 -0.5090872
# 14: 2004 GR72 72 35.5 -4.5 2004.083 2 -0.4887118
# 15: 2004 GR72 72 35.5 -4.5 2004.167 3 -0.7725966
# 16: 2004 GR72 72 35.5 -4.5 2004.250 4 4.1831581
# 17: 2004 GR72 72 35.5 -4.5 2004.333 5 2.5651040
# 18: 2004 GR72 72 35.5 -4.5 2004.417 6 -2.2511409
# 19: 2004 GR72 72 35.5 -4.5 2004.500 7 -1.6484375
# 20: 2004 GR72 72 35.5 -4.5 2004.583 8 -4.6508982
# 21: 2004 GR72 72 35.5 -4.5 2004.667 9 -5.0053745
# 22: 2004 NA NA NA NA NA NA NA
# 23: 2004 NA NA NA NA NA NA NA
# 24: 2004 NA NA NA NA NA NA NA
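Since the question mentions several grid squares, here is a minimal sketch of one way the same idea could be applied per grid square, so that values are never interpolated across two different squares. The toy table `dt`, its values, and the second grid name `GR73` are made up for illustration; only `CJ()`, the `on =` join and `na.approx()` are taken from the two packages used above. Joining against the full GridNo/Year/Month grid also keeps `GridNo` filled in on the inserted rows.

```r
library(data.table)
library(zoo)

# Hypothetical toy data: two grid squares, month 4 missing in each
dt <- data.table(
  GridNo = rep(c("GR72", "GR73"), each = 5),
  Year   = 2003L,
  Month  = rep(c(1L, 2L, 3L, 5L, 6L), 2),
  TWS    = c(1, 2, 3, 5, 6, 10, 20, 30, 50, 60)
)

# Every GridNo x Year x Month combination that should exist
full <- CJ(GridNo = unique(dt$GridNo), Year = unique(dt$Year), Month = 1:6)

# Join onto the full grid: missing months appear as rows with TWS = NA
res <- dt[full, on = .(GridNo, Year, Month)]

# Interpolate within each grid square only
res[, TWS := na.approx(TWS, na.rm = FALSE), by = GridNo]
res
```

With the real data, `Month = 1:12` and the actual years would replace the toy ranges, and `Month` would need to be numeric rather than text such as "01" for the join to match.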
Answer 2:
I would simply first transform your dates into actual Dates (here taking the first of every month):
dates <- as.Date(paste(ts.data$Year, ts.data$Month, 1, sep="-"))
Do the same for the target, missing months (here just one but can work with many):
target <- as.Date("2003-06-01")
And do the approximation:
approx(dates, ts.data$TWS, target)
$x
[1] "2003-06-01"
$y
[1] 5.069365
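This value differs slightly from the 5.1141516 in the first answer. As far as I can tell, that is because approx here works on the actual dates, so June 1 sits 31 days into the 61-day gap between May 1 and July 1, while na.approx on a plain vector treats the missing row as the exact midpoint of its neighbours. A small sketch of that arithmetic in base R (variable names are just for illustration):

```r
# Neighbouring observations from the question's data
d0 <- as.Date("2003-05-01"); y0 <- 7.8461188
d1 <- as.Date("2003-07-01"); y1 <- 2.3821844
d  <- as.Date("2003-06-01")

# Weight by elapsed days: 31 of 61
w <- as.numeric(d - d0) / as.numeric(d1 - d0)
by_date <- y0 + w * (y1 - y0)   # date-based interpolation, as approx() does here

# Midpoint of the two neighbours, as when interpolating on row position
by_index <- (y0 + y1) / 2

c(by_date = by_date, by_index = by_index)
```

Which of the two is preferable depends on whether months should count as equally spaced steps or as real calendar intervals.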
So in the context of your dataframe (here simplified):
ts.data <- data.frame(
  Year  = c(rep(2003, 11), rep(2004, 9)),
  Month = c((1:12)[-6], 1:9),
  TWS   = c(14.2566781, 5.0413706, 3.8192721, 5.8706026, 7.8461188,
            2.3821844, 0.1995629, -1.8353604, -2.0410653, -1.4029813,
            -0.2206872, -0.5090872, -0.4887118, -0.7725966, 4.1831581,
            2.5651040, -2.2511409, -1.6484375, -4.6508982, -5.0053745))
dates <- as.Date(paste(ts.data$Year, ts.data$Month, 1, sep = "-"))
target <- as.Date("2003-06-01")
# Append the interpolated row, then restore chronological order
ts.data <- rbind(ts.data,
                 data.frame(Year = 2003,
                            Month = 6,
                            TWS = approx(dates, ts.data$TWS, target)$y))
ts.data <- ts.data[order(ts.data$Year, ts.data$Month), ]
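To avoid typing each missing month by hand, the same approach could be wrapped in a small helper that builds the full monthly sequence per grid square and lets approx fill every gap. This is only a sketch under assumptions: `ts.toy`, `fill_months` and the grid name `GR73` are hypothetical, and the toy values are not from the question.

```r
# Hypothetical toy data: GR72 is missing April, GR73 is missing March
ts.toy <- data.frame(
  GridNo = rep(c("GR72", "GR73"), each = 5),
  Year   = 2003,
  Month  = c(1, 2, 3, 5, 6,  1, 2, 4, 5, 6),
  TWS    = c(1, 2, 3, 5, 6,  10, 20, 40, 50, 60)
)

# Rebuild one grid square on a complete monthly sequence
fill_months <- function(d) {
  dates     <- as.Date(paste(d$Year, d$Month, 1, sep = "-"))
  all_dates <- seq(min(dates), max(dates), by = "month")
  data.frame(
    GridNo = d$GridNo[1],
    Year   = as.integer(format(all_dates, "%Y")),
    Month  = as.integer(format(all_dates, "%m")),
    TWS    = approx(dates, d$TWS, xout = all_dates)$y
  )
}

# Apply per grid square and stack the results
filled <- do.call(rbind, lapply(split(ts.toy, ts.toy$GridNo), fill_months))
rownames(filled) <- NULL
filled
```

Months missing before the first or after the last observation of a grid square are not created by this helper, since the sequence only spans the observed range.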
Source: https://stackoverflow.com/questions/31383601/r-insert-row-for-missing-monthly-data-and-interpolate