问题
I have a time series data frame in R that has a column, V1, which consists of integers with a few NAs interspersed throughout. I want to iterate over this column and subtract V1 from itself one time step previously. However, I want to ignore the NA values in V1 and use the last non-NA value in the subtraction. If the current value of V1 is NA, then the difference should return NA. See below for an example
V1 <- c(1, 3, 4, NA, NA, 6, 9, NA, 10)
time <- 1:length(V1)
dat <- data.frame(time = time,
V1 = V1)
lag_diff <- c(NA, 2, 1, NA, NA, 2, 3, NA, 1) # The result I want
diff(dat$V1) # Not the result I want
I'd prefer not to do this with loops because I have hundreds of data frames, each with >10,000 rows.
My first thought to solve this was to filter out the NA rows, perform the iterative difference calculation and then reinsert the rows that were filtered out but I can't think of a way to do that. It doesn't seem very "tidy" to do it that way either and I'm not sure it would be faster than looping. Any help is appreciated, bonus points if the solution uses tidyverse functions.
回答1:
dat[!is.na(dat$V1), 'lag_diff'] <- c(NA, diff(dat[!is.na(dat$V1), 'V1']))
# time V1 lag_diff
# 1 1 1 NA
# 2 2 3 2
# 3 3 4 1
# 4 4 NA NA
# 5 5 NA NA
# 6 6 6 2
# 7 7 9 3
# 8 8 NA NA
# 9 9 10 1
Or with data.table
(same result)
library(data.table)
setDT(dat)
dat[!is.na(V1), lag_diff := V1 - shift(V1)]
# time V1 lag_diff
# 1: 1 1 NA
# 2: 2 3 2
# 3: 3 4 1
# 4: 4 NA NA
# 5: 5 NA NA
# 6: 6 6 2
# 7: 7 9 3
# 8: 8 NA NA
# 9: 9 10 1
回答2:
A tidyverse version, just in case. It does need a filter
though
dat %>%
filter(!is.na(V1)) %>%
mutate(diff=V1- lag(V1)) %>%
right_join(dat,by=c("time","V1"))
来源:https://stackoverflow.com/questions/52062687/iterate-over-a-column-ignoring-but-retaining-na-values-in-r