问题
I am trying to fill missing data based on whether the previous and last NA value are the same. For example, this is the dummy dataset:
df <- data.frame(ID = c(rep(1, 6), rep(2, 6), rep(3, 6), rep(4, 6), rep(5, 6), rep(6, 6),
rep(7, 6), rep(8, 6), rep(9, 6), rep(10, 6)),
with_missing = c("a", "a", NA, NA, "a", "a",
"a", "a", NA, "b", "b", "b",
"a", NA, NA, NA, "c", "c",
"b", NA, "a", "a", "a", "a",
"a", NA, NA, NA, NA, "a",
"a", "a", NA, "b", "a", "a",
"a", "a", NA, NA, "a", "a",
"a", "a", NA, "b", "b", "b",
"a", NA, NA, NA, "c", "c",
"b", NA, "a", "a", "a", "a"),
desired_result = c("a", "a", "a", "a", "a", "a",
"a", "a", NA, "b", "b", "b",
"a", NA, NA, NA, "c", "c",
"b", NA, "a", "a", "a", "a",
"a", "a", "a", "a", "a", "a",
"a", "b", "b", "b", "a", "a",
"a", "a", "a", "a", "a", "a",
"a", "a", NA, "b", "b", "b",
"a", NA, NA, NA, "c", "c",
"b", NA, "a", "a", "a", "a"))
So if there is a gap of four rows, for example, but the value before and after the gap are the same, then I want the gap to be filled with those same values; whereas if the values before and after the NA are different, I don't want to fill it. In addition, I need to group the data by the ID variable.
I've tried na.locf but I can't work out how to add in the condition of "if they're the same before and after NA".
Thanks.
回答1:
You can fill forwards and backwards, then set the rows where they don't match to NA
.
library(zoo)
library(dplyr)
df %>%
mutate_if(is.factor, as.character) %>%
group_by(ID) %>%
mutate(result = na.locf(with_missing, fromLast = T),
result = ifelse(result == na.locf(with_missing), result, NA))
# ID with_missing desired_result result
# 1 1 a a a
# 2 1 a a a
# 3 1 <NA> a a
# 4 1 <NA> a a
# 5 1 a a a
# 6 1 a a a
# 7 2 a a a
# 8 2 a a a
# 9 2 <NA> <NA> <NA>
# 10 2 b b b
# 11 2 b b b
# 12 2 b b b
# 13 3 a a a
# 14 3 <NA> <NA> <NA>
# 15 3 <NA> <NA> <NA>
# 16 3 <NA> <NA> <NA>
# 17 3 c c c
# 18 3 c c c
# 19 4 b b b
# 20 4 <NA> <NA> <NA>
# 21 4 a a a
# 22 4 a a a
# 23 4 a a a
# 24 4 a a a
# 25 5 a a a
# 26 5 <NA> a a
# 27 5 <NA> a a
# 28 5 <NA> a a
# 29 5 <NA> a a
# 30 5 a a a
# 31 6 a a a
# 32 6 a b a
# 33 6 <NA> b <NA>
# 34 6 b b b
# 35 6 a a a
# 36 6 a a a
# 37 7 a a a
# 38 7 a a a
# 39 7 <NA> a a
# 40 7 <NA> a a
# 41 7 a a a
# 42 7 a a a
# 43 8 a a a
# 44 8 a a a
# 45 8 <NA> <NA> <NA>
# 46 8 b b b
# 47 8 b b b
# 48 8 b b b
# 49 9 a a a
# 50 9 <NA> <NA> <NA>
# 51 9 <NA> <NA> <NA>
# 52 9 <NA> <NA> <NA>
# 53 9 c c c
# 54 9 c c c
# 55 10 b b b
# 56 10 <NA> <NA> <NA>
# 57 10 a a a
# 58 10 a a a
# 59 10 a a a
# 60 10 a a a
来源:https://stackoverflow.com/questions/56134078/replace-na-values-if-last-and-next-non-na-value-are-the-same