Replace missing values (NA) with most recent non-NA by group


南旧 2020-11-22 05:42

I would like to solve the following problem with dplyr, preferably with one of the window functions. I have a data frame with houses and buying prices. The following is an e

7 answers

孤街浪徒 (original poster)

2020-11-22 06:02

Pure dplyr solution (no zoo).

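library(dplyr)

df2 <- df %>%
  group_by(houseID) %>%
  # Counter goes up at each non-NA price, so a price and the NAs after it share one id
  mutate(price_change = cumsum(!is.na(price))) %>%
  # .add = TRUE keeps the houseID grouping (dplyr >= 1.0.0; older versions use add = TRUE)
  group_by(price_change, .add = TRUE) %>%
  # The first value of each run is the observed price (NA for leading gaps)
  mutate(price_filled = first(price)) %>%
  ungroup() %>%
  select(-price_change)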

The interesting part of the example output is at the end of df2.

> tail(df2, 20)
Source: local data frame [20 x 4]

    houseID year     price price_filled
 1       14 1995        NA           NA
 2       14 1996        NA           NA
 3       14 1997        NA           NA
 4       14 1998        NA           NA
 5       14 1999 0.8374778    0.8374778
 6       14 2000        NA    0.8374778
 7       14 2001        NA    0.8374778
 8       14 2002        NA    0.8374778
 9       14 2003 2.1918880    2.1918880
10       14 2004        NA    2.1918880
11       15 1995        NA           NA
12       15 1996 0.3982450    0.3982450
13       15 1997        NA    0.3982450
14       15 1998 1.7727000    1.7727000
15       15 1999        NA    1.7727000
16       15 2000        NA    1.7727000
17       15 2001        NA    1.7727000
18       15 2002 7.8636329    7.8636329
19       15 2003        NA    7.8636329
20       15 2004        NA    7.8636329
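
To see why the grouping trick works, it may help to look at it on a single vector outside of dplyr. The following is a minimal base R sketch, not part of the original answer: the vector `price` holds made-up values standing in for one house's prices, and `ave()` is swapped in for the grouped `mutate()`.

```r
# Toy data (assumed for illustration): one house, six years, rows already in year order.
price <- c(NA, 0.5, NA, NA, 2.0, NA)

# Every non-NA value bumps the counter, so an observed price and
# the NAs that follow it end up with the same run id.
run_id <- cumsum(!is.na(price))

# Within each run, repeat the first element: the observed price,
# or NA for a leading run that has no earlier price to carry forward.
price_filled <- ave(price, run_id, FUN = function(x) x[1])

print(data.frame(price, run_id, price_filled))
```

Here `run_id` comes out as 0, 1, 1, 1, 2, 2, and the leading NA stays NA because run 0 contains no observed price. Like the dplyr version, this relies on the rows being sorted by year within each house.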
