Replace missing values (NA) with most recent non-NA by group

南旧 2020-11-22 05:42

I would like to solve the following problem with dplyr, preferably with one of the window functions. I have a data frame with houses and buying prices. The following is an e

7 answers
  •  孤街浪徒
    2020-11-22 06:02

    Pure dplyr solution (no zoo).

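    library(dplyr)
    df2 <- df %>%
      group_by(houseID) %>%
      # The counter increases at every non-NA price, so each known price
      # and the NAs that follow it share one value within a house
      mutate(price_change = cumsum(!is.na(price))) %>%
      # `.add = TRUE` replaces the `add` argument deprecated in dplyr 1.0.0
      group_by(price_change, .add = TRUE) %>%
      # The first element of each run is its known price (NA for leading NAs)
      mutate(price_filled = first(price)) %>%
      ungroup() %>%
      select(-price_change)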
    

    The interesting part of the example output is at the end of df2.

    > tail(df2, 20)
    Source: local data frame [20 x 4]
    
        houseID year     price price_filled
     1       14 1995        NA           NA
     2       14 1996        NA           NA
     3       14 1997        NA           NA
     4       14 1998        NA           NA
     5       14 1999 0.8374778    0.8374778
     6       14 2000        NA    0.8374778
     7       14 2001        NA    0.8374778
     8       14 2002        NA    0.8374778
     9       14 2003 2.1918880    2.1918880
    10       14 2004        NA    2.1918880
    11       15 1995        NA           NA
    12       15 1996 0.3982450    0.3982450
    13       15 1997        NA    0.3982450
    14       15 1998 1.7727000    1.7727000
    15       15 1999        NA    1.7727000
    16       15 2000        NA    1.7727000
    17       15 2001        NA    1.7727000
    18       15 2002 7.8636329    7.8636329
    19       15 2003        NA    7.8636329
    20       15 2004        NA    7.8636329
    
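    The grouping trick used in the pipeline above may be easier to follow on a single vector. The following is a minimal base R sketch, not part of the original answer: the `price` values are made up, and `ave()` stands in for the grouped `mutate()`. It assumes the rows are already sorted by year.

    ```r
    # Hypothetical prices for one house, ordered by year (values are made up)
    price <- c(NA, 0.8, NA, NA, 2.2, NA)

    # Every non-NA value starts a new run; leading NAs fall into run 0
    run_id <- cumsum(!is.na(price))
    # run_id is 0 1 1 1 2 2

    # Within each run, repeat the run's first element (the known price)
    price_filled <- ave(price, run_id, FUN = function(x) x[1])
    # price_filled is NA 0.8 0.8 0.8 2.2 2.2
    ```

    Run 0 contains only NAs, which is why leading missing values stay NA in the output above.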
