Replace NA with previous or next value, by group, using dplyr

巧了我就是萌 提交于 2019-11-26 02:38:11

问题


I have a data frame which is arranged by descending order of date.

ps1 = data.frame(userID = c(21,21,21,22,22,22,23,23,23), 
             color = c(NA,\'blue\',\'red\',\'blue\',NA,NA,\'red\',NA,\'gold\'), 
             age = c(\'3yrs\',\'2yrs\',NA,NA,\'3yrs\',NA,NA,\'4yrs\',NA), 
             gender = c(\'F\',NA,\'M\',NA,NA,\'F\',\'F\',NA,\'F\') 
)

I wish to impute(replace) NA values with previous values and grouped by userID In case the first row of a userID has NA then replace with the next set of values for that userid group.

I am trying to use dplyr and zoo packages something like this...but its not working

cleanedFUG <- filteredUserGroup %>%
 group_by(UserID) %>%
 mutate(Age1 = na.locf(Age), 
     Color1 = na.locf(Color), 
     Gender1 = na.locf(Gender) ) 

I need result df like this:

                      userID color  age gender
                1     21  blue 3yrs      F
                2     21  blue 2yrs      F
                3     21   red 2yrs      M
                4     22  blue 3yrs      F
                5     22  blue 3yrs      F
                6     22  blue 3yrs      F
                7     23   red 4yrs      F
                8     23   red 4yrs      F
                9     23  gold 4yrs      F

回答1:


require(tidyverse) #fill is part of tidyr

ps1 %>% 
  group_by(userID) %>% 
  fill(color, age, gender) %>% #default direction down
  fill(color, age, gender, .direction = "up")

Which gives you:

Source: local data frame [9 x 4]
Groups: userID [3]

  userID  color    age gender
   <dbl> <fctr> <fctr> <fctr>
1     21   blue   3yrs      F
2     21   blue   2yrs      F
3     21    red   2yrs      M
4     22   blue   3yrs      F
5     22   blue   3yrs      F
6     22   blue   3yrs      F
7     23    red   4yrs      F
8     23    red   4yrs      F
9     23   gold   4yrs      F



回答2:


Using zoo::na.locf directly on the whole data.frame would fill the NA regardless of the userID groups. Package dplyr's grouping has unfortunately no effect on na.locf function, that's why I went with a split:

library(dplyr); library(zoo)
ps1 %>% split(ps1$userID) %>% 
  lapply(function(x) {na.locf(na.locf(x), fromLast=T)}) %>% 
  do.call(rbind, .)
####      userID color  age gender
#### 21.1     21  blue 3yrs      F
#### 21.2     21  blue 2yrs      F
#### 21.3     21   red 2yrs      M
#### 22.4     22  blue 3yrs      F
#### 22.5     22  blue 3yrs      F
#### 22.6     22  blue 3yrs      F
#### 23.7     23   red 4yrs      F
#### 23.8     23   red 4yrs      F
#### 23.9     23  gold 4yrs      F

What it does is that it first splits the data into 3 data.frames, then I apply a first pass of imputation (downwards), then upwards with the anonymous function in lapply, and eventually use rbind to bring the data.frames back together. You have the expected output.




回答3:


Using @agenis method with na.locf() combined with purrr, you could do:

library(purrr)
library(zoo)

ps1 %>% 
  slice_rows("userID") %>% 
  by_slice(function(x) { 
    na.locf(na.locf(x), fromLast=T) }, 
    .collate = "rows") 



回答4:


I wrote this function and it is definitely faster than fill and probably faster than na.locf:

fill_NA <- function(x) {
  which.na <- c(which(!is.na(x)), length(x) + 1)
  values <- na.omit(x)

  if (which.na[1] != 1) {
    which.na <- c(1, which.na)
    values <- c(values[1], values)
  }

  diffs <- diff(which.na)
  return(rep(values, times = diffs))
}


来源:https://stackoverflow.com/questions/40040834/replace-na-with-previous-or-next-value-by-group-using-dplyr

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!