group_by() into fill() not working as expected


I'm trying to do a Last Observation Carried Forward operation on some poorly formatted data using dplyr and tidyr. It isn't working as I'd expe


6 Answers

醉梦人生

2020-12-31 11:15
Another option is to use do from dplyr:
```
df3 <- df %>% group_by(id) %>% do(fill(.,email))
```

我寻月下人不归

2020-12-31 11:22

Luckily you can still use zoo::na.locf for this:

```
df %>%
    group_by(id) %>%
    mutate(email = zoo::na.locf(email, na.rm = FALSE))
# Source: local data frame [6 x 2]
# Groups: id [3]
#
#      id         email
#   (dbl)        (fctr)
# 1     1 bob@email.com
# 2     1 bob@email.com
# 3     2 joe@email.com
# 4     2 joe@email.com
# 5     3            NA
# 6     3            NA
```
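
The `na.rm = FALSE` argument matters inside `mutate()`: by default `zoo::na.locf()` drops leading NAs, so the result can be shorter than the input, which `mutate()` will not accept. A minimal sketch of the difference, using a made-up vector `x` (not from the question):

```r
library(zoo)

# Hypothetical input with a leading NA
x <- c(NA, "bob@email.com", NA, "joe@email.com", NA)

# Default (na.rm = TRUE): the leading NA is dropped, so the result is shorter
short <- na.locf(x)
length(short)  # 4

# na.rm = FALSE: the leading NA is kept, so the length matches the input
full <- na.locf(x, na.rm = FALSE)
length(full)   # 5
```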


孤城傲影

2020-12-31 11:25

Two questions: does it have to be duplicated, and do you have to use dplyr and tidyr?

Maybe this could be a solution?

```
(
bar <- data.frame(id=c(1,1,2,2,3,3),
                 email=c('bob@email.com', NA, 'joe@email.com', NA, NA, NA))
)
#> id         email
#>  1 bob@email.com
#>  1          <NA>
#>  2 joe@email.com
#>  2          <NA>
#>  3          <NA>
#>  3          <NA>

(
foo <- bar[!duplicated(bar$id),]
)
#> id         email
#>  1 bob@email.com
#>  2 joe@email.com
#>  3          <NA>
```


孤城傲影

2020-12-31 11:25
This is kind of ugly, but it is another option that uses dplyr and works with your sample data. It assigns the first non-missing email in each group to every row of that group:
```
df %>%
   group_by(id) %>%
   mutate(email = email[ !is.na(email) ][1])
```
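
Note that taking the first non-missing value is not the same as carrying the last observation forward once a group holds more than one distinct value. A small sketch with a hypothetical data frame `df_multi` (invented for illustration, not the asker's data) shows where the two approaches diverge:

```r
library(dplyr)
library(tidyr)

# Hypothetical data: id 1 has a leading NA and two different addresses
df_multi <- data.frame(
  id    = c(1, 1, 1, 1),
  email = c(NA, "old@email.com", "new@email.com", NA),
  stringsAsFactors = FALSE
)

# First non-missing value per group, copied to every row of the group
first_value <- df_multi %>%
  group_by(id) %>%
  mutate(email = email[!is.na(email)][1]) %>%
  ungroup()

# Last observation carried forward within each group
locf <- df_multi %>%
  group_by(id) %>%
  fill(email) %>%
  ungroup()

first_value$email  # every row is "old@email.com", including the leading NA row
locf$email         # NA, "old@email.com", "new@email.com", "new@email.com"
```

For the sample data in the question, where each id has at most one address and it comes first, both give the same result.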

无人共我

2020-12-31 11:26

Looks like this has been fixed in the development version of tidyr. You now get the expected result per id using fill from tidyr_0.3.1.9000.

```
df %>% group_by(id) %>% fill(email)

Source: local data frame [6 x 2]
Groups: id [3]

     id         email
  (dbl)        (fctr)
1     1 bob@email.com
2     1 bob@email.com
3     2 joe@email.com
4     2 joe@email.com
5     3            NA
6     3            NA
```


广开言路

2020-12-31 11:28

I have come across this issue quite a few times, and I do worry about using this

```
df2 <- df %>% group_by(id) %>% fill(email)
```

on large data sets, as I have had mixed results. I found the following workaround: split() combined with map_df() applies whatever you are doing to a separate data frame for each id, and map_df() then row-binds the individual data frames back together. It has also proved handy in lots of other circumstances. It is somewhat obsolete now that this issue has been fixed, but it is still a useful alternative that avoids group_by().

```
df %>% split(.$id) %>% map_df(function(x){ x %>% fill(email)})
```
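
For anyone who wants to run the split approach end to end, here is a self-contained sketch. It assumes the sample data looks like the `bar` data frame from another answer in this thread, and it spells out the two steps (fill each piece, then bind) with `map()` and `bind_rows()` rather than `map_df()`; as far as I know recent purrr releases mark `map_df()` as superseded, so this form may age better.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Sample data, assumed to match the data frame shown in another answer
df <- data.frame(
  id    = c(1, 1, 2, 2, 3, 3),
  email = c("bob@email.com", NA, "joe@email.com", NA, NA, NA),
  stringsAsFactors = FALSE
)

# split() returns a list with one data frame per id; each piece is filled
# on its own, so a value can never be carried across ids
pieces <- df %>%
  split(.$id) %>%
  map(function(x) fill(x, email))

# Stack the filled pieces back into a single data frame
result <- bind_rows(pieces)
result
```

Rows for id 3 stay NA because that group has no observation to carry forward.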
