I'm trying to do a Last Observation Carried Forward operation on some poorly formatted data using dplyr and tidyr. It isn't working as I'd expect.
Another option is to use do from dplyr:
df3 <- df %>% group_by(id) %>% do(fill(.,email))
Luckily you can still use zoo::na.locf for this:
df %>%
group_by(id) %>%
mutate(email = zoo::na.locf(email, na.rm = FALSE))
# Source: local data frame [6 x 2]
# Groups: id [3]
#
# id email
# (dbl) (fctr)
# 1 1 bob@email.com
# 2 1 bob@email.com
# 3 2 joe@email.com
# 4 2 joe@email.com
# 5 3 NA
# 6 3 NA
Two questions: does it have to be duplicated, and do you have to use dplyr and tidyr?
Maybe this could be a solution?
(
bar <- data.frame(id=c(1,1,2,2,3,3),
email=c('bob@email.com', NA, 'joe@email.com', NA, NA, NA))
)
#> id email
#> 1 bob@email.com
#> 1 <NA>
#> 2 joe@email.com
#> 2 <NA>
#> 3 <NA>
#> 3 <NA>
(
foo <- bar[!duplicated(bar$id),]
)
#> id email
#> 1 bob@email.com
#> 2 joe@email.com
#> 3 <NA>
This is kind of ugly, but it is another option that uses dplyr and works with your sample data:
df %>%
group_by(id) %>%
mutate(email = email[ !is.na(email) ][1])
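One thing worth noting about the snippet above: it takes the first non-missing value in each group, which matches LOCF for the sample data but may differ when a group starts with NA or holds more than one distinct value. Here is a minimal base R sketch of the difference. The locf helper and the example vector are made up for illustration and are not from the thread.

```r
# Hypothetical helper: carry the last non-NA value forward; leading NAs stay NA.
locf <- function(x) {
  idx  <- cumsum(!is.na(x))   # how many non-NA values have been seen so far
  vals <- x[!is.na(x)]        # the non-NA values, in order
  out  <- x
  out[idx > 0] <- vals[idx[idx > 0]]
  out
}

# Made-up values for a single id: a leading NA and two different addresses.
email <- c(NA, "bob@email.com", NA, "bob@work.com", NA)

# What email[!is.na(email)][1] gives once recycled over the whole group.
first_value <- rep(email[!is.na(email)][1], length(email))

# What a true last-observation-carried-forward gives.
carried <- locf(email)

print(first_value)
print(carried)
```

If every id has at most one distinct email and no leading NA that should stay missing, the two approaches should agree.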
Looks like this has been fixed in the development version of tidyr. You now get the expected result per id using fill from tidyr_0.3.1.9000.
df %>% group_by(id) %>% fill(email)
Source: local data frame [6 x 2]
Groups: id [3]
id email
(dbl) (fctr)
1 1 bob@email.com
2 1 bob@email.com
3 2 joe@email.com
4 2 joe@email.com
5 3 NA
6 3 NA
I have come across this issue quite a few times, and I do worry about using this
df2 <- df %>% group_by(id) %>% fill(email)
on large data sets, as I have had mixed results. I found the following workaround. The split function used with map_df (from purrr) ensures that whatever you are doing is applied to a separate data frame for each id, and map_df then rebinds all the individual data frames like magic. It has also proved handy in lots of other circumstances. It is somewhat obsolete now that this issue has been fixed, but it is still a useful alternative that avoids group_by().
df %>% split(.$id) %>% map_df(function(x){ x %>% fill(email)})
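To show what the split-then-rebind pattern is doing, here is a rough base R sketch of the same idea. It swaps purrr's map_df for lapply plus rbind, and tidyr's fill for a hypothetical locf helper, so it illustrates the mechanism rather than reproducing the line above. The data frame mirrors the sample data shown earlier in the thread.

```r
# Hypothetical helper standing in for tidyr::fill on a single column.
locf <- function(x) {
  idx  <- cumsum(!is.na(x))   # how many non-NA values have been seen so far
  vals <- x[!is.na(x)]        # the non-NA values, in order
  out  <- x
  out[idx > 0] <- vals[idx[idx > 0]]
  out
}

df <- data.frame(
  id    = c(1, 1, 2, 2, 3, 3),
  email = c("bob@email.com", NA, "joe@email.com", NA, NA, NA),
  stringsAsFactors = FALSE
)

# 1. split: one small data frame per id
pieces <- split(df, df$id)

# 2. apply: fill within each piece, so values cannot leak across ids
filled <- lapply(pieces, function(x) {
  x$email <- locf(x$email)
  x
})

# 3. combine: bind the pieces back into one data frame
result <- do.call(rbind, filled)
rownames(result) <- NULL
print(result)
```

Because each piece only ever contains one id, the fill cannot carry a value from one id into the next, which is the property group_by() is meant to provide.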