group_by() into fill() not working as expected

盖世英雄少女心 2020-12-31 11:02

I'm trying to do a Last Observation Carried Forward operation on some poorly formatted data using dplyr and tidyr. It isn't working as I'd expe

6 Answers
  • 2020-12-31 11:15

    Another option is to use do from dplyr:

    df3 <- df %>% group_by(id) %>% do(fill(.,email))
    
  • 2020-12-31 11:22

    Luckily you can still use zoo::na.locf for this:

    df %>% 
        group_by(id) %>% 
        mutate(email = zoo::na.locf(email, na.rm = FALSE))  
    # Source: local data frame [6 x 2]
    # Groups: id [3]
    # 
    #      id         email
    #   (dbl)        (fctr)
    # 1     1 bob@email.com
    # 2     1 bob@email.com
    # 3     2 joe@email.com
    # 4     2 joe@email.com
    # 5     3            NA
    # 6     3            NA
    
  • 2020-12-31 11:25

    Two questions: does the data have to stay duplicated, and do you have to use dplyr and tidyr?

    Maybe this could be a solution?

    (
    bar <- data.frame(id=c(1,1,2,2,3,3),
                     email=c('bob@email.com', NA, 'joe@email.com', NA, NA, NA))
    )                 
    #> id         email
    #>  1 bob@email.com
    #>  1          <NA>
    #>  2 joe@email.com
    #>  2          <NA>
    #>  3          <NA>
    #>  3          <NA>
    
    (                 
    foo <- bar[!duplicated(bar$id),]
    )
    #> id         email
    #>  1 bob@email.com
    #>  2 joe@email.com
    #>  3          <NA>
    
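
If every row has to be kept and dplyr/tidyr are not required, a per-id carry-forward can also be sketched in base R. This is only a rough illustration: `locf` and `email_filled` are hypothetical names introduced here, and the data is the `bar` frame from the answer above, stored as character rather than factor.

```r
# Sample data copied from the `bar` data frame in the answer above.
bar <- data.frame(
  id = c(1, 1, 2, 2, 3, 3),
  email = c("bob@email.com", NA, "joe@email.com", NA, NA, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: carry the last non-NA value forward within one vector.
locf <- function(x) {
  seen <- cumsum(!is.na(x))        # number of non-NA values seen so far
  c(NA, x[!is.na(x)])[seen + 1]    # slot 1 is the NA used before any value appears
}

# ave() applies locf() separately to the emails belonging to each id.
bar$email_filled <- ave(bar$email, bar$id, FUN = locf)
print(bar)
```

Unlike the `duplicated()` approach, this keeps all six rows and does not depend on the non-missing value being the first row of its group.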
  • 2020-12-31 11:25

    This is kind of ugly, but it is another option that uses dplyr and works with your sample data:

    df %>%
       group_by(id) %>%
       mutate(email = email[ !is.na(email) ][1])
    
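
Note that taking the first non-missing value per group is not quite the same as carrying the last observation forward. The two agree on the sample data because each id has at most one email, but they can differ otherwise. A small base R sketch on a made-up vector (the addresses below are hypothetical, not from the question):

```r
# One group's emails, with a leading NA and two distinct addresses (made-up data).
x <- c(NA, "old@email.com", NA, "new@email.com", NA)

# What email[!is.na(email)][1] does inside mutate(): one value recycled to every row.
first_non_na <- rep(x[!is.na(x)][1], length(x))

# Last observation carried forward: each NA takes the most recent earlier value.
seen <- cumsum(!is.na(x))
carried <- c(NA, x[!is.na(x)])[seen + 1]

print(data.frame(x, first_non_na, carried))
```

The first-value approach also fills the leading NA and overwrites the later address, so it is only a substitute for a downward fill when each group holds a single distinct value.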
  • 2020-12-31 11:26

    Looks like this has been fixed in the development version of tidyr. You now get the expected result per id using fill from tidyr_0.3.1.9000.

    df %>% group_by(id) %>% fill(email)
    
    Source: local data frame [6 x 2]
    Groups: id [3]
    
         id         email
      (dbl)        (fctr)
    1     1 bob@email.com
    2     1 bob@email.com
    3     2 joe@email.com
    4     2 joe@email.com
    5     3            NA
    6     3            NA
    
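
For anyone wanting to check this on their own installation, here is a minimal reproducible sketch. It assumes the question's `df` matches the `bar` data shown in another answer, that the email column is character rather than factor, and that a tidyr version containing the fix is installed; `filled` is a name introduced here.

```r
library(dplyr)
library(tidyr)

# Assumed to match the question's data (copied from the `bar` frame in another answer).
df <- data.frame(
  id = c(1, 1, 2, 2, 3, 3),
  email = c("bob@email.com", NA, "joe@email.com", NA, NA, NA),
  stringsAsFactors = FALSE
)

# fill() fills downward by default; with grouped data it should not
# carry a value across the boundary between two ids.
filled <- df %>%
  group_by(id) %>%
  fill(email) %>%
  ungroup()

print(filled)
```

If id 3 comes back with joe@email.com instead of NA, the installed tidyr is probably older than the fix, and one of the other answers can serve as a workaround.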
  • 2020-12-31 11:28

    I have come across this issue quite a few times, and I do worry about using this

    df2 <- df %>% group_by(id) %>% fill(email)

    on large data sets, as I have had mixed results, so I found the following workaround. split() used with map_df() ensures that whatever you are doing is applied to a separate data frame for each id; map_df() then row-binds the individual data frames back together. It has also proved handy in lots of other circumstances. It is somewhat obsolete now that this issue has been fixed, but it is still a useful alternative that avoids group_by().

    # map_df() comes from purrr
    df %>% split(.$id) %>% map_df(function(x){ x %>% fill(email)})
