replace NA in a dplyr chain

花落未央 2021-02-01 02:05

Question has been edited from the original.

After reading this interesting discussion I was wondering how to replace NAs in a column using dplyr in, for

1 Answer
  •  -上瘾入骨i
    2021-02-01 02:45

    The main issue you're having is that mean returns a double while the G_batting column is an integer. Wrapping the mean in as.integer would work (at the cost of truncating the mean to a whole number), or you could convert the entire column to numeric.
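
    A minimal sketch of that type mismatch, using base R only. The vector g is a made-up stand-in for the G_batting column, not the real data:

    ```r
    # Hypothetical integer column standing in for G_batting
    g <- c(10L, NA, 20L, 31L)
    typeof(g)                      # "integer"

    m <- mean(g, na.rm = TRUE)     # mean() returns a double even for integer input
    typeof(m)                      # "double"

    # Option 1: keep the column integer by truncating the mean
    g_int <- ifelse(is.na(g), as.integer(m), g)

    # Option 2: convert the whole column to numeric and keep the exact mean
    g_num <- as.numeric(g)
    g_num[is.na(g_num)] <- m
    ```

    Option 1 keeps the integer type but loses the fractional part of the mean. Option 2 keeps the exact mean but changes the column type.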

    That said, here are a couple of data.table alternatives - I didn't check which one is faster.

    library(data.table)
    
    # using ifelse
    dt = data.table(a = 1:2, b = c(1,2,NA,NA,3,4,5,6,7,8))
    dt[, b := ifelse(is.na(b), mean(b, na.rm = TRUE), b), by = a]
    
    # using a temporary column
    dt = data.table(a = 1:2, b = c(1,2,NA,NA,3,4,5,6,7,8))
    dt[, b.mean := mean(b, na.rm = TRUE), by = a][is.na(b), b := b.mean][, b.mean := NULL]
    

    And this is what I'd want to do ideally (there is an FR about this):

    # again, atm this is pure fantasy and will not work
    dt[, b[is.na(b)] := mean(b, na.rm = T), by = a]
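
    Until something like that exists, one close approximation is base R's replace() on the right-hand side of :=. This is only a sketch of a workaround, not the feature described in the FR, and it writes the grouping column with rep() so that its length is explicit:

    ```r
    library(data.table)

    # Same toy data as above, with the grouping column spelled out in full
    dt <- data.table(a = rep(1:2, times = 5),
                     b = c(1, 2, NA, NA, 3, 4, 5, 6, 7, 8))

    # replace() returns b with the NA positions filled by the group mean;
    # := then assigns that vector back to b within each group
    dt[, b := replace(b, is.na(b), mean(b, na.rm = TRUE)), by = a]
    ```

    Unlike the temporary-column version, this still rebuilds the whole b vector for each group instead of assigning only to the NA rows.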
    

    The dplyr version of the ifelse is (as in OP):

    library(dplyr)
    dt %>% group_by(a) %>% mutate(b = ifelse(is.na(b), mean(b, na.rm = TRUE), b))
    

    I'm not sure how to implement the second data.table idea in a single line in dplyr. I'm also not sure how you can stop dplyr from scrambling/ordering the data (aside from creating an index column).
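
    One possible way to mirror the temporary-column idea in a single dplyr chain is sketched below. It assumes a dplyr version that provides coalesce(), and it uses a plain data.frame with the grouping column written out via rep(). The name out is only illustrative:

    ```r
    library(dplyr)

    # Same toy data as the data.table examples
    df <- data.frame(a = rep(1:2, times = 5),
                     b = c(1, 2, NA, NA, 3, 4, 5, 6, 7, 8))

    out <- df %>%
      group_by(a) %>%
      mutate(b.mean = mean(b, na.rm = TRUE)) %>%  # temporary per-group mean
      ungroup() %>%
      mutate(b = coalesce(b, b.mean)) %>%         # fill NAs from the temporary column
      select(-b.mean)                             # drop the temporary column
    ```

    In the dplyr versions I am aware of, a grouped mutate() returns rows in their original order, so an index column should not be needed here. That is worth checking against the version you have installed.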
