Question has been edited from the original.
After reading this interesting discussion I was wondering how to replace NAs in a column using dplyr in, for
The main issue you're having is that mean returns a double while the G_batting column is an integer. So wrapping the mean in as.integer would work (note that it truncates the fractional part), or you'd need to convert the entire column to numeric, I guess.
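To make the type point concrete, here is a minimal sketch of both options in dplyr. The data frame df and its columns id and g are made-up stand-ins for the OP's table (g plays the role of the integer G_batting column), not the real data.

```r
library(dplyr)

# Hypothetical toy data: g is an integer column with NAs, like G_batting
df <- data.frame(id = c("a", "a", "a", "b", "b"),
                 g  = c(10L, NA, 20L, 5L, NA))

# Option 1: keep the column integer by coercing the group mean
# (as.integer truncates any fractional part)
res_int <- df %>%
  group_by(id) %>%
  mutate(g = ifelse(is.na(g), as.integer(mean(g, na.rm = TRUE)), g))

# Option 2: let the whole column become numeric (double) instead
res_num <- df %>%
  group_by(id) %>%
  mutate(g = ifelse(is.na(g), mean(g, na.rm = TRUE), as.numeric(g)))
```

Either way, both branches of ifelse have the same type in every group, so the column type is consistent across groups.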
That said, here are a couple of data.table alternatives - I didn't check which one is faster.
library(data.table)
# using ifelse: fill NAs in b with the mean of b within each group of a
dt = data.table(a = 1:2, b = c(1,2,NA,NA,3,4,5,6,7,8))
dt[, b := ifelse(is.na(b), mean(b, na.rm = TRUE), b), by = a]
# using a temporary column: compute the group mean, assign it where b is NA, then drop it
dt = data.table(a = 1:2, b = c(1,2,NA,NA,3,4,5,6,7,8))
dt[, b.mean := mean(b, na.rm = TRUE), by = a][is.na(b), b := b.mean][, b.mean := NULL]
And this is what I'd want to do ideally (there is a feature request about this):
# again, atm this is pure fantasy and will not work
dt[, b[is.na(b)] := mean(b, na.rm = TRUE), by = a]
The dplyr version of the ifelse approach is (as in the OP):
library(dplyr)
dt %>% group_by(a) %>% mutate(b = ifelse(is.na(b), mean(b, na.rm = TRUE), b))
I'm not sure how to implement the second data.table idea in a single line in dplyr. I'm also not sure how you can stop dplyr from scrambling/ordering the data (aside from creating an index column).
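For what it's worth, one way the temporary-column idea might be written as a single dplyr pipeline is sketched below. This is only a sketch, not a tested equivalent of the data.table one-liner: it still uses ifelse for the fill, and df is just the same toy data as above rebuilt as a plain data frame.

```r
library(dplyr)

# Same toy data as the data.table examples; a = 1:2 is recycled to 10 rows
df <- data.frame(a = 1:2, b = c(1, 2, NA, NA, 3, 4, 5, 6, 7, 8))

res <- df %>%
  group_by(a) %>%
  mutate(b.mean = mean(b, na.rm = TRUE)) %>%   # temporary per-group mean
  ungroup() %>%
  mutate(b = ifelse(is.na(b), b.mean, b)) %>%  # fill NAs from the temporary column
  select(-b.mean)                              # drop the temporary column
```

With the dplyr versions I'm aware of, mutate on grouped data returns rows in their original order, so an index column should not be needed here, but that is worth checking against the version you have installed.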