replace NA in a dplyr chain

花落未央 2021-02-01 02:05

Question has been edited from the original.

After reading this interesting discussion I was wondering how to replace NAs in a column using dplyr in, for

1 answer

-上瘾入骨i (original poster)

2021-02-01 02:45
The main issue you're having is that mean() returns a double while the G_batting column is an integer. Wrapping the mean in as.integer() would work (note that this truncates any fractional part); otherwise you'd need to convert the entire column to numeric.
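To make the type mismatch concrete, here is a minimal sketch on a made-up data frame. The grouping column g and the values are assumptions for illustration only, not the OP's data; only the column name G_batting comes from the question.

```r
library(dplyr)

# Hypothetical stand-in for the OP's data: an integer column with one NA
df <- data.frame(g = c(1L, 1L, 2L, 2L),
                 G_batting = c(10L, NA, 4L, 7L))

typeof(df$G_batting)                      # integer column
typeof(mean(df$G_batting, na.rm = TRUE))  # mean() always returns a double

# Casting the group mean back to integer keeps the column type stable
res <- df %>%
  group_by(g) %>%
  mutate(G_batting = ifelse(is.na(G_batting),
                            as.integer(mean(G_batting, na.rm = TRUE)),
                            G_batting)) %>%
  ungroup()
```

If the fractional part of the mean matters, converting the column with as.numeric() first is probably the safer route.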

That said, here are a couple of data.table alternatives - I didn't check which one is faster.
```
library(data.table)

# using ifelse: replace each NA with the mean of its group
dt = data.table(a = rep(1:2, 5), b = c(1,2,NA,NA,3,4,5,6,7,8))
dt[, b := ifelse(is.na(b), mean(b, na.rm = TRUE), b), by = a]

# using a temporary column: store the group mean, fill the NAs, drop the helper
dt = data.table(a = rep(1:2, 5), b = c(1,2,NA,NA,3,4,5,6,7,8))
dt[, b.mean := mean(b, na.rm = TRUE), by = a][is.na(b), b := b.mean][, b.mean := NULL]
```
And this is what I'd want to do ideally (there is a feature request about this):
```
# again, atm this is pure fantasy and will not work
dt[, b[is.na(b)] := mean(b, na.rm = TRUE), by = a]
```
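Something close to that wish can be sketched with nafill(), which recent data.table versions provide for filling missing values. This is not the syntax from the feature request, just one possible workaround, and it assumes b is a double column as in the toy data above.

```r
library(data.table)

# Same toy data as above, with the recycling of a written out explicitly
dt <- data.table(a = rep(1:2, 5), b = c(1, 2, NA, NA, 3, 4, 5, 6, 7, 8))

# type = "const" replaces every NA in the group with the single value in fill,
# which here is that group's mean
dt[, b := nafill(b, type = "const", fill = mean(b, na.rm = TRUE)), by = a]
```

Because := updates by reference, the rows stay in their original order and no helper column is needed.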
The dplyr version of the ifelse is (as in OP):
```
library(dplyr)
dt %>% group_by(a) %>% mutate(b = ifelse(is.na(b), mean(b, na.rm = TRUE), b))
```
I'm not sure how to implement the second data.table idea in a single line in dplyr. I'm also not sure how you can stop dplyr from scrambling/ordering the data (aside from creating an index column).
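For what it's worth, the temporary-column idea can be sketched as one dplyr chain using coalesce(), which returns the first non-missing value at each position. This is an assumption about how one might translate it, not something from the original thread. As far as I can tell, recent dplyr versions keep the row order in a grouped mutate(), so no index column should be needed there.

```r
library(dplyr)

# Same toy data as in the data.table examples, as a plain data frame
df <- data.frame(a = rep(1:2, 5), b = c(1, 2, NA, NA, 3, 4, 5, 6, 7, 8))

res <- df %>%
  group_by(a) %>%
  mutate(b.mean = mean(b, na.rm = TRUE),  # helper column: per-group mean
         b = coalesce(b, b.mean)) %>%     # take b, fall back to b.mean where b is NA
  ungroup() %>%
  select(-b.mean)                         # drop the helper column again
```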