Question
I have over 1500 columns in my dataset, and 100+ of them contain at least one NA. I know I can replace the NAs in a single column with
d$var[is.na(d$var)] <- mean(d$var, na.rm=TRUE)
but how do I do this for ALL the NAs in my dataset?
Thank you!
Answer 1:
We can use na.aggregate from zoo. Loop through the columns of the dataset (assuming all the columns are numeric), apply na.aggregate to replace the NAs in each column with that column's mean (the default), and assign the result back to the dataset.
library(zoo)
df[] <- lapply(df, na.aggregate)
By default, the FUN argument of na.aggregate is mean:
Default S3 method:
na.aggregate(object, by = 1, ..., FUN = mean, na.rm = FALSE, maxgap = Inf)
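For readers without zoo installed, the per-column replacement can be sketched in base R. This is only an illustration of the idea, not zoo's actual implementation: fill_na_with is a hypothetical helper, its fun argument merely mimics the role of FUN, and the toy data frame is made up.

```r
# Hypothetical helper (not part of zoo): replace the NAs in one vector
# with a summary of its non-missing values.
fill_na_with <- function(x, fun = mean) {
  replace(x, is.na(x), fun(x[!is.na(x)]))
}

# Toy data, invented for illustration.
toy <- data.frame(a = c(1, NA, 3, 10), b = c(NA, 4, 8, 8))

# Mean fill (the default), applied column by column.
toy_mean <- toy
toy_mean[] <- lapply(toy, fill_na_with)

# Median fill, passing the extra argument through lapply.
toy_median <- toy
toy_median[] <- lapply(toy, fill_na_with, fun = median)
```

Each column is filled from its own non-missing values only, so a column that is entirely NA has nothing to compute a mean from and would end up as NaN under this sketch.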
To do this nondestructively:
df2 <- df
df2[] <- lapply(df2, na.aggregate)
or in one line:
df2 <- replace(df, TRUE, lapply(df, na.aggregate))
If there are non-numeric columns, do this only for the numeric columns by creating a logical index first:
ok <- sapply(df, is.numeric)
df[ok] <- lapply(df[ok], na.aggregate)
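A small made-up example may help show what the logical index does on mixed data. To keep it runnable without zoo, it swaps in a hypothetical base R helper, impute_mean, in place of na.aggregate; the data frame and its column names are assumptions for illustration only.

```r
# Hypothetical stand-in for na.aggregate's default behaviour.
impute_mean <- function(x) replace(x, is.na(x), mean(x, na.rm = TRUE))

# Invented data: one character column and one numeric column, both with an NA.
mixed <- data.frame(
  id    = c("a", NA, "c"),
  score = c(10, NA, 20),
  stringsAsFactors = FALSE
)

# TRUE for numeric columns only, so the character column is skipped.
ok <- sapply(mixed, is.numeric)
mixed[ok] <- lapply(mixed[ok], impute_mean)
```

After this, the missing score is filled with the mean of 10 and 20, while the NA in id is left as it was, since a mean is not defined for character data.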
Source: https://stackoverflow.com/questions/41195485/how-do-i-replace-all-na-with-mean-in-r