Question
I'm trying to learn how to implement MICE in imputing missing values for my datasets. I've heard about fancyimpute's MICE, but I also read that sklearn's IterativeImputer class can accomplish similar results. From sklearn's docs:
"Our implementation of IterativeImputer was inspired by the R MICE package (Multivariate Imputation by Chained Equations) [1], but differs from it by returning a single imputation instead of multiple imputations. However, IterativeImputer can also be used for multiple imputations by applying it repeatedly to the same dataset with different random seeds when sample_posterior=True."
I've seen "seeds" being used in different pipelines, but I never understood them well enough to implement them in my own code. I was wondering if anyone could explain and provide an example on how to implement seeds for a MICE imputation using sklearn's IterativeImputer? Thanks!
Answer 1:
The behavior of IterativeImputer can change depending on its random state. A fixed random state is also called a "seed". As the documentation states, we can obtain multiple imputations by setting sample_posterior to True and varying the random seed, i.e. the random_state parameter.
Here is an example of how to use it:
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

X_train = [[1, 2],
           [3, 6],
           [4, 8],
           [np.nan, 3],
           [7, np.nan]]

X_test = [[np.nan, 2],
          [np.nan, np.nan],
          [np.nan, 6]]

# Each seed yields a different draw from the posterior, hence a different imputation
for i in range(3):
    imp = IterativeImputer(max_iter=10, random_state=i, sample_posterior=True)
    imp.fit(X_train)
    print(f"imputation {i}:")
    print(np.round(imp.transform(X_test)))
It outputs:
imputation 0:
[[ 1. 2.]
[ 5. 10.]
[ 3. 6.]]
imputation 1:
[[1. 2.]
[0. 1.]
[3. 6.]]
imputation 2:
[[1. 2.]
[1. 2.]
[3. 6.]]
We can observe the three different imputations.
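Once you have several imputed copies, you usually want to combine them. A minimal sketch, assuming simple averaging across imputations (full MICE pooling would use Rubin's rules on downstream model estimates, not just a mean of the imputed values):

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

X_train = [[1, 2], [3, 6], [4, 8], [np.nan, 3], [7, np.nan]]
X_test = [[np.nan, 2], [np.nan, np.nan], [np.nan, 6]]

# Collect one imputed copy of X_test per seed
imputations = []
for seed in range(10):
    imp = IterativeImputer(max_iter=10, random_state=seed, sample_posterior=True)
    imputations.append(imp.fit(X_train).transform(X_test))

# Simple pooling: average the imputed values across all seeds
pooled = np.mean(imputations, axis=0)
print(pooled.shape)
```

Using more seeds (here 10, an arbitrary choice) reduces the variance of the pooled estimate; the per-seed copies also let you quantify imputation uncertainty via their spread.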
Source: https://stackoverflow.com/questions/58613108/imputing-missing-values-using-sklearn-iterativeimputer-class-for-mice