I am writing a very basic program to predict missing values in a dataset using scikit-learn's Imputer class.
I have made a NumPy array, created an Imp
Since scikit-learn version 0.20, the usage has changed: imputation now lives in the sklearn.impute module. You can use the imputer like this:
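import numpy as np
from sklearn.impute import SimpleImputer
impute = SimpleImputer(missing_values=np.nan, strategy='mean')
impute.fit(X)            # learn the mean of each column, ignoring NaN
X = impute.transform(X)  # returns a new array with NaN replaced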
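A self-contained run of the snippet above on a small toy array (the data is made up for illustration, assuming scikit-learn 0.20 or later) shows each NaN being replaced by the mean of its column.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data (hypothetical): one missing value in each column
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [7.0, np.nan]])

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(X)            # column means: (1 + 7) / 2 = 4.0 and (2 + 4) / 2 = 3.0
X = imputer.transform(X)  # NaN in column 0 -> 4.0, NaN in column 1 -> 3.0

print(X)
```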
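Pay attention:
np.nan is used instead of the string 'NaN'.
The axis parameter is no longer needed; SimpleImputer has no axis parameter and always computes its statistics column by column.
The variable name is arbitrary; imp or imputer works just as well as my impute.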
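To see why no axis parameter is needed, this quick check (toy data, assuming scikit-learn 0.20 or later) compares the fitted statistics_ attribute, which holds one fill value per column, against NumPy's NaN-aware column mean.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy data (hypothetical) with one NaN per column
X = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan],
              [np.nan, 8.0, 9.0]])

imp = SimpleImputer(missing_values=np.nan, strategy='mean').fit(X)

# statistics_ has one entry per column (feature); for strategy='mean'
# it equals the column mean computed while ignoring NaN.
column_means = np.nanmean(X, axis=0)
print(imp.statistics_)
print(column_means)
```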
Note: due to a change in the sklearn library, 'NaN' has to be replaced with np.nan, as shown below. This applies to the old Imputer class, which was deprecated in 0.20 and removed in 0.22.
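import numpy as np
from sklearn.preprocessing import Imputer  # only available in scikit-learn < 0.22
imputer = Imputer(missing_values=np.nan, strategy='mean', axis=0)
imputer = imputer.fit(X[:, 1:3])            # fit on columns 1 and 2 only
X[:, 1:3] = imputer.transform(X[:, 1:3])    # write the imputed columns back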
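On scikit-learn 0.22 or later, where Imputer no longer exists, the same column-slice imputation can be sketched with SimpleImputer. The array below is hypothetical; column 0 stands in for a column that should be left untouched.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical numeric data: column 0 is left alone, columns 1 and 2 have gaps
X = np.array([[1.0, 40.0, 70000.0],
              [2.0, 20.0, np.nan],
              [3.0, np.nan, 50000.0],
              [4.0, 30.0, 60000.0]])

imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
# Fit on columns 1 and 2 only and write the imputed values back in place
X[:, 1:3] = imputer.fit_transform(X[:, 1:3])

print(X)
```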
Per the documentation, sklearn.preprocessing.Imputer.fit_transform returns a new array; it does not alter the array passed to it. The minimal fix is therefore to assign the result back:
X = imp.fit_transform(X)
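The answer above refers to the deprecated sklearn.preprocessing.Imputer; this sketch checks the same behavior with SimpleImputer instead, assuming scikit-learn 0.20 or later, where copy=True is the default and missing_values defaults to np.nan.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, np.nan],
              [3.0, 4.0]])

imp = SimpleImputer(strategy='mean')  # missing_values defaults to np.nan
result = imp.fit_transform(X)

# With the default copy=True, X itself still contains the NaN ...
print(np.isnan(X).any())
# ... while the returned array is filled, so it must be assigned back
print(np.isnan(result).any())
```

Forgetting the assignment is exactly why the original array appears unchanged.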