Predicting missing values with scikit-learn's Imputer module

情话喂你 2021-02-05 09:48

I am writing a very basic program to predict missing values in a dataset using scikit-learn's Imputer class.

I have made a NumPy array, created an Imp

3 answers
  • 2021-02-05 10:06

    Since scikit-learn 0.20, imputation lives in the sklearn.impute module; the old sklearn.preprocessing.Imputer class was deprecated in 0.20 and removed in 0.22. The imputer is now used like this:

    import numpy as np
    from sklearn.impute import SimpleImputer

    # Replace each NaN with the mean of its column
    impute = SimpleImputer(missing_values=np.nan, strategy='mean')
    impute.fit(X)
    X = impute.transform(X)
    

    Note:

    Use np.nan rather than the string 'NaN' for missing_values.

    The axis parameter no longer exists; SimpleImputer always imputes column by column.

    The variable name is arbitrary: imp or imputer works just as well as impute.
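To make the points above concrete, here is a minimal sketch on a made-up array (the data and the name X_filled are assumptions for illustration, not from the question). It assumes scikit-learn 0.20 or later:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy data: two columns, one missing value in each.
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, np.nan]])

# np.nan marks the missing entries; there is no axis argument.
imputer = SimpleImputer(missing_values=np.nan, strategy="mean")
X_filled = imputer.fit_transform(X)

print(imputer.statistics_)  # per-column means learned during fit
print(X_filled)             # NaNs replaced by those column means
```

The original X keeps its NaN entries here because the result is stored under a new name.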

  • 2021-02-05 10:06

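    Note: Because of changes in scikit-learn, the string 'NaN' has to be replaced with np.nan, as shown below. This answer still uses the old Imputer class, which was deprecated in scikit-learn 0.20 and removed in 0.22, so it only runs on older versions.

    import numpy as np
    from sklearn.preprocessing import Imputer  # only available before scikit-learn 0.22

    imputer = Imputer(missing_values=np.nan, strategy='mean', axis=0)
    imputer = imputer.fit(X[:, 1:3])
    X[:, 1:3] = imputer.transform(X[:, 1:3])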
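On scikit-learn 0.22 or later, where sklearn.preprocessing.Imputer no longer exists, the same column-slice pattern could be written with SimpleImputer roughly as follows. This is a sketch on made-up data, not the answerer's code; the array contents are assumptions, and the array must have a float dtype so it can hold np.nan:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data: column 0 is complete, columns 1 and 2 have gaps.
X = np.array([[0.0, 10.0, np.nan],
              [1.0, np.nan, 6.0],
              [2.0, 30.0, 8.0]])

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# Impute only columns 1 and 2, and write the result back into X.
X[:, 1:3] = imputer.fit_transform(X[:, 1:3])

print(X)
```

Assigning back into the slice is what updates X; column 0 is never passed to the imputer.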
    
  • 2021-02-05 10:08

    Per the documentation, sklearn.preprocessing.Imputer.fit_transform returns a new array; it does not alter the array passed in. The minimal fix is therefore to assign the result back:

    X = imp.fit_transform(X)
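The difference between calling fit_transform and assigning its result can be checked directly. The sketch below uses SimpleImputer with its default settings as a stand-in for the removed Imputer class (an assumption, since the question's own code is not shown in full); the data and the names still_missing and now_missing are hypothetical:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical data with a single missing value.
X = np.array([[1.0, np.nan],
              [3.0, 4.0]])

imp = SimpleImputer(missing_values=np.nan, strategy="mean")

# Result is discarded, so X itself is left untouched.
imp.fit_transform(X)
still_missing = bool(np.isnan(X).any())

# Assigning the returned array is what replaces the NaNs.
X = imp.fit_transform(X)
now_missing = bool(np.isnan(X).any())

print(still_missing, now_missing)
```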
    