Load and predict new data sklearn

无人共我 2021-02-06 10:34

I trained a logistic regression model, cross-validated it, and saved it to a file using the joblib module. Now I want to load this model and predict new data with it. Is this the correct way to do

1 Answer
  • 2021-02-06 10:59

    No, it's incorrect. All the data preparation steps should be fit on the training data only. Otherwise, you risk applying the wrong transformations, because the means and variances that StandardScaler estimates will probably differ between train and test data.
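
    A minimal sketch of why this matters, using made-up numbers (the two arrays below are illustrative assumptions, not data from the question). A scaler refit on new data standardizes against the new data's own mean and variance, so the model receives feature values on a different scale than the one it was trained on:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative data only: one feature, with the new data shifted relative to train.
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
X_new = np.array([[11.0], [12.0], [13.0], [14.0], [15.0]])

# Correct: fit on the training data once, then reuse the fitted scaler.
scaler = StandardScaler().fit(X_train)
new_scaled_ok = scaler.transform(X_new)

# Incorrect: refitting on the new data discards the training statistics.
new_scaled_bad = StandardScaler().fit_transform(X_new)

print(scaler.mean_)            # mean learned from the training data
print(new_scaled_ok.ravel())   # new data expressed on the training scale
print(new_scaled_bad.ravel())  # the shift is erased by the refit
```

    In this toy case the refit scaler maps the shifted new data onto exactly the same values as the scaled training data, so the information that the new samples are far from the training range is lost.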

    The easiest way to train, save, load and apply all the steps simultaneously is to use Pipelines:

    At training:

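    # prepare the pipeline
    import joblib  # sklearn.externals.joblib has been removed from scikit-learn
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.linear_model import LogisticRegression

    # X_train, y_train: the training features and labels
    # note the parentheses: the pipeline needs an estimator instance, not the class
    pipe = make_pipeline(StandardScaler(), LogisticRegression())
    pipe.fit(X_train, y_train)  # fits the scaler and the model on the train data
    joblib.dump(pipe, 'model.pkl')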
    

    At prediction:

    import joblib
    import pandas as pd

    # load the saved pipeline (scaler + model) with joblib
    pipe = joblib.load('model.pkl')

    # new data to predict
    pr = pd.read_csv('set_to_predict.csv')
    pred_cols = list(pr.columns.values)[:-1]  # every column except the last

    # apply the whole pipeline to the data
    pred = pd.Series(pipe.predict(pr[pred_cols]))
    print(pred)
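
    To check the whole round trip without the original files, here is a self-contained sketch. It assumes synthetic data from `make_classification` and a temporary file path, neither of which comes from the question. It trains the pipeline, saves it, loads it back, and compares the loaded pipeline against the original:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (an assumption, not from the question).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_new, y_train, _ = train_test_split(X, y, random_state=0)

# Fit the scaler and the model together on the training data.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)

# Save to a temporary directory and load the pipeline back.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "model.pkl")
    joblib.dump(pipe, model_path)
    loaded = joblib.load(model_path)

# The loaded pipeline carries the scaler statistics learned at training time.
loaded_scaler = loaded.named_steps["standardscaler"]
same_means = np.allclose(loaded_scaler.mean_, X_train.mean(axis=0))
same_preds = np.array_equal(pipe.predict(X_new), loaded.predict(X_new))
print(same_means, same_preds)
```

    Both flags should come out true: the scaler inside the loaded pipeline still holds the training means, so no separate scaling step is needed at prediction time.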
    