Save MinMaxScaler model in sklearn

Asked 2020-12-23 13:58

I'm using the MinMaxScaler model in sklearn to normalize the features of a model.

import numpy as np
from sklearn.preprocessing import MinMaxScaler

training_set = np.random.rand(4, 4) * 10  # 4x4 array of random floats in [0, 10)

scaler = MinMaxScaler()
scaler.fit(training_set)
training_set_scaled = scaler.transform(training_set)

How can I save the fitted scaler to a file, so that I can load it later and apply the same scaling to a test set?
5 Answers
  • 2020-12-23 14:36

    The best way to do this is to create an ML pipeline like the following:

    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import MinMaxScaler
    import joblib  # sklearn.externals.joblib is deprecated; use the standalone package

    # the scaler and the estimator are chained, so both get saved in one object
    pipeline = make_pipeline(MinMaxScaler(), YOUR_ML_MODEL())

    model = pipeline.fit(X_train, y_train)
    

    Now you can save it to a file:

    joblib.dump(model, 'filename.mod') 
    

    Later you can load it like this:

    model = joblib.load('filename.mod')
    
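    A nice property of persisting the whole pipeline: the loaded object rescales and predicts in one call, so the scaling step cannot be forgotten at inference time. A minimal sketch (`X_test` is a hypothetical held-out feature array, not from the original answer):

    import joblib

    model = joblib.load('filename.mod')
    predictions = model.predict(X_test)  # X_test is hypothetical; MinMaxScaler runs internally first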
  • 2020-12-23 14:37

    You can use pickle to save the scaler:

    import pickle
    scalerfile = 'scaler.sav'
    pickle.dump(scaler, open(scalerfile, 'wb'))
    

    Load it back:

    import pickle
    scalerfile = 'scaler.sav'
    scaler = pickle.load(open(scalerfile, 'rb'))
    test_scaled_set = scaler.transform(test_set)
    
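    Since the open() calls above never explicitly close the file handles, a slightly safer sketch of the same idea wraps them in with blocks:

    import pickle

    with open('scaler.sav', 'wb') as f:
        pickle.dump(scaler, f)

    with open('scaler.sav', 'rb') as f:
        scaler = pickle.load(f)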
  • 2020-12-23 14:38

    Even better than pickle (which can produce much larger files for objects that hold large numpy arrays), you can use joblib, the tool sklearn used to bundle as sklearn.externals.joblib:

    import joblib  # formerly bundled as sklearn.externals.joblib

    scaler_filename = "scaler.save"
    joblib.dump(scaler, scaler_filename)

    # And now to load...
    scaler = joblib.load(scaler_filename)
    

    Note: sklearn.externals.joblib is deprecated; install and use the standalone joblib package instead (pip install joblib).

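    On the file-size point, joblib.dump also accepts a compress argument (an integer 0-9, or True). A sketch assuming a mid-level compression of 3:

    import joblib

    # compress=3 trades a little dump/load speed for a much smaller file
    joblib.dump(scaler, 'scaler.save', compress=3)
    scaler = joblib.load('scaler.save')  # decompression is handled automatically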
  • 2020-12-23 14:45

    So I'm actually not an expert with this but from a bit of research and a few helpful links, I think pickle and sklearn.externals.joblib are going to be your friends here.

    The pickle package lets you save a model, or "dump" it, to a file.

    The sklearn documentation on model persistence is also helpful; it talks about persisting a trained model. Something that you're going to want to try is:

    # could use: import pickle... however let's do something else
    import joblib  # formerly sklearn.externals.joblib, now a standalone package

    # joblib is more efficient than pickle for things like large numpy arrays,
    # which sklearn models often carry

    # then just 'dump' your model to a file
    joblib.dump(clf, 'my_dope_model.pkl')
    

    The joblib documentation is the place to learn more about dump and load.

    Let me know if that doesn't help or I'm not understanding something about your model.

    Note: sklearn.externals.joblib is deprecated; install and use the standalone joblib package instead.

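    Loading the model back is symmetric; a minimal sketch, assuming the same file name as above:

    import joblib

    clf = joblib.load('my_dope_model.pkl')  # restores the trained estimator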
  • 2020-12-23 14:47

    Just a note that sklearn.externals.joblib has been deprecated and is superseded by plain old joblib, which can be installed with pip install joblib:

    import joblib
    joblib.dump(my_scaler, 'scaler.gz')
    my_scaler = joblib.load('scaler.gz')
    

    Note that the file extension can be anything, but if it is one of ['.z', '.gz', '.bz2', '.xz', '.lzma'] then the corresponding compression protocol is used automatically. See the joblib docs for the joblib.dump() and joblib.load() methods.

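    Whichever serializer you choose, a quick round-trip check confirms the reloaded scaler behaves identically. A sketch, where training_set is the array from the question:

    import numpy as np
    import joblib

    joblib.dump(scaler, 'scaler.gz')
    restored = joblib.load('scaler.gz')

    # the restored scaler has the same learned min/max, so outputs match exactly
    assert np.allclose(scaler.transform(training_set),
                       restored.transform(training_set))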