How to normalize the Train and Test data using MinMaxScaler sklearn

天涯浪人 2020-12-23 18:09

I have a doubt that I have been trying to find an answer to. The question is, when I use,

from sklearn import preprocessing
min_max_scaler = preprocessing.MinM         


        
2 Answers
  • 2020-12-23 18:47

    The best way is to fit the MinMaxScaler on the training data, save the fitted scaler, and load that same scaler whenever it is needed.

    Saving model:

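    import pickle
    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler
    
    min_max_scaler = MinMaxScaler()
    df = pd.DataFrame({'A':[1,2,3,7,9,15,16,1,5,6,2,4,8,9],'B':[15,12,10,11,8,14,17,20,4,12,4,5,17,19],'C':['Y','Y','Y','Y','N','N','N','Y','N','Y','N','N','Y','Y']})
    # fit on the training columns, then persist the fitted scaler
    df[['A','B']] = min_max_scaler.fit_transform(df[['A','B']])
    with open("scaler.pkl", 'wb') as f:
        pickle.dump(min_max_scaler, f)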
    

    Loading saved model:

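    # requires the same imports as above (pickle, pandas as pd)
    with open("scaler.pkl", 'rb') as f:
        scalerObj = pickle.load(f)
    df_test = pd.DataFrame({'A':[25,67,24,76,23],'B':[2,54,22,75,19]})
    # transform only: the loaded scaler reuses the training min/max
    df_test[['A','B']] = scalerObj.transform(df_test[['A','B']])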
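As a quick sanity check on the save/load approach, here is a minimal sketch showing that a pickled scaler gives the same output as the original after reloading. The toy arrays and the temporary file path are made up for illustration and are not from the question.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy training data with two numeric columns (illustrative values only).
X_train = np.array([[1.0, 15.0], [7.0, 11.0], [16.0, 4.0], [9.0, 20.0]])
X_new = np.array([[25.0, 2.0], [8.0, 12.0]])

# Fit on the training data only.
scaler = MinMaxScaler().fit(X_train)

# Save the fitted scaler to a temporary location (hypothetical path).
path = os.path.join(tempfile.mkdtemp(), "scaler.pkl")
with open(path, "wb") as f:
    pickle.dump(scaler, f)

# Load it back, as a separate prediction script would do.
with open(path, "rb") as f:
    loaded = pickle.load(f)

# The loaded scaler carries the training min/max, so both objects
# scale new data identically.
same = np.allclose(scaler.transform(X_new), loaded.transform(X_new))
print(same)  # True
```

A pickled scaler is generally safest to load with the same scikit-learn version that saved it.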
    
  • 2020-12-23 18:56

    You should fit the MinMaxScaler on the training data only, and then apply that fitted scaler to the test data before predicting.


    In summary:

    • Step 1: fit the scaler on the TRAINING data
    • Step 2: use the scaler to transform the TRAINING data
    • Step 3: use the transformed training data to fit the predictive model
    • Step 4: use the scaler to transform the TEST data
    • Step 5: predict using the trained model (step 3) and the transformed TEST data (step 4).

    Example using your data:

    import pandas as pd
    from sklearn import preprocessing
    from sklearn.svm import SVC  # example estimator (an assumption); any scikit-learn model works
    
    min_max_scaler = preprocessing.MinMaxScaler()
    # training data
    df = pd.DataFrame({'A':[1,2,3,7,9,15,16,1,5,6,2,4,8,9],'B':[15,12,10,11,8,14,17,20,4,12,4,5,17,19],'C':['Y','Y','Y','Y','N','N','N','Y','N','Y','N','N','Y','Y']})
    # fit and transform the training data and use them for the model training
    df[['A','B']] = min_max_scaler.fit_transform(df[['A','B']])
    df['C'] = df['C'].apply(lambda x: 0 if x.strip()=='N' else 1)
    
    # fit the model on the scaled features, assuming column C is the target
    model = SVC()
    model.fit(df[['A','B']], df['C'])
    
    # after the model training on the transformed training data, define the testing data df_test
    df_test = pd.DataFrame({'A':[25,67,24,76,23],'B':[2,54,22,75,19]})
    
    # before the prediction of the test data, ONLY APPLY the scaler on them (no refitting)
    df_test[['A','B']] = min_max_scaler.transform(df_test[['A','B']])
    
    # test the model
    y_predicted_from_model = model.predict(df_test[['A','B']])
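One thing worth noticing with this data: the test values lie far outside the training range, so the scaled test values will not stay within [0, 1]. The sketch below uses the same A and B columns and reproduces `transform` by hand with the default formula `(x - min) / (max - min)`, where min and max come from the training data. The variable names `train`, `test`, `scaled` and `manual` are mine, not from the question.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Same numeric columns as in the example above.
train = pd.DataFrame({'A': [1, 2, 3, 7, 9, 15, 16, 1, 5, 6, 2, 4, 8, 9],
                      'B': [15, 12, 10, 11, 8, 14, 17, 20, 4, 12, 4, 5, 17, 19]})
test = pd.DataFrame({'A': [25, 67, 24, 76, 23], 'B': [2, 54, 22, 75, 19]})

# The scaler learns the per-column min and max from the training data only.
scaler = MinMaxScaler().fit(train)

# transform() reuses those training statistics on the test data.
scaled = scaler.transform(test)

# The same computation by hand, using the stored training min/max.
manual = (test.to_numpy() - scaler.data_min_) / (scaler.data_max_ - scaler.data_min_)

print(np.allclose(scaled, manual))  # True
print(scaled[0])  # first test row: A = 1.6, B = -0.125
```

Values above 1 or below 0 are expected here and do not mean the scaler was applied incorrectly. Newer scikit-learn releases also accept `MinMaxScaler(clip=True)` if you want transformed values clipped to the feature range.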
    

    Example using iris data:

    from sklearn import datasets
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.svm import SVC
    
    data = datasets.load_iris()
    X = data.data
    y = data.target
    
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
    
    scaler = MinMaxScaler()
    X_train_scaled = scaler.fit_transform(X_train)
    
    model = SVC()
    model.fit(X_train_scaled, y_train)
    
    X_test_scaled = scaler.transform(X_test)
    y_pred = model.predict(X_test_scaled)
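If you would rather not track the scaler and the model separately, the same iris flow can be sketched with a scikit-learn `Pipeline`. This is an alternative to the code above and not something the question asked for. Calling `fit` on the pipeline fits the scaler on the training data and then fits the model on the scaled result, while `predict` only applies `transform` before predicting.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Scaler and model chained together; the scaler is fitted on X_train only.
pipe = make_pipeline(MinMaxScaler(), SVC())
pipe.fit(X_train, y_train)

# predict() scales X_test with the training min/max, then predicts.
y_pred = pipe.predict(X_test)
accuracy = pipe.score(X_test, y_test)
print(accuracy)
```

The fitted pipeline can also be pickled as a single object, so the scaler and the model stay together.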
    

    Hope this helps.

    See also my post here: https://towardsdatascience.com/everything-you-need-to-know-about-min-max-normalization-in-python-b79592732b79
