NaN giving ValueError in OneHotEncoder in scikit-learn

我寻月下人不归 2021-01-22 16:57

Here is my code:

import pandas as pd
import numpy as np
from sklearn.preprocessing import OneHotEncoder

train = pd.DataFra         


        
1 Answer
  • 2021-01-22 17:39

    Setting handle_unknown='ignore' covers the case where the test set contains a category that was not seen in the training set. If 'Steve Stevenson' appears only in the test set, transform does not raise an error; the encoded row for that sample is all zeros.

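        import pandas as pd
        from sklearn.preprocessing import OneHotEncoder

        train = pd.DataFrame({
            'users': ['John Johnson', 'John Smith', 'Mary Williams']
        })
        test = pd.DataFrame({
            'users': ['John Smith', 'Mary Williams', 'Steve Stevenson']
        })

        # handle_unknown='ignore' encodes unseen categories as all zeros.
        ohe = OneHotEncoder(handle_unknown='ignore')
        ohe.fit(train)

        # The default output is sparse; .toarray() makes it dense. (The older
        # sparse=False argument was renamed sparse_output in scikit-learn 1.2.)
        test_transformed = ohe.transform(test).toarray()
        print(test_transformed)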
    

    To fix the None problem, replace the None values with an explicit category, such as 'unknown', before fitting the encoder.
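
    A minimal sketch of that replacement, assuming the missing entries show up as None or NaN in a pandas column. The sample values are hypothetical, since the DataFrame in the question is cut off, and 'unknown' is just an illustrative placeholder name.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: the question's DataFrame is truncated, so these values
# are made up for illustration.
train = pd.DataFrame({'users': ['John Johnson', None, 'Mary Williams', np.nan]})

# fillna replaces both None and NaN with an explicit placeholder category.
train_filled = train.fillna('unknown')

ohe = OneHotEncoder(handle_unknown='ignore')
# The default output is a sparse matrix; toarray() makes it dense for printing.
encoded = ohe.fit_transform(train_filled).toarray()

print(ohe.categories_[0])
print(encoded)
```

    After this, 'unknown' is an ordinary category with its own column, so rows with missing users stay distinguishable from rows with unseen users, which handle_unknown='ignore' encodes as all zeros.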

    Hope this helps
