Passing handle_unknown='ignore' to OneHotEncoder handles the case where the test set contains categorical values that never appeared in the training set. If you put 'Steve Stevenson' in the test set, transform() does not raise an error; it returns a row of all zeros for that value.
Here is my code:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder
train = pd.DataFrame({
'users':['John Johnson','John Smith','Mary Williams']
})
test = pd.DataFrame({
'users':['John Smith','Mary Williams', 'Steve Stevenson']
})
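# sparse_output replaced sparse in scikit-learn 1.2 (sparse was removed in 1.4); on older versions use sparse=False
ohe = OneHotEncoder(sparse_output=False, handle_unknown='ignore')
# Learn the categories from the training data only
ohe.fit(train)
# Unseen values such as 'Steve Stevenson' are encoded as all zeros instead of raising an error
test_transformed = ohe.transform(test)
print(test_transformed)
# [[0. 1. 0.]
#  [0. 0. 1.]
#  [0. 0. 0.]]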
As for the None problem, one solution is to replace the None values with a placeholder category, such as 'unknown', before fitting and transforming.
Hope this helps