Dummy creation in pipeline with different levels in train and test set


I'm currently exploring scikit-learn pipelines, and I also want to preprocess the data with a pipeline. However, my train and test data have different levels of the categor


1 answer

爱一瞬间的悲伤

2020-12-31 10:55

You can convert both sets to categoricals that share the same list of categories, as explained in this answer:

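import numpy as np
import pandas as pd

# Example data, inferred from the outputs shown below
train = pd.Series(['a', 'b', 'b', 'a', 'a'])
test = pd.Series(['a', 'b', 'c', 'd'])

# Give both sets the same categories so get_dummies yields the same columns.
# (astype('category', categories=...) is no longer accepted by pandas;
# pass a CategoricalDtype instead.)
categories = np.union1d(train, test)
dtype = pd.CategoricalDtype(categories=categories)
train = train.astype(dtype)
test = test.astype(dtype)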

pd.get_dummies(train, dtype=int)  # dtype=int keeps 0/1 output on pandas 2.x, where the default is bool
Out: 
   a  b  c  d
0  1  0  0  0
1  0  1  0  0
2  0  1  0  0
3  1  0  0  0
4  1  0  0  0

pd.get_dummies(test, dtype=int)  # dtype=int keeps 0/1 output on pandas 2.x, where the default is bool
Out: 
   a  b  c  d
0  1  0  0  0
1  0  1  0  0
2  0  0  1  0
3  0  0  0  1
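
Inside a pipeline the test set is usually not available at fit time, so taking the union of train and test levels may not be possible. The following is a minimal sketch of a variant that learns the categories from the training data only and reuses them later. The helper names `fit_categories` and `make_dummies` are hypothetical, not part of pandas or scikit-learn, and the sample data is the one inferred from the outputs above.

```python
import pandas as pd


def fit_categories(train: pd.Series) -> pd.CategoricalDtype:
    """Learn the category levels from the training data only (hypothetical helper)."""
    return pd.CategoricalDtype(categories=sorted(train.dropna().unique()))


def make_dummies(values: pd.Series, dtype: pd.CategoricalDtype) -> pd.DataFrame:
    """Encode values using the levels learned at fit time (hypothetical helper).

    Levels not present in dtype become NaN after the cast, so their
    rows come out as all zeros.
    """
    return pd.get_dummies(values.astype(dtype), dtype=int)


train = pd.Series(['a', 'b', 'b', 'a', 'a'])
test = pd.Series(['a', 'b', 'c', 'd'])

dtype = fit_categories(train)
train_dummies = make_dummies(train, dtype)
test_dummies = make_dummies(test, dtype)

print(train_dummies)
print(test_dummies)
```

With this variant both frames have exactly the columns seen during training (here `a` and `b`), and the test rows holding the unseen levels `c` and `d` are encoded as all zeros. scikit-learn's `OneHotEncoder` with `handle_unknown='ignore'` treats unknown categories in a similar way and can be placed directly in a pipeline.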
