Dummy creation in pipeline with different levels in train and test set

小鲜肉 2020-12-31 10:04

I'm currently exploring scikit-learn pipelines. I also want to preprocess the data with a pipeline. However, my train and test data have different levels of the categor

1 Answer
  • 2020-12-31 10:55

    You can use categoricals as explained in this answer:

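    import numpy as np
    import pandas as pd

    # Example data inferred from the outputs shown below
    train = pd.Series(['a', 'b', 'b', 'a', 'a'])
    test = pd.Series(['a', 'b', 'c', 'd'])

    # Share one category set so both sets produce the same dummy columns.
    # astype('category', categories=...) was removed from pandas;
    # pass a CategoricalDtype instead.
    categories = np.union1d(train, test)
    cat_type = pd.CategoricalDtype(categories=categories)
    train = train.astype(cat_type)
    test = test.astype(cat_type)
    
    pd.get_dummies(train, dtype=int)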
    Out: 
       a  b  c  d
    0  1  0  0  0
    1  0  1  0  0
    2  0  1  0  0
    3  1  0  0  0
    4  1  0  0  0
    
    pd.get_dummies(test, dtype=int)
    Out: 
       a  b  c  d
    0  1  0  0  0
    1  0  1  0  0
    2  0  0  1  0
    3  0  0  0  1
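
Since the question is about scikit-learn pipelines, here is a minimal sketch of two ways to get the same effect inside a pipeline with `OneHotEncoder`. The column name `level`, the labels in `y_train`, and the choice of `LogisticRegression` are made up for illustration; the category values mirror the example above.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: levels "c" and "d" appear only in the test set
X_train = pd.DataFrame({"level": ["a", "b", "b", "a", "a"]})
y_train = [0, 1, 1, 0, 0]
X_test = pd.DataFrame({"level": ["a", "b", "c", "d"]})

# Option 1: learn the levels from train and encode unseen levels as all zeros
model = Pipeline([
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
    ("clf", LogisticRegression()),
])
model.fit(X_train, y_train)
predictions = model.predict(X_test)
encoded_ignore = model.named_steps["onehot"].transform(X_test).toarray()

# Option 2: declare the full set of levels up front (one list per column),
# so train and test always get the same four columns
fixed_encoder = OneHotEncoder(categories=[["a", "b", "c", "d"]])
fixed_encoder.fit(X_train)
encoded_fixed = fixed_encoder.transform(X_test).toarray()
```

Option 1 needs no knowledge of the test levels but drops the information carried by unseen ones. Option 2 corresponds to the `np.union1d` approach above and requires listing every level in advance.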
    