Pipeline doesn't work with Label Encoder

Submitted by 偶尔善良 on 2020-01-15 09:23:07

Question


I run the following code:

import pandas as pd
from sklearn import preprocessing
import sklearn
from sklearn.pipeline import Pipeline
df = pd.DataFrame({'c':['a', 'b', 'c']*4, 'd': ['m', 'f']*6})
encoding_pipeline =Pipeline([
                ('LabelEncoder', preprocessing.LabelEncoder())            
                        ])
encoding_pipeline.fit_transform(df)

and get this full traceback:

TypeError                                 Traceback (most recent call last)
<ipython-input-7-0882633ccf59> in <module>()
----> 1 encoding_pipeline.fit_transform(df)

C:\Program Files\Anaconda3\lib\site-packages\sklearn\pipeline.py in fit_transform(self, X, y, **fit_params)
    183         Xt, fit_params = self._pre_transform(X, y, **fit_params)
    184         if hasattr(self.steps[-1][-1], 'fit_transform'):
--> 185             return self.steps[-1][-1].fit_transform(Xt, y, **fit_params)
    186         else:
    187             return self.steps[-1][-1].fit(Xt, y, **fit_params).transform(Xt)

TypeError: fit_transform() takes 2 positional arguments but 3 were given

What's wrong? It looks like I have to convert the DataFrame before I apply the pipeline.


Answer 1:


Here is a simple version. Start with the imports and the data:

import pandas as pd
from sklearn import preprocessing
import sklearn
from sklearn.pipeline import Pipeline
from sklearn.pipeline import FeatureUnion
df = pd.DataFrame({'c':['a', 'b', 'c']*4, 'd': ['m', 'f']*6})

Define a transformer that selects a single column:

class ItemSelector():
    def __init__(self, key):
        self.key = key

    def fit(self, x, y=None):
        return self

    def transform(self, data_dict):
        return data_dict[self.key]

Now define a wrapper class for the encoder:

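class MyLEncoder():
    # Wraps LabelEncoder so its methods accept the (X, y) arguments Pipeline passes.

    def fit(self, X, y=None, **fit_params):
        # Learn the label mapping once, on the data passed to fit.
        self.enc_ = preprocessing.LabelEncoder().fit(X)
        return self

    def transform(self, X, y=None, **fit_params):
        # Reuse the mapping learned in fit instead of refitting on every call.
        return self.enc_.transform(X)

    def fit_transform(self, X, y=None, **fit_params):
        self.fit(X, y, **fit_params)
        return self.transform(X)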
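The wrapper is needed because LabelEncoder is designed for target labels: its fit_transform accepts a single argument (y), whereas Pipeline calls fit_transform(Xt, y) on its last step. The following is a minimal sketch that reproduces the mismatch outside a pipeline; the variable name error_message is only for illustration.

```python
from sklearn.preprocessing import LabelEncoder

enc = LabelEncoder()

# The normal, single-argument call works.
codes = enc.fit_transform(['a', 'b', 'c', 'a'])
print(codes)

# Pipeline effectively calls fit_transform(Xt, y), i.e. with one extra
# positional argument, which LabelEncoder does not accept.
error_message = None
try:
    enc.fit_transform(['a', 'b', 'c', 'a'], None)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

This is the same TypeError as in the question's traceback, raised without any pipeline involved.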

And the pipeline:

encoding_pipeline = Pipeline([
    ('union', FeatureUnion(
        transformer_list=[
            ('categorical', Pipeline([
                ('selector', ItemSelector(key='c')),
                ('LabelEncoder', MyLEncoder()),
            ])),
        ]
    )),
])

Then run it:

X = df
encoding_pipeline.fit_transform(X)
array([0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2], dtype=int64)

If you need to use this with an algorithm (an estimator), you will need to add more details.
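As one possible way to feed encoded columns into an estimator, the sketch below swaps in a different technique from the answer above: scikit-learn's OrdinalEncoder (assuming scikit-learn 0.20 or later), which is intended for feature columns and works directly as a pipeline step. The target y and the choice of DecisionTreeClassifier are hypothetical and only there to make the example runnable.

```python
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

df = pd.DataFrame({'c': ['a', 'b', 'c'] * 4, 'd': ['m', 'f'] * 6})

# Hypothetical target, invented for illustration (not from the question).
y = [0, 1] * 6

# OrdinalEncoder encodes every column of a 2D input, so no custom
# wrapper or column selector is needed for this simple case.
model = Pipeline([
    ('encoder', OrdinalEncoder()),
    ('clf', DecisionTreeClassifier(random_state=0)),
])
model.fit(df, y)
preds = model.predict(df)

# The encoding step can also be inspected on its own.
encoded = model.named_steps['encoder'].transform(df)
print(encoded.shape)
```

Unlike the LabelEncoder wrapper, this encodes both columns at once and returns a 2D float array rather than a 1D integer array.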



Source: https://stackoverflow.com/questions/40097177/pipeline-doesnt-work-with-label-encoder
