How to reverse Label Encoder from sklearn for multiple columns?

再見小時候 2021-01-07 08:49

I would like to use the inverse_transform function for LabelEncoder on multiple columns.

This is the code I use when applying LabelEncoder to more than one column:

2 Answers
  • 2021-01-07 09:08

    To inverse transform the data, you need to keep the encoder that was used to transform each column. One way to do this is to store the fitted LabelEncoders in a dict inside your object, keyed by column name. It works like this:

    • when you call fit, an encoder is fitted for every column and saved
    • when you call transform, the saved encoders are used to transform the data
    • when you call inverse_transform, the same encoders are used to do the inverse transformation

    Example code:

    import pandas as pd
    from sklearn.preprocessing import LabelEncoder
    
    class MultiColumnLabelEncoder:
    
        def __init__(self, columns=None):
            self.columns = columns # array of column names to encode
    
    
        def fit(self, X, y=None):
            self.encoders = {}
            columns = X.columns if self.columns is None else self.columns
            for col in columns:
                self.encoders[col] = LabelEncoder().fit(X[col])
            return self
    
    
        def transform(self, X):
            output = X.copy()
            columns = X.columns if self.columns is None else self.columns
            for col in columns:
                output[col] = self.encoders[col].transform(X[col])
            return output
    
    
        def fit_transform(self, X, y=None):
            return self.fit(X,y).transform(X)
    
    
        def inverse_transform(self, X):
            output = X.copy()
            columns = X.columns if self.columns is None else self.columns
            for col in columns:
                output[col] = self.encoders[col].inverse_transform(X[col])
            return output
    

    You can then use it like this:

    multi = MultiColumnLabelEncoder(columns=['city','size'])
    df = pd.DataFrame({'city':    ['London','Paris','Moscow'],
                       'size':    ['M',     'M',    'L'],
                       'quantity':[12,       1,      4]})
    X = multi.fit_transform(df)
    print(X)
    #    city  size  quantity
    # 0     0     1        12
    # 1     2     1         1
    # 2     1     0         4
    inv = multi.inverse_transform(X)
    print(inv)
    #      city size  quantity
    # 0  London    M        12
    # 1   Paris    M         1
    # 2  Moscow    L         4
    

    fit_transform could also be implemented separately, so that it calls fit_transform on each LabelEncoder instead of chaining fit and transform. Either way, make sure to keep the fitted encoders around for when you need the inverse transformation.
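A minimal sketch of that separate fit_transform, assuming the same dict-of-encoders design as the class above. The class name OnePassMultiColumnLabelEncoder is hypothetical, and only the methods needed for the round trip are shown:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


class OnePassMultiColumnLabelEncoder:
    """Hypothetical variant: fit_transform calls LabelEncoder.fit_transform per column."""

    def __init__(self, columns=None):
        self.columns = columns  # column names to encode; None means all columns

    def fit_transform(self, X, y=None):
        # Fit and encode each column in one pass, keeping every fitted
        # encoder so that inverse_transform can use it later.
        self.encoders = {}
        output = X.copy()
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            encoder = LabelEncoder()
            output[col] = encoder.fit_transform(X[col])
            self.encoders[col] = encoder
        return output

    def inverse_transform(self, X):
        # Map the integer codes back to the original labels, column by column.
        output = X.copy()
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            output[col] = self.encoders[col].inverse_transform(X[col])
        return output


df = pd.DataFrame({'city': ['London', 'Paris', 'Moscow'],
                   'size': ['M', 'M', 'L'],
                   'quantity': [12, 1, 4]})
multi = OnePassMultiColumnLabelEncoder(columns=['city', 'size'])
encoded = multi.fit_transform(df)
restored = multi.inverse_transform(encoded)
```

The encoded values should match the fit-then-transform version, since each LabelEncoder sorts its classes the same way in both cases.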

  • 2021-01-07 09:12

    You do not need to modify LabelEncoder for this. It already provides an inverse_transform method.

    Example:

    from sklearn import preprocessing
    
    le = preprocessing.LabelEncoder()
    df = ["paris", "paris", "tokyo", "amsterdam"]
    
    le_fitted = le.fit_transform(df)
    
    inverted = le.inverse_transform(le_fitted)
    
    print(inverted)
    # ['paris' 'paris' 'tokyo' 'amsterdam']
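For several columns, the same built-in method can be used with one fitted encoder per column and no wrapper class. This is only a sketch: the DataFrame and its 'size' values are made up for illustration, and the dict name encoders is an assumption, not part of scikit-learn.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Illustrative data; the 'size' column is invented for this example.
df = pd.DataFrame({'city': ['paris', 'paris', 'tokyo', 'amsterdam'],
                   'size': ['M', 'L', 'M', 'S']})

# One fitted encoder per column, kept in a dict keyed by column name.
encoders = {col: LabelEncoder().fit(df[col]) for col in df.columns}

# Encode every column with its own encoder.
encoded = df.copy()
for col, le in encoders.items():
    encoded[col] = le.transform(df[col])

# Decode with the same encoders to recover the original labels.
decoded = encoded.copy()
for col, le in encoders.items():
    decoded[col] = le.inverse_transform(encoded[col])
```

A single LabelEncoder cannot be shared across columns here, because each call to fit replaces the classes learned from the previous column.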
    