I would like to use the inverse_transform function for LabelEncoder on multiple columns.
This is the code I use when applying LabelEncoder to more than one column.
To inverse transform the data, you need to keep the encoders that were used to transform each column. One way to do this is to save the LabelEncoders in a dict inside your object. It would work like this:
fit: the encoders for every column are fit and saved
transform: they get used to transform the data
inverse_transform: they get used to do the inverse transformation

Example code:
from sklearn.preprocessing import LabelEncoder

class MultiColumnLabelEncoder:
    def __init__(self, columns=None):
        self.columns = columns  # array of column names to encode; None means all columns

    def fit(self, X, y=None):
        # fit one LabelEncoder per column and keep it, keyed by column name
        self.encoders = {}
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            self.encoders[col] = LabelEncoder().fit(X[col])
        return self

    def transform(self, X):
        output = X.copy()
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            output[col] = self.encoders[col].transform(X[col])
        return output

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)

    def inverse_transform(self, X):
        # reuse the saved encoders to map the integer codes back to labels
        output = X.copy()
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            output[col] = self.encoders[col].inverse_transform(X[col])
        return output

You can then use it like this:
import pandas as pd

multi = MultiColumnLabelEncoder(columns=['city', 'size'])
df = pd.DataFrame({'city': ['London', 'Paris', 'Moscow'],
                   'size': ['M', 'M', 'L'],
                   'quantity': [12, 1, 4]})
X = multi.fit_transform(df)
print(X)
#    city  size  quantity
# 0     0     1        12
# 1     2     1         1
# 2     1     0         4
inv = multi.inverse_transform(X)
print(inv)
#      city size  quantity
# 0  London    M        12
# 1   Paris    M         1
# 2  Moscow    L         4

There could be a separate implementation of fit_transform that would call the same method of the LabelEncoders. Just make sure to keep the encoders around for when you need the inverse transformation.
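
A minimal sketch of such a separate fit_transform, written as a standalone class so it runs on its own. The class name MultiColumnLabelEncoderFT is hypothetical, and this is one possible way to do it rather than a definitive implementation. It calls LabelEncoder.fit_transform once per column and stores each fitted encoder so the inverse transformation stays available:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder


class MultiColumnLabelEncoderFT:
    """Hypothetical variant that fits and encodes each column in one call."""

    def __init__(self, columns=None):
        self.columns = columns  # column names to encode; None means all columns

    def fit_transform(self, X, y=None):
        output = X.copy()
        self.encoders = {}
        columns = X.columns if self.columns is None else self.columns
        for col in columns:
            encoder = LabelEncoder()
            # LabelEncoder.fit_transform fits and encodes in a single step
            output[col] = encoder.fit_transform(X[col])
            # keep the fitted encoder around for inverse_transform
            self.encoders[col] = encoder
        return output

    def inverse_transform(self, X):
        output = X.copy()
        for col, encoder in self.encoders.items():
            output[col] = encoder.inverse_transform(X[col])
        return output


df = pd.DataFrame({'city': ['London', 'Paris', 'Moscow'],
                   'size': ['M', 'M', 'L'],
                   'quantity': [12, 1, 4]})
multi = MultiColumnLabelEncoderFT(columns=['city', 'size'])
encoded = multi.fit_transform(df)
restored = multi.inverse_transform(encoded)
```

The encoders are stored inside the loop, right after each column is encoded, so a later inverse_transform always sees exactly the encoders that produced the codes.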
You do not need to modify it this way. LabelEncoder already implements this as the inverse_transform method.
Example:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()
df = ["paris", "paris", "tokyo", "amsterdam"]
le_fitted = le.fit_transform(df)
inverted = le.inverse_transform(le_fitted)
print(inverted)
# ['paris' 'paris' 'tokyo' 'amsterdam']
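
For several columns, the same built-in method can be applied column by column, as long as each column keeps its own fitted encoder. A rough sketch, assuming a small DataFrame whose 'size' values are made up for illustration and a plain dict named encoders (both are assumptions, not part of the original example):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Illustrative data; the 'size' column is invented for this sketch
df = pd.DataFrame({'city': ['paris', 'paris', 'tokyo', 'amsterdam'],
                   'size': ['M', 'L', 'M', 'S']})

# One fitted LabelEncoder per column, keyed by column name
encoders = {col: LabelEncoder().fit(df[col]) for col in df.columns}

# Encode every column with its own encoder
encoded = df.copy()
for col, le in encoders.items():
    encoded[col] = le.transform(df[col])

# Decode every column with the same encoder that encoded it
decoded = encoded.copy()
for col, le in encoders.items():
    decoded[col] = le.inverse_transform(encoded[col])
```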