Let's say that I want to compare different dimensionality reduction approaches for a particular (supervised) dataset that consists of n>2 features via cross-validation and by u
I just want to post my solution for completeness; maybe it is useful to someone else:
class ColumnExtractor(object):

    def transform(self, X):
        cols = X[:,2:4] # column 3 and 4 are "extracted"
        return cols

    def fit(self, X, y=None):
        return self
Then it can be used in a Pipeline like so:
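from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

clf = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('reduce_dim', ColumnExtractor()),
    ('classification', GaussianNB())
])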
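As a quick sanity check of what the slice actually selects, here is a minimal sketch on a made-up toy array (X_toy and its values are my own illustration, not part of the original data):

```python
import numpy as np


class ColumnExtractor(object):
    """Same transformer as above: keeps the third and fourth columns."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[:, 2:4]


# Toy data, assumed only for illustration: 3 samples, 5 features.
X_toy = np.arange(15).reshape(3, 5)

extracted = ColumnExtractor().fit(X_toy).transform(X_toy)
print(extracted.shape)  # (3, 2)
```

The slice 2:4 covers the indices 2 and 3, so the result keeps the third and fourth columns and stays two-dimensional, which is what the next pipeline step expects.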
For a more general solution that selects and stacks multiple columns, you can use the following class:
import numpy as np

class ColumnExtractor(object):

    def __init__(self, cols):
        self.cols = cols

    def transform(self, X):
        col_list = []
        for c in self.cols:
            col_list.append(X[:, c:c+1])
        return np.concatenate(col_list, axis=1)

    def fit(self, X, y=None):
        return self

clf = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('dim_red', ColumnExtractor(cols=(1,3))), # selects the second and fourth columns
    ('classification', GaussianNB())
])
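If you plan to run the pipeline through scikit-learn's cross-validation helpers, one possible variant is to derive the transformer from BaseEstimator and TransformerMixin, so that it supports get_params/set_params and cloning. This is only a sketch under my own assumptions: the class name ColumnSelector, the iris dataset, and cv=5 are not from the solution above, and the integer-array indexing is a swapped-in shortcut for the concatenate loop.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class ColumnSelector(BaseEstimator, TransformerMixin):
    """Hypothetical variant of ColumnExtractor (name assumed for this sketch)."""

    def __init__(self, cols):
        self.cols = cols

    def fit(self, X, y=None):
        # Nothing to learn; the column indices are fixed at construction.
        return self

    def transform(self, X):
        # Integer-array indexing selects and stacks the columns in one step.
        return np.asarray(X)[:, list(self.cols)]


# Example data, assumed for illustration: iris has 4 features.
X, y = load_iris(return_X_y=True)

clf = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('dim_red', ColumnSelector(cols=(1, 3))),  # second and fourth columns
    ('classification', GaussianNB()),
])

# 5-fold cross-validation; scaling and selection are refit on each training fold.
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())
```

Indexing with X[:, [1, 3]] should give the same array as concatenating the single-column slices, so either class can sit in the 'dim_red' step.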