I am trying to build a pipeline with a variable transformation, and I do it as below:
import numpy as np
import pandas as pd
import sklearn
from sklearn import linear_mod
So the problem is that transform expects a single argument: an array of shape [n_samples, n_features].
See the Examples section in the documentation of sklearn.pipeline.Pipeline: it uses sklearn.feature_selection.SelectKBest as a transform, and you can see from its source that it expects X to be one array rather than separate variables like X1 and X2.
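To make the shape contract concrete, here is a minimal sketch that uses sklearn.preprocessing.FunctionTransformer instead of a hand-written class. The helper name column_difference and the sample values are only illustrative assumptions. The point is that the transform step receives the whole 2-D array in one argument and should return a 2-D array as well.

```python
import numpy as np
from sklearn.preprocessing import FunctionTransformer

# Feature matrix of shape (n_samples=3, n_features=2); values are illustrative.
X = np.array([[3, 2], [2, 3], [3, 4]])


def column_difference(X):
    # Hypothetical helper: the transform gets ALL features in one array,
    # so the individual columns are picked out inside the function.
    # reshape(-1, 1) keeps the result 2-D: (n_samples, 1).
    return (X[:, 0] - X[:, 1]).reshape(-1, 1)


diff = FunctionTransformer(column_difference)
Xt = diff.fit_transform(X)
print(Xt.shape)
```

The same idea applies to a custom class: one X goes in, one 2-D X comes out.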
In short, your code can be fixed like this:
import pandas as pd
import sklearn
from sklearn import linear_model
from sklearn.pipeline import Pipeline

df = pd.DataFrame({'y': [4,5,6], 'a':[3,2,3], 'b' : [2,3,4]})

class Complex():
    def transform(self, Xt):
        return pd.DataFrame(Xt['a'] - Xt['b'])

    def fit_transform(self, X1, X2):
        return self.transform(X1)

X = df[['a', 'b']]
y = df['y']
regressor = linear_model.SGDRegressor()
pipeline = Pipeline([
    ('transform', Complex()) ,
    ('model_fitting', regressor)
])
pipeline.fit(X, y)
pred = pipeline.predict(X)
print(pred)
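As a possible variant (a sketch, not the only way to do it), the transformer can also inherit from sklearn.base.BaseEstimator and TransformerMixin and define fit explicitly, which is the interface scikit-learn documents for custom transformers. The class name ColumnDifference and the output column name a_minus_b are hypothetical. LinearRegression is swapped in for SGDRegressor here only so that the result is deterministic on this tiny dataset.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline


class ColumnDifference(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: returns column 'a' minus column 'b'."""

    def fit(self, X, y=None):
        # Nothing to learn; return self so the pipeline can chain calls.
        return self

    def transform(self, X):
        # X is the whole feature table; the result stays 2-D: (n_samples, 1).
        return (X["a"] - X["b"]).to_frame(name="a_minus_b")


df = pd.DataFrame({"y": [4, 5, 6], "a": [3, 2, 3], "b": [2, 3, 4]})
X, y = df[["a", "b"]], df["y"]

pipeline = Pipeline([
    ("transform", ColumnDifference()),
    ("model_fitting", LinearRegression()),  # swapped in for SGDRegressor
])
pipeline.fit(X, y)
pred = pipeline.predict(X)
print(pred)
```

TransformerMixin supplies fit_transform from fit and transform, so there is no need to write it by hand.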