Ensemble of different kinds of regressors using scikit-learn (or any other python framework)

猫巷女王i  2021-01-30 05:43

I am trying to solve a regression task. I found out that 3 models work nicely for different subsets of the data: LassoLARS, SVR and Gradient Tree Boosting. I noticed that w…

4 Answers
  •  庸人自扰
    2021-01-30 06:28

    This is a known, interesting (and often painful!) problem with hierarchical predictions. The problem with training a number of predictors on the training data, and then training a higher-level predictor over their outputs, again using the same training data, has to do with the bias-variance decomposition.

    Suppose you have two predictors, one essentially an overfitting version of the other; the former will then appear better than the latter on the training set. The combining predictor will favor the former for no true reason, simply because it cannot distinguish overfitting from genuinely high-quality prediction.

    The known way of dealing with this is to prepare, for each row in the training data and for each of the predictors, a prediction for that row based on a model that was not fitted on that row. For the overfitting version, e.g., this will not produce a good result for the row on average. The combining predictor can then make a fairer assessment of how to combine the lower-level predictors.
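
    For orientation, this out-of-fold idea is essentially what scikit-learn's cross_val_predict computes in a single call (in versions that have the model_selection API); here is a minimal sketch on made-up data, purely to show the shape of the call, before the reusable transformer below:

        import numpy as np
        from sklearn import linear_model, model_selection

        # Made-up toy data, only to illustrate the call.
        x = np.random.randn(100, 3)
        y = x.sum(axis=1) + 0.1 * np.random.randn(100)

        # Each entry is predicted by a model fitted on folds that did not
        # contain that row, so an overfitting base model gets no unfair
        # advantage when a combiner is later trained on these predictions.
        oof = model_selection.cross_val_predict(
            linear_model.LinearRegression(), x, y, cv=5)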

    Shahar Azulay & I wrote a transformer stage for dealing with this:

    # Note: this uses the model_selection API (scikit-learn >= 0.18); the original
    # version relied on the now-removed sklearn.cross_validation module.
    import warnings

    import numpy as np
    import sklearn.exceptions
    import sklearn.model_selection


    class Stacker(object):
        """
        A transformer fitting a predictor `pred` to data in a way
            that will allow a higher-up predictor to build a model utilizing both this
            and other predictors correctly.

        The fit_transform(self, x, y) of this class will create a column matrix, in which
            each row contains the prediction of `pred` fitted on rows other than this one.
            This allows a higher-level predictor to correctly fit a model on this, and other
            column matrices obtained from other lower-level predictors.

        The fit(self, x, y) and transform(self, x_) methods will fit `pred` on all
            of `x`, and transform `x_` (which may or may not be `x`) using the fitted
            `pred`.

        Arguments:
            pred: A lower-level predictor to stack.

            cv_fn: Function taking `x`, and returning an iterable of (train, test) index pairs.
                In `fit_transform` the train and test indices will be iterated over. For each
                iteration, `pred` will be fitted to the rows of `x` and `y` corresponding to the
                train indices, and the output rows at the test indices will be obtained
                by predicting on the corresponding rows of `x`.
        """
        def __init__(self, pred,
                     cv_fn=lambda x: sklearn.model_selection.LeaveOneOut().split(x)):
            self._pred, self._cv_fn = pred, cv_fn

        def fit_transform(self, x, y):
            # Out-of-fold predictions for the training rows, then a final fit on all of `x`.
            x_trans = self._train_transform(x, y)

            self.fit(x, y)

            return x_trans

        def fit(self, x, y):
            """
            Same signature as any sklearn transformer.
            """
            self._pred.fit(x, y)

            return self

        def transform(self, x):
            """
            Same signature as any sklearn transformer.
            """
            return self._test_transform(x)

        def _train_transform(self, x, y):
            x_trans = np.nan * np.ones((x.shape[0], 1))

            all_te = set()
            for tr, te in self._cv_fn(x):
                all_te = all_te | set(te)
                # Predict each test row with a model fitted only on the other rows.
                x_trans[te, 0] = self._pred.fit(x[tr, :], y[tr]).predict(x[te, :])
            if all_te != set(range(x.shape[0])):
                warnings.warn('Not all indices covered by Stacker',
                              sklearn.exceptions.FitFailedWarning)

            return x_trans

        def _test_transform(self, x):
            return self._pred.predict(x)
    

    Here is an example of the improvement for the setting described in @MaximHaytovich's answer.

    First, some setup:

        import numpy as np

        from sklearn import ensemble
        from sklearn import linear_model
        from sklearn import metrics

        y = np.random.randn(100)
        x0 = (y + 0.1 * np.random.randn(100)).reshape((100, 1))
        x1 = (y + 0.1 * np.random.randn(100)).reshape((100, 1))
        # Holds the two lower-level predictions; filled in below.
        x = np.zeros((100, 2))
    

    Note that x0 and x1 are just noisy versions of y. We'll use the first 80 rows for train, and the last 20 for test.

    These are the two predictors: a higher-variance gradient booster, and a linear predictor:

        g = ensemble.GradientBoostingRegressor()
        l = linear_model.LinearRegression()
    

    Here is the methodology suggested in that answer:

        g.fit(x0[: 80, :], y[: 80])
        l.fit(x1[: 80, :], y[: 80])
    
        # Note: the training rows of `x` get in-sample predictions here, since
        # the models were fitted on those very rows.
        x[:, 0] = g.predict(x0)
        x[:, 1] = l.predict(x1)
    
        >>> metrics.r2_score(
            y[80: ],
            linear_model.LinearRegression().fit(x[: 80, :], y[: 80]).predict(x[80: , :]))
        0.940017788444
    

    Now, using stacking:

        # Training rows: out-of-fold predictions from each lower-level predictor.
        x[: 80, 0] = Stacker(g).fit_transform(x0[: 80, :], y[: 80])[:, 0]
        x[: 80, 1] = Stacker(l).fit_transform(x1[: 80, :], y[: 80])[:, 0]
    
        u = linear_model.LinearRegression().fit(x[: 80, :], y[: 80])
    
        # Test rows: predictions from models refitted on all 80 training rows.
        x[80: , 0] = Stacker(g).fit(x0[: 80, :], y[: 80]).transform(x0[80:, :])
        x[80: , 1] = Stacker(l).fit(x1[: 80, :], y[: 80]).transform(x1[80:, :])
    
        >>> metrics.r2_score(
            y[80: ],
            u.predict(x[80:, :]))
        0.992196564279
    

    The stacking prediction does better. It realizes that the gradient booster is not that great.
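
    Side note: newer scikit-learn releases (0.22 and later) ship sklearn.ensemble.StackingRegressor, which applies the same out-of-fold scheme internally through cross-validation. A minimal sketch, with the caveat that it assumes all base models share one feature matrix (unlike the toy example above, where g and l see the separate matrices x0 and x1):

        import numpy as np
        from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
        from sklearn.linear_model import LinearRegression

        # Made-up shared feature matrix: two noisy views of y, side by side.
        rng = np.random.RandomState(0)
        y = rng.randn(100)
        X = y.reshape(-1, 1) + 0.1 * rng.randn(100, 2)

        stack = StackingRegressor(
            estimators=[('gb', GradientBoostingRegressor()),
                        ('lr', LinearRegression())],
            final_estimator=LinearRegression(),  # combiner fit on out-of-fold predictions
            cv=5)
        stack.fit(X[:80], y[:80])
        print(stack.score(X[80:], y[80:]))  # R^2 on the held-out 20 rows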
