Ensemble of different kinds of regressors using scikit-learn (or any other Python framework)

猫巷女王i 2021-01-30 05:43

I am trying to solve a regression task. I found that 3 models work well on different subsets of the data: LassoLARS, SVR and Gradient Tree Boosting. I noticed that w

4 answers
  • 2021-01-30 06:09

    OK, after spending some time googling 'stacking' (as mentioned by @andreas earlier), I found out how to do the weighting in Python, even with scikit-learn. Consider the following:

    I train a set of regression models (as mentioned: SVR, LassoLars and GradientBoostingRegressor). Then I run all of them on the training data (the same data that was used to train each of these 3 regressors). I get predictions for the examples from each of my algorithms and save these 3 results into a pandas DataFrame with columns 'predictedSVR', 'predictedLASSO' and 'predictedGBR'. Finally, I add one more column to this DataFrame, which I call 'predicted' and which holds the true target value.

    Then I just train a linear regression on this new dataframe:

    # df: DataFrame with the predictions of the 3 regressors and the true output
    from sklearn import linear_model
    stacker = linear_model.LinearRegression()
    stacker.fit(df[['predictedSVR', 'predictedLASSO', 'predictedGBR']], df['predicted'])
    

    So when I want to make a prediction for a new example, I run each of my 3 regressors separately and then call:

    # new_df (hypothetical name): the same three prediction columns, computed for the new examples
    stacker.predict(new_df[['predictedSVR', 'predictedLASSO', 'predictedGBR']])
    

    on the outputs of my 3 regressors to get the result.

    The problem here is that I am finding the optimal weights for the regressors 'on average': the weights will be the same for every example I make a prediction on.
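
    The steps in this answer can be put together as a minimal end-to-end sketch. The synthetic data, the `base_predictions` helper and the hyperparameters (for example `alpha=0.01`) are assumptions made for illustration and are not part of the original answer; only the column names and the LinearRegression stacker come from it.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LassoLars, LinearRegression
from sklearn.svm import SVR

# Synthetic data, purely for illustration (assumption).
rng = np.random.RandomState(0)
X = rng.randn(200, 3)
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.1 * rng.randn(200)
X_train, y_train, X_new = X[:150], y[:150], X[150:]

# The three base regressors, keyed by the column names used in the answer.
base_models = {
    'predictedSVR': SVR(),
    'predictedLASSO': LassoLars(alpha=0.01),
    'predictedGBR': GradientBoostingRegressor(random_state=0),
}
for model in base_models.values():
    model.fit(X_train, y_train)


def base_predictions(X):
    """Hypothetical helper: one column of predictions per base model."""
    return pd.DataFrame({name: m.predict(X) for name, m in base_models.items()})


# Predictions on the training data plus the true target, as in the answer.
df = base_predictions(X_train)
df['predicted'] = y_train

columns = list(base_models)
stacker = LinearRegression()
stacker.fit(df[columns], df['predicted'])

# For new examples: run the base models first, then the stacker on their outputs.
new_predictions = stacker.predict(base_predictions(X_new)[columns])
print(dict(zip(columns, stacker.coef_)))  # one global weight per base model
```

    Note that the printed weights are global: the same three coefficients are applied to every new example, which is exactly the limitation described above.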

  • 2021-01-30 06:22

    Late response, but I wanted to add one practical point about this sort of stacked regression approach (which I use frequently in my work).

    You may want to choose an algorithm for the stacker that allows positive=True (for example, ElasticNet). I have found that, when one model is relatively stronger, the unconstrained LinearRegression() model will often fit a larger positive coefficient to the stronger model and a negative coefficient to the weaker one.

    Unless you actually believe that your weaker model has negative predictive power, this is not a helpful outcome. It is very similar to having high multicollinearity between the features of a regular regression model, and it causes all sorts of edge effects.

    This comment applies most significantly to noisy data situations. If you're aiming for an R² of 0.9, 0.95 or 0.99, you'd probably want to throw out the model that was getting a negative weight.
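
    A small sketch of the constrained stacker, assuming two hypothetical base-model outputs (`strong` and `weak`) built from synthetic data; the regularization strength `alpha=0.01` is likewise an arbitrary choice for illustration. It only demonstrates that `positive=True` keeps the stacking weights non-negative; whether the unconstrained fit actually goes negative depends on the data.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, LinearRegression

rng = np.random.RandomState(0)
y = rng.randn(200)

# Hypothetical base-model outputs: a strong model and a weaker, correlated one.
strong = y + 0.1 * rng.randn(200)
weak = strong + 0.5 * rng.randn(200)
P = np.column_stack([strong, weak])

# Unconstrained stacker: coefficients may take either sign.
unconstrained = LinearRegression().fit(P, y)

# Constrained stacker: positive=True forces every coefficient to be >= 0.
constrained = ElasticNet(alpha=0.01, positive=True).fit(P, y)

print('unconstrained weights:', unconstrained.coef_)
print('constrained weights:  ', constrained.coef_)
```

    With the constraint in place, a model that contributes nothing ends up with a weight of zero instead of a negative one, which makes it easy to spot and drop.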

  • 2021-01-30 06:28

    This is a known, interesting (and often painful!) problem with hierarchical predictions. The problem with training a number of predictors over the train data, and then training a higher-level predictor over them using the same train data again, has to do with the bias-variance decomposition.

    Suppose you have two predictors, one essentially an overfitted version of the other. Over the train set, the former will appear to be better than the latter. The combining predictor will then favor the former for no good reason, simply because it cannot distinguish overfitting from genuinely high-quality prediction.

    The known way of dealing with this is to prepare, for each row in the train data and for each of the predictors, a prediction for that row from a model that was not fit on it. For the overfitted version, for example, this won't produce a good result for the row on average. The combining predictor can then better assess a fair model for combining the lower-level predictors.

    Shahar Azulay & I wrote a transformer stage for dealing with this:

    import warnings

    import numpy as np
    import sklearn.exceptions
    import sklearn.model_selection


    class Stacker(object):
        """
        A transformer that fits a predictor `pred` to data in a way
            that allows a higher-up predictor to build a model utilizing both this 
            and other predictors correctly.
    
        The fit_transform(self, x, y) of this class will create a column matrix, each
            row of which contains the prediction of `pred` fitted on rows other than this one.
            This allows a higher-level predictor to correctly fit a model on this, and other
            column matrices obtained from other lower-level predictors.
    
        The fit(self, x, y) and transform(self, x_) methods will fit `pred` on all 
            of `x`, and transform `x_` (which is either `x` or not) using the fitted 
            `pred`.
    
        Arguments:    
            pred: A lower-level predictor to stack.
    
            cv_fn: Function taking `x`, and returning an iterable of (train, test) index
                pairs. In `fit_transform` these pairs will be iterated over. For each iteration, `pred` will
                be fitted to the `x` and `y` with rows corresponding to the
                train indices, and the test indices of the output will be obtained
                by predicting on the corresponding indices of `x`.
        """
        # Default: leave-one-out splits over the rows of `x`.
        def __init__(self, pred, cv_fn=lambda x: sklearn.model_selection.LeaveOneOut().split(x)):
            self._pred, self._cv_fn = pred, cv_fn
    
        def fit_transform(self, x, y):
            x_trans = self._train_transform(x, y)
    
            self.fit(x, y)
    
            return x_trans
    
        def fit(self, x, y):
            """
            Same signature as any sklearn transformer.
            """
            self._pred.fit(x, y)
    
            return self
    
        def transform(self, x):
            """
            Same signature as any sklearn transformer.
            """
            return self._test_transform(x)
    
        def _train_transform(self, x, y):
            x_trans = np.nan * np.ones((x.shape[0], 1))
    
            all_te = set()
            for tr, te in self._cv_fn(x):
                all_te = all_te | set(te)
                x_trans[te, 0] = self._pred.fit(x[tr, :], y[tr]).predict(x[te, :]) 
            if all_te != set(range(x.shape[0])):
                warnings.warn('Not all indices covered by Stacker', sklearn.exceptions.FitFailedWarning)
    
            return x_trans
    
        def _test_transform(self, x):
            return self._pred.predict(x)
    

    Here is an example of the improvement for the setting described in @MaximHaytovich's answer.

    First, some setup:

        import numpy as np
        from sklearn import linear_model
        from sklearn import ensemble
        from sklearn import metrics
    
        y = np.random.randn(100)
        x0 = (y + 0.1 * np.random.randn(100)).reshape((100, 1)) 
        x1 = (y + 0.1 * np.random.randn(100)).reshape((100, 1)) 
        x = np.zeros((100, 2)) 
    

    Note that x0 and x1 are just noisy versions of y. We'll use the first 80 rows for train, and the last 20 for test.

    These are the two predictors: a higher-variance gradient booster, and a linear predictor:

        g = ensemble.GradientBoostingRegressor()
        l = linear_model.LinearRegression()
    

    Here is the methodology suggested in the answer:

        g.fit(x0[: 80, :], y[: 80])
        l.fit(x1[: 80, :], y[: 80])
    
        x[:, 0] = g.predict(x0)
        x[:, 1] = l.predict(x1)
    
        >>> metrics.r2_score(
            y[80: ],
            linear_model.LinearRegression().fit(x[: 80, :], y[: 80]).predict(x[80: , :]))
        0.940017788444
    

    Now, using stacking:

        x[: 80, 0] = Stacker(g).fit_transform(x0[: 80, :], y[: 80])[:, 0]
        x[: 80, 1] = Stacker(l).fit_transform(x1[: 80, :], y[: 80])[:, 0]
    
        u = linear_model.LinearRegression().fit(x[: 80, :], y[: 80])
    
        x[80: , 0] = Stacker(g).fit(x0[: 80, :], y[: 80]).transform(x0[80:, :])
        x[80: , 1] = Stacker(l).fit(x1[: 80, :], y[: 80]).transform(x1[80:, :])
    
        >>> metrics.r2_score(
            y[80: ],
            u.predict(x[80:, :]))
        0.992196564279
    

    The stacking prediction does better. It realizes that the gradient booster is not that great.
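
    The same out-of-fold idea can also be sketched with scikit-learn's own `cross_val_predict`, without a custom class. This is an alternative illustration, not the authors' implementation: it assumes 5-fold cross-validation instead of leave-one-out (to keep it fast), fixed random seeds, and data generated the same way as in the setup above.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_predict

rng = np.random.RandomState(0)
y = rng.randn(100)
x0 = (y + 0.1 * rng.randn(100)).reshape(-1, 1)
x1 = (y + 0.1 * rng.randn(100)).reshape(-1, 1)
train, test = slice(0, 80), slice(80, 100)

g = GradientBoostingRegressor(random_state=0)
l = LinearRegression()
cv = KFold(n_splits=5, shuffle=True, random_state=0)  # assumption: 5 folds

# Out-of-fold predictions: each train row is predicted by a model
# that was fitted without that row.
oof = np.column_stack([
    cross_val_predict(g, x0[train], y[train], cv=cv),
    cross_val_predict(l, x1[train], y[train], cv=cv),
])
combiner = LinearRegression().fit(oof, y[train])

# For the test rows, refit each base model on the whole train set.
g.fit(x0[train], y[train])
l.fit(x1[train], y[train])
test_features = np.column_stack([g.predict(x0[test]), l.predict(x1[test])])
stacked_pred = combiner.predict(test_features)
print(combiner.coef_)
```

    The combiner only ever sees predictions made on rows the base models did not train on, so an overfitting base model cannot look better than it really is.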

  • 2021-01-30 06:31

    What you describe is called "stacking", which is not implemented in scikit-learn yet, but I think contributions would be welcome. An ensemble that just averages will be in pretty soon: https://github.com/scikit-learn/scikit-learn/pull/4161
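
    Later scikit-learn releases (0.22 and newer) do ship a stacking estimator, `sklearn.ensemble.StackingRegressor`, which fits the final estimator on cross-validated predictions of the base models. A minimal sketch, assuming such a version is installed; the synthetic data and hyperparameters are made up for illustration:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, StackingRegressor
from sklearn.linear_model import LassoLars, LinearRegression
from sklearn.svm import SVR

# Synthetic data, purely for illustration (assumption).
rng = np.random.RandomState(0)
X = rng.randn(200, 3)
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + 0.1 * rng.randn(200)

stack = StackingRegressor(
    estimators=[
        ('svr', SVR()),
        ('lasso', LassoLars(alpha=0.01)),
        ('gbr', GradientBoostingRegressor(random_state=0)),
    ],
    final_estimator=LinearRegression(),  # the combining model
    cv=5,  # base-model predictions for the combiner come from 5-fold CV
)
stack.fit(X[:150], y[:150])
predictions = stack.predict(X[150:])
print(stack.final_estimator_.coef_)  # learned weight per base model
```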
