Scikit-learn is returning coefficient of determination (R^2) values less than -1

陌清茗 2021-01-30 18:12

I'm doing a simple linear model. I have

fire = load_data()
regr = linear_model.LinearRegression()
scores = cross_validation.cross_val_score(regr, fire.data, fir         


        
4 Answers
  •  一生所求
    2021-01-30 18:29

    Just because R^2 can be negative does not mean it should be.

    Possibility 1: a bug in your code.

    A common bug that you should double check is that you are passing in parameters correctly:

    r2_score(y_true, y_pred) # Correct!
    r2_score(y_pred, y_true) # Incorrect!!!!
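
To see why the argument order matters, here is a minimal sketch that computes R^2 by hand from its usual definition, 1 - SS_res / SS_tot, where SS_tot is measured around the mean of the first argument. The `r2` helper and the sample values are illustrative assumptions standing in for `r2_score`, not code from the question.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: how far predictions are from the targets.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the FIRST argument around its own mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up data: the predictions barely move away from the mean of y_true.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [2.9, 3.0, 3.0, 3.0, 3.1]

correct = r2(y_true, y_pred)  # targets first, predictions second
swapped = r2(y_pred, y_true)  # arguments in the wrong order

print(correct, swapped)
```

With these made-up numbers, the correct order gives about 0.08, while the swapped order gives about -460, because the near-constant predictions make SS_tot tiny. A swap like this is one way to end up with scores far below -1.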
    

    Possibility 2: small datasets

    If you are getting a negative R^2, you could also check for overfitting. Keep in mind that cross_validation.cross_val_score() does not randomly shuffle your inputs, so if your samples are inadvertently sorted (by date, for example), the model built on each fold may not be predictive for the other folds.
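
The effect of sorted samples can be sketched without any library. The example below is an illustration under stated assumptions: it uses a hypothetical `fold_scores` helper that splits the data into equal-sized folds, a mean-only model in place of LinearRegression, and a target that is simply sorted in increasing order.

```python
import random


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fold_scores(y, n_folds, shuffle, seed=0):
    """Score a mean-only model on each fold (hypothetical helper)."""
    idx = list(range(len(y)))
    if shuffle:
        # Shuffle the sample order before cutting it into folds.
        random.Random(seed).shuffle(idx)
    fold_size = len(y) // n_folds
    scores = []
    for k in range(n_folds):
        test = idx[k * fold_size:(k + 1) * fold_size]
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        # "Model": always predict the mean of the training targets.
        pred = sum(y[i] for i in train) / len(train)
        y_test = [y[i] for i in test]
        scores.append(r2(y_test, [pred] * len(y_test)))
    return scores


# Assumed data: 100 targets that happen to be sorted.
y_sorted = [float(i) for i in range(100)]

unshuffled = fold_scores(y_sorted, 5, shuffle=False)
shuffled = fold_scores(y_sorted, 5, shuffle=True)

print(min(unshuffled), min(shuffled))
```

Here the outer unshuffled folds score far below -1 because each test fold sits far from the training mean, while the shuffled folds typically score much closer to 0. In current scikit-learn releases, shuffling can be requested by passing `cv=KFold(n_splits=5, shuffle=True, random_state=0)` from `sklearn.model_selection` to `cross_val_score`.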

    Try reducing the number of features, increasing the number of samples, and decreasing the number of folds (if you are using cross_validation). While there is no official rule here, your m x n dataset (where m is the number of samples and n is the number of features) should roughly satisfy

    m > n^2
    

    and when you are using cross-validation with f as the number of folds, you should aim for

    m/f > n^2
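
As a quick way to apply this rule of thumb, the sketch below wraps the inequality in a hypothetical helper, `enough_samples`. The name and the example sizes are assumptions made for illustration, and the rule itself is only the heuristic described above, not an official guideline.

```python
def enough_samples(m, n, f=1):
    """Return True if samples per fold (m / f) exceed n squared."""
    return m / f > n ** 2


# Hypothetical dataset: 500 samples, 12 features (n^2 = 144).
ok_with_5_folds = enough_samples(500, 12, f=5)  # 100 samples per fold
ok_with_3_folds = enough_samples(500, 12, f=3)  # about 166.7 samples per fold

print(ok_with_5_folds, ok_with_3_folds)
```

In this made-up case the heuristic fails with 5 folds but passes with 3, which matches the advice to decrease the number of folds when samples are scarce.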
    
