Calculating R^2 for a nonlinear least squares fit

别跟我提以往 2020-12-14 02:36

Suppose I have x values, y values, and expected y values f (from some nonlinear best fit curve).

How can I compute R^2 in R?

5 Answers
  • As a direct answer to the question asked (rather than arguing that R2/pseudo-R2 values aren't useful), the nagelkerke function in the rcompanion package will report various pseudo R2 values for nonlinear least squares (nls) models, as proposed by McFadden, Cox and Snell, and Nagelkerke, e.g.

    require(rcompanion)  # provides nagelkerke() and the BrendonSmall data; nls() is in base stats
    data(BrendonSmall)
    quadplat = function(x, a, b, clx) {
              ifelse(x  < clx, a + b * x   + (-0.5*b/clx) * x   * x,
                               a + b * clx + (-0.5*b/clx) * clx * clx)}
    model = nls(Sodium ~ quadplat(Calories, a, b, clx),
                data = BrendonSmall,
                start = list(a   = 519,
                             b   = 0.359,
                             clx = 2304))
    nullfunct = function(x, m){m}
    null.model = nls(Sodium ~ nullfunct(Calories, m),
                 data = BrendonSmall,
                 start = list(m   = 1346))
    nagelkerke(model, null=null.model)
    

    The soilphysics package also reports Efron's pseudo R2 and adjusted pseudo R2 value for nls models as 1 - RSS/TSS:

    pred <- predict(model)
    n <- length(pred)
    res <- resid(model)
    w <- weights(model)
    if (is.null(w)) w <- rep(1, n)
    rss <- sum(w * res ^ 2)
    resp <- pred + res
    center <- weighted.mean(resp, w)
    r.df <- summary(model)$df[2]
    int.df <- 1
    tss <- sum(w * (resp - center)^2)
    r.sq <- 1 - rss/tss
    adj.r.sq <- 1 - (1 - r.sq) * (n - int.df) / r.df
    out <- list(pseudo.R.squared = r.sq,
                adj.R.squared = adj.r.sq)
    

    which is also the pseudo R2 as calculated by the accuracy function in the rcompanion package. Basically, this R2 measures how much better your fit is than if you just drew a flat horizontal line through the data. This can make sense for nls models if your null model is an intercept-only model, and it can also make sense for particular other nonlinear models. E.g. for a scam model that uses strictly increasing splines (bs="mpi" in the spline term), the fitted model for the worst possible scenario (e.g. where your data were strictly decreasing) would be a flat line, and hence would result in an R2 of zero.

    The adjusted R2 then also penalizes models with higher numbers of fitted parameters. Using the adjusted R2 value would already address many of the criticisms of this paper, http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2892436/ (besides, if one swears by using information criteria for model selection, the question becomes which one to use: AIC, BIC, EBIC, AICc, QIC, etc.).
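The snippet above assumes an already-fitted `model` object. A minimal self-contained sketch of the same Efron-style pseudo R2 (1 - RSS/TSS) and its adjusted version, using simulated data and an illustrative exponential model (the data, model, and starting values are made up for demonstration):

```r
# Efron's pseudo R-squared (1 - RSS/TSS) for an nls fit, with the
# adjusted version from the snippet above; data and model are illustrative.
set.seed(42)
x <- seq(0, 10, length.out = 50)
y <- 2 * exp(0.3 * x) + rnorm(50, sd = 1)
model <- nls(y ~ a * exp(b * x), start = list(a = 2, b = 0.2))

res  <- resid(model)
rss  <- sum(res^2)                      # residual sum of squares
tss  <- sum((y - mean(y))^2)            # total sum of squares around the mean
r.sq <- 1 - rss / tss

n      <- length(y)
r.df   <- summary(model)$df[2]          # residual degrees of freedom (n - p)
int.df <- 1                             # degrees of freedom of the intercept-only null
adj.r.sq <- 1 - (1 - r.sq) * (n - int.df) / r.df
c(pseudo.R.squared = r.sq, adj.R.squared = adj.r.sq)
```

Note that the adjusted value is always at most the unadjusted one, since (n - 1)/(n - p) > 1 whenever more than one parameter is fitted.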

    Just using

    r.sq <- max(cor(y,yfitted),0)^2
    adj.r.sq <- 1 - (1 - r.sq) * (n - int.df) / r.df
    

    would, I think, also make sense if you have normal Gaussian errors - i.e. the correlation between the observed and fitted y (clipped at zero, so that a negative relationship would imply zero predictive power), squared, and then adjusted for the number of fitted parameters in the adjusted version. If y and yfitted go in the same direction this would be the R2 and adjusted R2 value as reported for a regular linear model. To me this would make perfect sense at least, so I don't agree with outright rejecting the usefulness of pseudo R2 values for nls models, as the answer above seems to imply.

    For non-normal error structures (e.g. if you were using a GAM with non-normal errors) the McFadden pseudo R2 is defined analogously as

    1-residual deviance/null deviance
    

    See here and here for some useful discussion.
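The deviance-based definition above is easiest to demonstrate on a glm, where both deviances are available directly from the fitted object (the simulated logistic-regression data here are purely illustrative; for an nls model one would compare the log-likelihoods of the fitted and intercept-only models analogously):

```r
# McFadden-style pseudo R-squared as 1 - residual deviance / null deviance,
# illustrated with a logistic glm on simulated data.
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(1.5 * x))    # binary outcome driven by x
fit <- glm(y ~ x, family = binomial)
mcfadden <- 1 - fit$deviance / fit$null.deviance
mcfadden
```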

  • 2020-12-14 03:23

    You just use the lm function to fit a linear model:

    x <- runif(100)
    y <- runif(100)
    spam <- summary(lm(x ~ y))
    spam$r.squared
    # [1] 0.0008532386
    

    Note that R-squared is not defined for non-linear models, or is at least very tricky. To quote from R-help:

    There is a good reason that an nls model fit in R does not provide r-squared - r-squared doesn't make sense for a general nls model.

    One way of thinking of r-squared is as a comparison of the residual sum of squares for the fitted model to the residual sum of squares for a trivial model that consists of a constant only. You cannot guarantee that this is a comparison of nested models when dealing with an nls model. If the models aren't nested this comparison is not terribly meaningful.

    So the answer is that you probably don't want to do this in the first place.

    If you want peer-reviewed evidence, see this article for example; it's not that you can't compute the R^2 value, it's just that it may not mean the same thing/have the same desirable properties as in the linear-model case.

  • 2020-12-14 03:23

    Sounds like f contains your predicted values. So take the sum of squared distances from them to the actual values, divided by n * the variance of y;

    so something like

    1-sum((y-f)^2)/(length(y)*var(y))

    should give you a quasi R-squared value, as long as your model is reasonably close to a linear model and n is fairly big.
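A tiny worked example of this formula with made-up observed and fitted values (note that var() uses the n - 1 denominator, so this differs slightly from 1 - RSS/TSS):

```r
# Quasi R-squared as 1 - sum((y - f)^2) / (n * var(y));
# y and f below are illustrative observed and fitted values.
y <- c(1.1, 1.9, 3.2, 3.9, 5.1)   # observed
f <- c(1.0, 2.0, 3.0, 4.0, 5.0)   # fitted values from some model
quasi.r.sq <- 1 - sum((y - f)^2) / (length(y) * var(y))
quasi.r.sq
```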

  • 2020-12-14 03:25

    Another quasi-R-squared for non-linear models is to square the correlation between the actual y-values and the predicted y-values. For linear models this is the regular R-squared.
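The claimed equivalence for linear models is easy to verify numerically (the simulated data here are illustrative):

```r
# For a linear model with an intercept, the squared correlation between
# observed and fitted values equals the usual R-squared.
set.seed(7)
x <- runif(100)
y <- 2 * x + rnorm(100, sd = 0.3)
fit <- lm(y ~ x)
r.sq.cor <- cor(y, fitted(fit))^2
all.equal(r.sq.cor, summary(fit)$r.squared)  # TRUE
```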

  • 2020-12-14 03:37

    As an alternative approach to this problem, I have used the following procedure several times:

    1. Compute a fit on the data with the nls function.
    2. Using the resulting model, make predictions.
    3. Plot the data against the values predicted by the model (if the model is good, the points should lie near the bisector, i.e. the y = x line).
    4. Compute the R2 of that linear regression.
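The four steps above can be sketched as follows; the Michaelis-Menten-style model, simulated data, and starting values are illustrative only:

```r
# Fit an nls model, predict, plot observed vs. predicted, and take the
# R-squared of the linear regression of observed on predicted values.
set.seed(3)
x <- seq(1, 10, length.out = 40)
y <- 5 * x / (2 + x) + rnorm(40, sd = 0.2)        # simulated saturating curve
fit  <- nls(y ~ Vm * x / (K + x), start = list(Vm = 4, K = 1))  # step 1
pred <- predict(fit)                               # step 2
plot(pred, y); abline(0, 1)                        # step 3: points near y = x
summary(lm(y ~ pred))$r.squared                    # step 4
```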

    Best wishes to all. Patrick.
