How to get probability from GLM output

深忆病人 2021-01-15 14:51

I'm extremely stuck at the moment as I am trying to figure out how to calculate the probability from my glm output in R. I know the data is very insignificant

2 answers
  •  时光说笑
    2021-01-15 15:22

    Logistic regression models the log odds of the outcome, so its coefficients are on the log odds scale. We'll illustrate how to interpret the coefficients with the space shuttle autolander data from the MASS package.

    After loading the data, we'll create a binary dependent variable where:

    1 = autolander used, 
    0 = autolander not used. 
    

    We will also create a binary independent variable for shuttle stability:

    1 = stable positioning
    0 = unstable positioning. 
    

    Then, we'll run glm() with family=binomial(link="logit"). Since the coefficients are on the log odds scale, we'll exponentiate them: the intercept becomes the odds for the baseline group, and the slope becomes an odds ratio relative to that baseline.

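    library(MASS)   # provides the shuttle data
    str(shuttle)
    # binary independent variable: 1 = stable positioning, 0 = unstable
    shuttle$stable <- 0
    shuttle[shuttle$stability =="stab","stable"] <- 1
    # binary indicator: 1 = autolander used, 0 = not used
    # (created for reference; the model below uses the factor `use` directly)
    shuttle$auto <- 0
    shuttle[shuttle$use =="auto","auto"] <- 1
    
    # `use` is a factor with levels auto, noauto; glm() treats the first level
    # as failure, so this fit models the log odds of use == "noauto"
    fit <- glm(use ~ factor(stable),family=binomial(link = "logit"),data=shuttle) # specifies base as unstable
    
    summary(fit)
    exp(fit$coefficients)   # intercept: baseline odds; slope: odds ratio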
    

    ...and the output:

    > fit <- glm(use ~ factor(stable),family=binomial(link = "logit"),data=shuttle) # specifies base as unstable
    > 
    > summary(fit)
    
    Call:
    glm(formula = use ~ factor(stable), family = binomial(link = "logit"), 
    data = shuttle)
    
    Deviance Residuals: 
        Min       1Q   Median       3Q      Max  
    -1.1774  -1.0118  -0.9566   1.1774   1.4155  
    
    Coefficients:
                      Estimate Std. Error z value Pr(>|z|)  
    (Intercept)      4.747e-15  1.768e-01   0.000   1.0000  
    factor(stable)1 -5.443e-01  2.547e-01  -2.137   0.0326 *
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    (Dispersion parameter for binomial family taken to be 1)
    
        Null deviance: 350.36  on 255  degrees of freedom
    Residual deviance: 345.75  on 254  degrees of freedom
    AIC: 349.75
    
    Number of Fisher Scoring iterations: 4
    
    > exp(fit$coefficients)
        (Intercept) factor(stable)1 
          1.0000000       0.5802469 
    > 
    

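    The intercept of 0 is the log odds for an unstable shuttle, and the coefficient of -0.5443 is the difference in log odds (the log odds ratio) for a stable shuttle relative to an unstable one. After exponentiating the coefficients, we observe that the odds under the condition of an unstable shuttle are 1.0, and are multiplied by 0.58 if the shuttle is stable. One caveat on direction: the response in this fit is the factor use, whose levels are auto and noauto, and glm() treats the first level of a factor response as failure. The modelled outcome is therefore noauto, so these are the odds of not using the autolander, and the autolander is more likely to be used if the shuttle has stable positioning. Fitting the 0/1 auto variable instead would flip the sign of the coefficient.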
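
    As a sanity check on this interpretation, the same odds can be recovered directly from a 2x2 table of the raw counts. The following is a minimal sketch, assuming the shuttle data from MASS; the names tab, success, failure, odds_unstable and odds_stable are introduced here for illustration only.

    ```r
    library(MASS)  # provides the shuttle data

    # Cross-tabulate shuttle stability against autolander use
    tab <- table(shuttle$stability, shuttle$use)

    # glm() treats the first level of a factor response as failure
    # and the remaining level as success
    failure <- levels(shuttle$use)[1]
    success <- levels(shuttle$use)[2]

    # Odds of the modelled outcome within each stability group
    odds_unstable <- tab["xstab", success] / tab["xstab", failure]
    odds_stable   <- tab["stab",  success] / tab["stab",  failure]

    # Refit the model from the answer and compare
    shuttle$stable <- as.integer(shuttle$stability == "stab")
    fit <- glm(use ~ factor(stable), family = binomial(link = "logit"), data = shuttle)

    c(intercept_odds = odds_unstable, odds_ratio = odds_stable / odds_unstable)
    exp(coef(fit))
    ```

    The exponentiated intercept should match the odds in the unstable group, and the exponentiated slope should match the ratio of the two groups' odds. Printing success also shows which level of use the fit treats as the outcome.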

    Calculating probability of autolander use

    We can do this in two ways. First, the manual approach: exponentiate the coefficients and convert the odds to probabilities using the following equation.

    p = odds / (1 + odds) 
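
    This equation is the inverse of the logit link, which base R exposes as plogis() (with qlogis() going the other way). A small sketch with made-up odds values, not taken from either data set, shows that the two routes agree:

    ```r
    # Illustrative odds values (hypothetical, not from the shuttle data)
    odds <- c(0.25, 1, 4)

    # Manual conversion from odds to probability
    p_manual <- odds / (1 + odds)

    # plogis() takes log odds and returns probabilities
    p_plogis <- plogis(log(odds))
    all.equal(p_manual, p_plogis)

    # qlogis() converts probabilities back to log odds
    all.equal(qlogis(p_manual), log(odds))
    ```

    In practice this means plogis() can be applied to a sum of coefficients directly, without exponentiating first.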
    

    With the shuttle autolander data it works as follows.

    # convert intercept to probability
    odds_i <- exp(fit$coefficients[1])
    odds_i / (1 + odds_i)
    # convert stable="stable" to probability
    odds_p <- exp(fit$coefficients[1]) * exp(fit$coefficients[2])
    odds_p / (1 + odds_p)
    

    ...and the output:

    > # convert intercept to probability
    > odds_i <- exp(fit$coefficients[1])
    > odds_i / (1 + odds_i)
    (Intercept) 
            0.5 
    > # convert stable="stable" to probability
    > odds_p <- exp(fit$coefficients[1]) * exp(fit$coefficients[2])
    > odds_p / (1 + odds_p)
    (Intercept) 
      0.3671875 
    >
    

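    The modelled probability is 0.5 when a shuttle is unstable and 0.37 when the shuttle is stable. Because the response is the factor use and glm() models its second level (noauto), these are probabilities that the autolander is not used; the probability of autolander use is one minus these values, i.e. 0.5 and 0.63.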

    The second approach to generate probabilities is to use the predict() function.

    # convert to probabilities with the predict() function
    predict(fit,data.frame(stable="0"),type="response")
    predict(fit,data.frame(stable="1"),type="response")
    

    Note that the output matches the manually calculated probabilities.

    > # convert to probabilities with the predict() function
    > predict(fit,data.frame(stable="0"),type="response")
      1 
    0.5 
    > predict(fit,data.frame(stable="1"),type="response")
            1 
    0.3671875 
    > 
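
    To see how predict() relates to the manual route, it can also return values on the link (log odds) scale, which are then converted with plogis(). This is a sketch that refits the model above; the names newdata, eta, p_link and p_resp are illustrative.

    ```r
    library(MASS)  # provides the shuttle data

    # Refit the model from the answer
    shuttle$stable <- as.integer(shuttle$stability == "stab")
    fit <- glm(use ~ factor(stable), family = binomial(link = "logit"), data = shuttle)

    # One row per condition: 0 = unstable, 1 = stable
    newdata <- data.frame(stable = c(0, 1))

    # Linear predictor (log odds), then inverse logit
    eta    <- predict(fit, newdata, type = "link")
    p_link <- plogis(eta)

    # Probabilities straight from predict()
    p_resp <- predict(fit, newdata, type = "response")

    all.equal(p_link, p_resp)
    ```

    Working on the link scale is mainly useful when something has to be done before converting to probabilities, such as building an interval from standard errors.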
    

    Applying this to the OP data

    We can apply these steps to the glm() output from the OP as follows.

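    # intercept and nonRSEvents slope taken from the OP's glm() output
    coefficients <- c(-1.1455,-0.1322)
    exp(coefficients)
    # convert intercept (nonRSEvents = 0) to probability
    odds_i <- exp(coefficients[1])
    odds_i / (1 + odds_i)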
    # convert nonRSEvents = 1 to probability
    odds_p <- exp(coefficients[1]) * exp(coefficients[2])
    odds_p / (1 + odds_p)
    # simulate up to 10 nonRSEvents prior to RS
    coef_df <- data.frame(nonRSEvents=0:10,
                      intercept=rep(-1.1455,11),
                      nonRSEventSlope=rep(-0.1322,11))
    coef_df$nonRSEventValue <- coef_df$nonRSEventSlope * 
    coef_df$nonRSEvents
    coef_df$intercept_exp <- exp(coef_df$intercept)
    coef_df$slope_exp <- exp(coef_df$nonRSEventValue)
    coef_df$odds <- coef_df$intercept_exp * coef_df$slope_exp
    coef_df$probability <- coef_df$odds / (1 + coef_df$odds)
    # print the odds & probabilities by number of nonRSEvents
    coef_df[,c(1,7:8)]
    

    ...and the final output.

    > coef_df[,c(1,7:8)]
       nonRSEvents    odds probability
    1            0 0.31806     0.24131
    2            1 0.27868     0.21794
    3            2 0.24417     0.19625
    4            3 0.21393     0.17623
    5            4 0.18744     0.15785
    6            5 0.16423     0.14106
    7            6 0.14389     0.12579
    8            7 0.12607     0.11196
    9            8 0.11046     0.09947
    10           9 0.09678     0.08824
    11          10 0.08480     0.07817
    > 
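
    The same table can be produced more compactly by applying plogis() to the linear predictor. This is a sketch using the two coefficients quoted above; b0, b1 and prob are names chosen here for illustration.

    ```r
    # Coefficients copied from the OP's glm() output
    b0 <- -1.1455   # intercept
    b1 <- -0.1322   # slope for nonRSEvents

    nonRSEvents <- 0:10

    # Linear predictor on the log odds scale
    log_odds <- b0 + b1 * nonRSEvents

    # Inverse logit gives the probabilities directly
    prob <- plogis(log_odds)

    data.frame(nonRSEvents, odds = exp(log_odds), probability = prob)
    ```

    With access to the fitted model object rather than just the printed coefficients, predict() with type="response" would give the same result without copying numbers by hand.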
    
