Predicted values for logistic regression from glm and stat_smooth in ggplot2 are different

抹茶落季 2020-12-05 05:48

I'm trying to make this logistic regression graph in ggplot2.

df <- structure(list(y = c(2L, 7L, 776L, 19L, 12L, 26L, 7L, 12L, 8L,
24L, 20L,          


        
2 answers
  • 2020-12-05 06:26

    Modify your LD.summary to include a new column with group (or appropriate label).

    LD.summary$group <- c('LD25','LD50','LD75')
    

    Then add col=LD.summary$group to the aes() of each geom_segment call (and remove colour="red"). This plots each segment in a different colour and adds a legend:

    geom_segment( aes(...,col=LD.summary$group) )
    

    Also, to avoid writing LD.summary$xxx every time, pass data=LD.summary to geom_segment:

    geom_segment(data=LD.summary, aes(x=0, y=Pi,xend=LD, yend=Pi, colour=group) )
    

    As to why the graphs are not exactly the same: in the base R graph the x axis starts at about 20, whereas in ggplot it starts at zero. This is because your second geom_segment starts at x=0. To fix this, you could change x=0 to x=min(df$x).

    To get your y axis label use + scale_y_continuous('Estimated probability').

    In summary:

    LD.summary$group <- c('LD25','LD50','LD75')
    p <- ggplot(data = df, aes(x = x, y = y/n)) +
                geom_point() +
                stat_smooth(method = "glm", method.args = list(family = "binomial")) +  # ggplot2 >= 2.0.0 takes family via method.args
                scale_y_continuous('Estimated probability')    # <-- add y label
    p <- p + geom_segment(data=LD.summary, aes( # <-- data=LD.summary
                                x = LD
                              , y = 0
                              , xend = LD
                              , yend = Pi
                              , col = group     # <- colours
                             )
                           )    
    p <- p + geom_segment(data=LD.summary, aes( # <-- data=LD.summary
                                x = min(df$x)   # <-- don't plot all the way to x=0
                              , y = Pi
                              , xend = LD
                              , yend = Pi
                              , col = group     # <- colours
                             )
                           )
    print(p)
    

    which yields a plot with each LD segment drawn in its own colour, plus a legend.
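
    On the mismatch named in the title: one plausible cause (an assumption, since the question's glm call is truncated above) is that stat_smooth fits y/n without knowing the number of trials n, whereas a glm on cbind(y, n - y) weights each point by n. The sketch below uses a made-up data frame called toy, not the question's df, to show that a proportion response with weights = n reproduces the counts fit, while the unweighted fit does not. The ggplot2 part relies on the weight aesthetic being handed to the fitting function and is skipped if ggplot2 is not installed.

```r
# Hypothetical grouped binomial data (NOT the question's df, which is truncated)
toy <- data.frame(
  x = c(20, 30, 40, 50, 60, 70),
  y = c(2, 4, 14, 20, 16, 47),    # successes per group
  n = c(30, 25, 40, 35, 20, 50)   # trials per group
)

# Reference fit on counts: successes and failures as a two-column response
fit_counts <- glm(cbind(y, n - y) ~ x, family = binomial, data = toy)

# Same model written as a proportion with the number of trials as weights
fit_weighted <- glm(y / n ~ x, family = binomial, weights = n, data = toy)

# Proportion without weights: every group counts as a single trial
# (R warns about non-integer successes, hence suppressWarnings)
fit_unweighted <- suppressWarnings(
  glm(y / n ~ x, family = binomial, data = toy)
)

same_as_counts  <- isTRUE(all.equal(coef(fit_counts), coef(fit_weighted),
                                    tolerance = 1e-6))
differs_unweighted <- !isTRUE(all.equal(coef(fit_counts), coef(fit_unweighted),
                                        tolerance = 1e-6))

# In ggplot2, the weight aesthetic plays the role of weights = n
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p_toy <- ggplot(toy, aes(x = x, y = y / n, weight = n)) +
    geom_point() +
    stat_smooth(method = "glm", method.args = list(family = "binomial"))
}
```

    If same_as_counts is TRUE on your own data as well, adding weight = n to the aes() in the summary code above should bring the smooth in line with the glm predictions.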

  • 2020-12-05 06:32

    Just a couple of minor additions to @mathematical.coffee's answer. Typically, geom_smooth isn't supposed to replace actual modeling, which is why it can seem inconvenient when you want to use specific output you'd get from glm and the like. But really, all we need to do is add the fitted values to our data frame:

    df$pred <- pi.hat
    LD.summary$group <- c('LD25','LD50','LD75')
    
    ggplot(df,aes(x = x, y = y/n)) + 
        geom_point() + 
        geom_line(aes(y = pred),colour = "black") + 
        geom_segment(data=LD.summary, aes(y = Pi,
                                          xend = LD,
                                          yend = Pi,
                                          col = group),x = -Inf,linetype = "dashed") + 
        geom_segment(data=LD.summary,aes(x = LD,
                                         xend = LD,
                                         yend = Pi,
                                         col = group),y = -Inf,linetype = "dashed")
    


    The final little trick is the use of Inf and -Inf to get the dashed lines to extend all the way to the plot boundaries.

    The lesson here is that if all you want to do is add a smooth to a plot, and nothing else in the plot depends on it, use geom_smooth. If you want to refer to the output from the fitted model, it's generally easier to fit the model outside ggplot and then plot.
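
    A minimal sketch of that workflow, under stated assumptions: the data frame toy is invented for illustration, and the way LD.summary is built here (inverting the fitted logit line) is one common approach, not necessarily what the question did, since that part of the question is truncated. Only the column names pred, Pi, LD and group are taken from the code above.

```r
# Hypothetical grouped binomial data (NOT the question's df)
toy <- data.frame(
  x = c(20, 30, 40, 50, 60, 70),
  y = c(2, 4, 14, 20, 16, 47),    # successes per group
  n = c(30, 25, 40, 35, 20, 50)   # trials per group
)

# Fit the model outside ggplot
fit <- glm(cbind(y, n - y) ~ x, family = binomial, data = toy)

# Fitted probabilities, one per row, ready for geom_line(aes(y = pred))
toy$pred <- fitted(fit)

# Doses at which the fitted probability equals 0.25, 0.50 and 0.75:
# solve qlogis(Pi) = b0 + b1 * LD for LD
b  <- unname(coef(fit))
Pi <- c(0.25, 0.50, 0.75)
LD.summary <- data.frame(
  group = c("LD25", "LD50", "LD75"),
  Pi    = Pi,
  LD    = (qlogis(Pi) - b[1]) / b[2]
)

# Sanity check: plugging each LD back into the model recovers Pi
recovered <- plogis(b[1] + b[2] * LD.summary$LD)
```

    With toy and LD.summary built this way, the geom_line and geom_segment layers shown above can be used unchanged. For a smoother curve, the same fit can be evaluated on a finer grid of x values with predict(fit, newdata = ..., type = "response").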
