Create a table with values from ecdf graph

-上瘾入骨i 2021-01-26 09:06

I am trying to create a table using values from an ECDF plot. I've recreated an example below.

#Data
data(mtcars)

#Sort by mpg
mtcars <- mtcars[order(mtcars$mpg),]         


        
2 Answers
  • 2021-01-26 09:57

    A much shorter answer that I can't believe I didn't see earlier. For each cyl, I divide the number of rows whose Percent_Picked is at most 0.25, 0.5, or 0.75 by the total number of rows in that group.

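    library(dplyr)

    # Assumes mtcars already has the Percent_Picked column from the question's setup.
    # Every Percent_Picked value is <= 1, so sum(Percent_Picked<=1) is the group's row count.
    cyl.table<-mtcars %>%
      group_by(cyl) %>%
        summarise("25% Picked" = sum(Percent_Picked<=0.25)/(sum(Percent_Picked<=1)),
                  "50% Picked" = sum(Percent_Picked<=0.5)/(sum(Percent_Picked<=1)),
                  "75% Picked" = sum(Percent_Picked<=0.75)/(sum(Percent_Picked<=1)))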
    cyl.table
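
    The same proportions can also be read off base R's ecdf() function, since ecdf(x)(q) is the share of values in x that are at most q. The following is only a sketch: it rebuilds Percent_Picked without dplyr, assuming it is the dense rank of mpg divided by the maximum rank (as in the setup code further down), and the names df, rank_dense, thresholds and cyl_table are made up for illustration.

    ```r
    # Rebuild Percent_Picked with base R only (assumed definition: dense rank / max rank)
    df <- mtcars
    rank_dense <- match(df$mpg, sort(unique(df$mpg)))   # dense rank of mpg
    df$Percent_Picked <- rank_dense / max(rank_dense)

    thresholds <- c("25% Picked" = 0.25, "50% Picked" = 0.50, "75% Picked" = 0.75)

    # One row per cyl: ecdf(x)(q) is the share of values in x that are <= q
    cyl_table <- t(sapply(split(df$Percent_Picked, df$cyl),
                          function(x) ecdf(x)(thresholds)))
    colnames(cyl_table) <- names(thresholds)
    cyl_table
    ```

    Each cell should agree with mean(Percent_Picked <= q) computed within the corresponding cyl group.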
    
  • 2021-01-26 10:06

    Looking around, I found this question. Yours extends it a little by asking for group-specific ECDF values, so we can use the do function in dplyr (here's an example) to compute them. There are some slight differences between the values in this table and those in your ggplot, and I'm not exactly sure why. It could just be that the mtcars data set is fairly small, so on a larger data set I'd expect the table to be closer to the plotted values.

    
    library(dplyr)
    library(ggplot2)

    #Sort by mpg
    mtcars <- mtcars[order(mtcars$mpg),]
    
    #Make arbitrary ranking variable based on mpg
    mtcars <- mtcars %>% mutate(Rank = dense_rank(mpg))
    
    #Make variable for percent picked
    mtcars <- mutate(mtcars, Percent_Picked = Rank/max(mtcars$Rank))
    
    #Make cyl categorical
    mtcars$cyl<-cut(mtcars$cyl, c(3,5,7,9), right=FALSE, labels=c(4,6,8))
    
    #Make the graph
    ggplot(mtcars, aes(Percent_Picked, color = cyl)) + 
      stat_ecdf(size=1) + 
      scale_x_continuous(labels = scales::percent) +
      scale_y_continuous(labels = scales::percent)
    
    
    create_ecdf_vals <- function(vec){
      df <- data.frame(
        x = unique(vec),
        y = ecdf(vec)(unique(vec))*length(vec)
      ) %>%
        # scale() returns a one-column matrix, so convert it back to a plain numeric vector
        mutate(y = as.numeric(scale(y, center = min(y), scale = diff(range(y))))) %>%
        union_all(data.frame(x=c(0,1),
                             y=c(0,1))) # adding in max/mins
      return(df)
    }
    
    mt.ecdf <- mtcars %>%
      group_by(cyl) %>%
      do(create_ecdf_vals(.$Percent_Picked))
    
    
    mt.ecdf %>%
      summarise(q25 = y[which.max(x[x<=0.25])],
                q50 = y[which.max(x[x<=0.5])],
                q75 = y[which.max(x[x<=0.75])])
    
    ggplot(mt.ecdf,aes(x,y,color = cyl)) +
      geom_step()
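
    One thing that may contribute to the mismatch with the plot: the scale() step in create_ecdf_vals stretches the step heights so that the lowest one becomes 0, whereas a plain ECDF starts at (count of the smallest value) / n. A small sketch on toy data (the vector vec is made up, not taken from mtcars) shows the effect:

    ```r
    vec <- c(0.2, 0.4, 0.6, 0.8)   # toy data, purely illustrative

    # Plain ECDF heights at each observed value
    raw <- ecdf(vec)(vec)

    # Same arithmetic as scale(y, center = min(y), scale = diff(range(y)))
    rescaled <- (raw - min(raw)) / diff(range(raw))

    data.frame(x = vec, raw = raw, rescaled = rescaled)
    ```

    Here raw is 0.25, 0.50, 0.75, 1.00, while rescaled is 0, 1/3, 2/3, 1, so the two only agree at the top step. Whether this accounts for the whole difference on mtcars is not something this sketch establishes.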
    

    ~EDIT~
    After some digging around in the ggplot2 docs, it turns out we can pull the computed data straight out of the plot with the layer_data function.

    my.plt <- ggplot(mtcars, aes(Percent_Picked, color = cyl)) + 
      stat_ecdf(size=1) + 
      scale_x_continuous(labels = scales::percent) +
      scale_y_continuous(labels = scales::percent)
    
    plt.data <- layer_data(my.plt) # magic happens here
    
    # and here's the table you want
    plt.data %>%
      group_by(group) %>%
      summarise(q25 = y[which.max(x[x<=0.25])],
                q50 = y[which.max(x[x<=0.5])],
                q75 = y[which.max(x[x<=0.75])])
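
    The group column returned by layer_data is an integer id rather than the cyl label. A possible way to put the labels back is sketched below. It assumes the group ids follow the factor level order of cyl and that stat_ecdf keeps its default padding; the names df, p, ecdf_at and ecdf_table are hypothetical, and the setup is rebuilt with base R so the snippet runs on its own.

    ```r
    library(ggplot2)

    # Rebuild the setup with base R (assumed definition: dense rank of mpg / max rank)
    df <- mtcars
    rank_dense <- match(df$mpg, sort(unique(df$mpg)))
    df$Percent_Picked <- rank_dense / max(rank_dense)
    df$cyl <- factor(df$cyl)

    p <- ggplot(df, aes(Percent_Picked, color = cyl)) + stat_ecdf()
    plt.data <- layer_data(p)

    # Assumption: group ids follow the factor level order of cyl
    plt.data$cyl <- levels(df$cyl)[plt.data$group]

    # ECDF height at a threshold: the largest y among points with x <= q.
    # With default padding, stat_ecdf adds a row at x = -Inf (y = 0),
    # so the subset is not empty.
    ecdf_at <- function(d, q) max(d$y[d$x <= q])

    thresholds <- c(q25 = 0.25, q50 = 0.50, q75 = 0.75)
    ecdf_table <- t(sapply(split(plt.data, plt.data$cyl),
                           function(d) sapply(thresholds, function(q) ecdf_at(d, q))))
    ecdf_table
    ```

    Taking max(y) over x <= q relies on the ECDF being non-decreasing, which avoids depending on the row order of the extracted data.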
    