Using dplyr's do() with summary()

清酒与你 2021-01-05 20:28

I would like to use dplyr's split-apply-combine strategy to apply the summary() command.

Take a simple data frame:
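
The data frame itself did not survive in this copy of the question. A minimal reconstruction, assuming three values per class chosen so that summary() reproduces the figures shown in the answers below (the specific values are an assumption, not necessarily the asker's original data):

```r
# Hypothetical reconstruction: values chosen to match the per-class
# summaries reported in the answers (A: 100 to 120, B: 800 to 880).
df <- data.frame(
  class = factor(c("A", "A", "A", "B", "B", "B")),
  value = c(100, 110, 120, 800, 840, 880)
)

# Per-group summaries with base R, for reference
tapply(df$value, df$class, summary)
```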

3 Answers
  • 2021-01-05 20:54

    The problem is that dplyr's do(), when given an unnamed argument, only works if the expression returns a data.frame.

    The broom package's tidy() function can be used to convert outputs of summary() to data.frame.

    df %>%
      group_by(class) %>%
      do( tidy(summary(.$value)) )
    

    This gives:

    Source: local data frame [2 x 7]
    Groups: class [2]
    
       class minimum    q1 median  mean    q3 maximum
      (fctr)   (dbl) (dbl)  (dbl) (dbl) (dbl)   (dbl)
    1      A     100   105    110   110   115     120
    2      B     800   820    840   840   860     880
    
  • 2021-01-05 21:09

    You can use the SE version of data_frame, that is, data_frame_, and run:

    df %>%
      group_by(class) %>%
      do(data_frame_(summary(.$value)))
    

    Alternatively, you can use as.list() wrapped by data.frame() with the argument check.names = FALSE:

    df %>%
      group_by(class) %>%
      do(data.frame(as.list(summary(.$value)), check.names = FALSE))
    

    Both versions produce:

    # Source: local data frame [2 x 7]
    # Groups: class [2]
    # 
    #    class  Min. 1st Qu. Median  Mean 3rd Qu.  Max.
    #   (fctr) (dbl)   (dbl)  (dbl) (dbl)   (dbl) (dbl)
    # 1      A   100     105    110   110     115   120
    # 2      B   800     820    840   840     860   880
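
    The check.names = FALSE argument matters because data.frame() otherwise passes column names through make.names(). A small base R sketch of the difference, using an illustrative vector that is not taken from the original post:

    ```r
    x <- c(100, 110, 120)  # illustrative values only

    # Default: labels such as "1st Qu." are rewritten into syntactic names
    mangled <- data.frame(as.list(summary(x)))
    names(mangled)

    # With check.names = FALSE the labels from summary() are kept as-is
    kept <- data.frame(as.list(summary(x)), check.names = FALSE)
    names(kept)
    ```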
    
  • 2021-01-05 21:10

    The behavior of do() changes depending on whether you give it a named or an unnamed argument. For unnamed arguments, it expects a data.frame for each group, and these are bound together. For named arguments, it makes one row for each group and puts whatever the output is into a new variable with that name.

    So in this case it will complain for the unnamed use (summary() does not produce a data.frame), but the named use will work:

    df %>%
      group_by(class) %>%
      do(summaries = summary(.$value)) ->
      df2
    

    Which gives:

    Source: local data frame [2 x 2]
    Groups: <by row>
    
       class                  summaries
      (fctr)                      (chr)
    1      A <S3:summaryDefault, table>
    2      B <S3:summaryDefault, table>
    

    We can access a summary like this:

    df2$summaries[[1]]
    

    Giving:

    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    100     105     110     110     115     120 
    

    Getting all of these as new columns for df can only be done by first converting the output to a data.frame, as can be seen in the other answers.

    So the root of the problem here is that summary outputs a table instead of a data.frame.
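
    This can be checked without dplyr. The sketch below assumes a data frame shaped like the one implied by the outputs above (the values are a hypothetical reconstruction) and does the split-apply-combine by hand in base R:

    ```r
    # Hypothetical data, chosen to be consistent with the outputs shown above
    df <- data.frame(
      class = factor(c("A", "A", "A", "B", "B", "B")),
      value = c(100, 110, 120, 800, 840, 880)
    )

    s <- summary(df$value)
    class(s)          # an S3 object that inherits from "table"
    is.data.frame(s)  # FALSE, which is why unnamed do() rejects it

    # Split by class, convert each summary to a one-row data.frame, then combine
    per_group <- lapply(split(df$value, df$class), function(v) {
      data.frame(as.list(summary(v)), check.names = FALSE)
    })
    out <- do.call(rbind, per_group)
    out
    ```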
