summarize_all with “n()” function

Submitted by ♀尐吖头ヾ on 2020-05-29 08:30:39

Question


I'm summarizing a data frame in dplyr with the summarize_all() function. If I do the following:

summarize_all(mydf, list(mean="mean", median="median", sd="sd"))

I get a tibble with 3 variables for each of my original measures, all suffixed by the type (mean, median, sd). Great! But when I try to capture the within-vector n's to calculate the standard deviations myself and to make sure missing cells aren't counted...

summarize_all(mydf, list(mean="mean", median="median", sd="sd", n="n"))

...I get an error:

Error in (function ()  : unused argument (var_a)

This is not an issue with my var_a vector. If I remove it, I get the same error for var_b, etc. The summarize_all function produces the same error whenever I request n or n(), including when I pass the list of descriptives through the .funs argument.

What's going on?


Answer 1:


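The error occurs because n() takes no arguments, unlike mean() and median(). summarize_all() calls each listed function with the column as its argument, so the call becomes n(var_a) and fails. Use length() instead to get a per-column count, keeping in mind that length() counts NA values too: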

summarize_all(mydf, list(mean="mean", median="median", sd="sd", n="length"))
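
Because length() counts NA values as well, it does not by itself address the concern about missing cells. A minimal sketch of counting only the non-missing values per column, assuming a hypothetical toy tibble (toy) in place of mydf and the illustrative result names n_all and n_obs:

```r
library(dplyr)

# Hypothetical toy data standing in for mydf; var_a has one missing value.
toy <- tibble(
  var_a = c(1, 2, NA, 4),
  var_b = c(10, 20, 30, 40)
)

res <- summarise_all(
  toy,
  list(
    mean  = ~ mean(., na.rm = TRUE),  # ignore NA when averaging
    n_all = ~ length(.),              # counts every element, NA included
    n_obs = ~ sum(!is.na(.))          # counts non-missing elements only
  )
)

print(res)
```

Here var_a_n_all and var_a_n_obs differ by one, which is the missing cell in var_a.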



Answer 2:


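Use the ~ (formula lambda) notation for finer control, e.g. to pass additional arguments to a function. Inside the lambda, . stands for the current column, so a function that takes no arguments, such as n(), can be called without it: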

library(dplyr)
mtcars %>%
  summarise_all(list(mean = ~ mean(.), median = ~ median(.), n = ~ n()))

However, computing n() for every column makes little sense, because it returns the number of rows in the current group and is therefore identical across columns. Instead, create n once before calling summarise:

mtcars %>%
   group_by(n = n()) %>%
   summarise_all(list(mean = mean, median = median))

Otherwise, if no extra arguments are needed, just pass the unquoted functions:

mtcars %>%
     summarise_all(list(mean = mean, median = median))
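
In dplyr 1.0.0 and later, summarise_all() is superseded by across(). A minimal sketch of the same summary in that style, assuming dplyr >= 1.0.0 is installed; n = n() is placed after across() so that the new n column is not itself swept up by everything():

```r
library(dplyr)

res <- mtcars %>%
  summarise(
    # mean and median of every column, named <column>_<function>
    across(everything(), list(mean = mean, median = median)),
    # row count, computed once rather than per column
    n = n()
  )

print(res)
```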


Source: https://stackoverflow.com/questions/58068522/summarize-all-with-n-function
