Question
I'm summarizing a data frame in dplyr with the summarize_all() function. If I do the following:
summarize_all(mydf, list(mean="mean", median="median", sd="sd"))
I get a tibble with 3 variables for each of my original measures, all suffixed by the type (mean, median, sd). Great! But when I try to capture the within-vector n's to calculate the standard deviations myself and to make sure missing cells aren't counted...
summarize_all(mydf, list(mean="mean", median="median", sd="sd", n="n"))
...I get an error:
Error in (function () : unused argument (var_a)
This is not an issue with my var_a vector. If I remove it, I get the same error for var_b, etc. The summarize_all function is producing odd results whenever I request n or n(), or if I use .funs() and list the descriptives I want to compute instead.
What's going on?
Answer 1:
The error occurs because n() doesn't take any arguments, whereas summarize_all() calls each listed function with the column as its argument, which works for mean() and median(). Use length() instead to get the desired effect:
summarize_all(mydf, list(mean="mean", median="median", sd="sd", n="length"))
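The mechanism behind the error can be reproduced in base R without dplyr. This is only a sketch: n_like is a hypothetical zero-argument function standing in for n(), and x is made-up data. It also shows that length() counts NA cells, which matters if missing values should be excluded from n.

```r
# Hypothetical stand-in for n(): a function that accepts no arguments.
n_like <- function() 1L

# Made-up column with one missing value.
x <- c(1, NA, 3)

# Calling a zero-argument function with a column raises "unused argument",
# the same kind of failure summarize_all() runs into with n().
msg <- tryCatch(n_like(x), error = function(e) conditionMessage(e))

# length() accepts the column, but it counts the NA as well.
n_all <- length(x)

# Counting only the non-missing cells.
n_valid <- sum(!is.na(x))
```

If missing cells should not be counted, length() alone is therefore not enough; a count of non-missing values such as sum(!is.na(x)) is one option.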
Answer 2:
Here, we can use the ~ (formula) notation if we want finer control, e.g. to pass additional arguments:
library(dplyr)
mtcars %>%
  summarise_all(list(mean = ~ mean(.), median = ~ median(.), n = ~ n()))
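As a sketch of what "additional arguments" can look like, the lambdas below pass na.rm = TRUE and count the non-missing cells per column, which is what the question was after. The data frame df and its columns are made up for illustration; they are not the asker's data.

```r
library(dplyr)

# Made-up data: var_a has one missing value, var_b has none.
df <- data.frame(var_a = c(1, 2, NA, 4), var_b = c(10, 20, 30, 40))

res <- df %>%
  summarise_all(list(
    mean = ~ mean(., na.rm = TRUE),  # extra argument passed via the lambda
    sd   = ~ sd(., na.rm = TRUE),
    n    = ~ sum(!is.na(.))          # non-missing cells in this column
  ))

res
```

Here the count differs between columns (3 for var_a, 4 for var_b), unlike n(), which returns the number of rows.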
However, computing n() for each column does not make much sense, because it is the row count and is therefore the same for every column. Instead, create n once before doing the summarise:
mtcars %>%
  group_by(n = n()) %>%
  summarise_all(list(mean = mean, median = median))
Otherwise, just pass the unquoted functions:
mtcars %>%
  summarise_all(list(mean = mean, median = median))
Source: https://stackoverflow.com/questions/58068522/summarize-all-with-n-function