问题
I am having some difficulty counting non-missing values by group through the function below (which also gives sd, and mean):
test <- do.call(data.frame, aggregate(. ~ treatment, have, function(x) c(n = sum(!is.na(x)), mean = mean(x), sd = sd(x))))
It ends up giving me the number of non-missing for all columns in the dataframe instead of just a single column.
I have been looking through SO for some advice and found this, this, and this helpful, but I can't figure out why the aggregate with the function(x) would combine some columns for the sum(!is.na(x), but not for the mean or sd.
EDIT: Adding tables
This is the data I have
This is the data I get from my code
This is the table I want
You will notice in the 'have' dataframe that counting the non-mising rows in column var1 by treatment group gives the following:
veh - 9 gr.4 - 8 gr.3 - 10 gr.2 - 5
But when using the sum(!is.na(x) I get the following
veh - 6 gr.4 - 5 gr.3 - 10 gr.2 - 5
I believe this is because the function is using both var1 and var2 to sum the number of non-missing. I do not know how to correct for this.
Best,
Jack
回答1:
Here's a data.table
approach:
DATA
The data you have is cumbersome to read into R - please use dput()
etc. to make it easier for others:
> dput(dt)
structure(list(someting = c("503", "553", "599", "647", "695",
"728", "760", "793", "826", "859", "907", "955", "1003", "1036",
"1084", "1131", "1179", "1226", "1274", "1322", "1355", "1402",
"1450", "1497", "1545"), treatment = c("gr.2", "gr.2", "gr.2",
"gr.2", "gr.2", "gr.2", "gr.2", "gr.2", "gr.2", "gr.2", "gr.2",
"gr.3", "gr.3", "gr.3", "gr.3", "gr.3", "gr.3", "gr.3", "gr.3",
"gr.3", "gr.3", "gr.3", "gr.3", "gr.4", "gr.4"), var1 = c(8,
NA, 3, 3, NA, NA, NA, NA, NA, 8, 8, 8, NA, 8, 8, 8, 8, 8, 8,
NA, 8, 8, 8, 8, NA), var2 = c(8L, 8L, 8L, 8L, NA, NA, NA, NA,
NA, 8L, 8L, 8L, NA, 8L, 8L, 8L, 8L, 8L, 8L, NA, 8L, 8L, 8L, 8L,
NA)), .Names = c("someting", "treatment", "var1", "var2"), row.names = c(NA,
-25L), class = c("data.table", "data.frame"))
CODE
dt[, .(var1.n = sum(!is.na(var1)),
var2.n = sum(!is.na(var1)),
var1.mean = mean(var1, na.rm = T),
var2.mean = mean(var2, na.rm = T)),
by = .(treatment)]
OUTPUT
treatment var1.n var2.n var1.mean var2.mean
1: gr.2 5 5 6 8
2: gr.3 10 10 8 8
3: gr.4 1 1 8 8
For some reason the "veh" entries weren't read in. Hence the output is slightly different but the principle ought to be clear.
来源:https://stackoverflow.com/questions/48324106/aggregate-by-group-and-get-count-mean-and-sd-of-non-na-values-for-different-dat