Question
I have a data frame df with three categorical variables (cat1, cat2, cat3) and two continuous variables (con1, con2). I would like to apply a list of functions (sd, mean) to a list of columns (con1, con2) for every combination of the grouping columns cat1, cat2, cat3. So far I have done this explicitly, subsetting each combination by hand:
# Randomly generate three categorical and two continuous variables
set.seed(33)
df <- data.frame(cat1 = sample(LETTERS[1:2], 100, replace = TRUE),
                 cat2 = sample(LETTERS[3:5], 100, replace = TRUE),
                 cat3 = sample(LETTERS[2:4], 100, replace = TRUE),
                 con1 = runif(100, 0, 100),
                 con2 = runif(100, 23, 45))
# Introduce missing values (NA)
df$con1[c(23, 53, 92)] <- NA
df$con2[c(33, 46)] <- NA
results <- data.frame()
funs <- list(sd = sd, mean = mean)
# Mean and sd on all observations
sapply(funs, function(x) sapply(df[, c(4, 5)], x, na.rm = TRUE))
# Mean and sd for each level of cat1
sapply(funs, function(x) sapply(df[df$cat1 == 'A', c(4, 5)], x, na.rm = TRUE))
sapply(funs, function(x) sapply(df[df$cat1 == 'B', c(4, 5)], x, na.rm = TRUE))
# Mean and sd for each combination of cat1 and cat2
sapply(funs, function(x) sapply(df[df$cat1 == 'A' & df$cat2 == 'C', c(4, 5)], x, na.rm = TRUE))
# ... one call per remaining cat1/cat2 combination ...
sapply(funs, function(x) sapply(df[df$cat1 == 'B' & df$cat2 == 'E', c(4, 5)], x, na.rm = TRUE))
# Similarly for the combinations of all three variables cat1, cat2, cat3
I would like to write a function that dynamically applies the list of functions to the list of columns for each of these combinations. Could you please give some suggestions? Thanks!
Edit:
I have already got some smart suggestions using dplyr. It would be great if someone could provide suggestions using the apply family of functions, as that will help me use the resulting data frames in further requirements.
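A minimal base-R sketch of the dynamic approach asked for above, using only split() and the apply family. The helper summarise_by(), the objects cat_cols, combos and results, and the labels "total" and "all" are hypothetical names introduced for illustration. It assumes "combinations" means every subset of cat1, cat2, cat3, including no grouping at all; to get only the nested levels from the question (overall, cat1, cat1+cat2, cat1+cat2+cat3), replace the combn() step with lapply(0:3, function(k) cat_cols[seq_len(k)]).

```r
# Sketch only: summarise_by(), combos, results, "total" and "all" are hypothetical names.
set.seed(33)
df <- data.frame(cat1 = sample(LETTERS[1:2], 100, replace = TRUE),
                 cat2 = sample(LETTERS[3:5], 100, replace = TRUE),
                 cat3 = sample(LETTERS[2:4], 100, replace = TRUE),
                 con1 = runif(100, 0, 100),
                 con2 = runif(100, 23, 45))
df$con1[c(23, 53, 92)] <- NA
df$con2[c(33, 46)] <- NA
funs <- list(sd = sd, mean = mean)

# Apply every function in `funs` to every column in `value_cols`, separately
# for each observed combination of `group_cols`. Returns a named list of
# matrices (rows = value columns, columns = functions), one per group.
summarise_by <- function(data, funs, value_cols, group_cols) {
  if (length(group_cols) == 0) {
    groups <- list(all = data[value_cols])   # no grouping: the whole data set
  } else {
    # split() on a list of columns groups by their interaction ("A.C", ...);
    # drop = TRUE skips combinations that never occur.
    groups <- split(data[value_cols], data[group_cols], drop = TRUE)
  }
  lapply(groups, function(d) sapply(funs, function(f) sapply(d, f, na.rm = TRUE)))
}

# Every subset of the categorical columns, including the empty one (overall).
cat_cols <- c("cat1", "cat2", "cat3")
combos <- c(list(character(0)),
            unlist(lapply(seq_along(cat_cols),
                          function(k) combn(cat_cols, k, simplify = FALSE)),
                   recursive = FALSE))
names(combos) <- vapply(combos, function(g)
  if (length(g) == 0) "total" else paste(g, collapse = "+"), character(1))

results <- lapply(combos, function(g) summarise_by(df, funs, c("con1", "con2"), g))

# Same numbers as the hand-written cat1 == 'B' & cat2 == 'E' subset
results[["cat1+cat2"]][["B.E"]]
```

Each element of results is a named list of sd/mean matrices in the same shape as the question's sapply() output, so results$cat1$A corresponds to the df[df$cat1 == 'A', c(4, 5)] call.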
Answer 1:
Here is a simple one-line base R solution:
> do.call(cbind, lapply(funs, function(x) aggregate(cbind(con1, con2) ~ cat1 + cat2 + cat3, data = df, FUN = x, na.rm = TRUE)))
   sd.cat1 sd.cat2 sd.cat3  sd.con1   sd.con2 mean.cat1 mean.cat2 mean.cat3 mean.con1 mean.con2
 1       A       C       B       NA        NA         A         C         B  25.52641  37.40603
 2       B       C       B 32.67192  6.966547         B         C         B  46.70387  34.85437
 3       A       D       B 31.05224  6.530313         A         D         B  37.91553  37.13142
 4       B       D       B 23.80335  6.001468         B         D         B  59.75107  30.29681
 5       A       E       B 22.79285  1.526472         A         E         B  38.54742  25.23007
 6       B       E       B 32.92139  2.621067         B         E         B  51.56253  29.52367
 7       A       C       C 26.98661  5.710335         A         C         C  36.32045  36.42465
 8       B       C       C 20.22217  8.117184         B         C         C  60.60036  34.98460
 9       A       D       C 33.39273  7.367412         A         D         C  40.77786  35.03747
10       B       D       C 12.95351  8.829061         B         D         C  49.77160  33.21836
11       A       E       C 33.73433  4.689548         A         E         C  55.53135  32.38279
12       B       E       C 25.38637  9.172137         B         E         C  46.69063  31.56733
13       A       C       D 36.12545  6.323929         A         C         D  48.34187  32.36789
14       B       C       D 30.01992  7.130869         B         C         D  53.87571  33.12760
15       A       D       D 15.94151 11.756115         A         D         D  35.89909  31.76871
16       B       D       D 10.89030  6.829829         B         D         D  22.86577  32.53725
17       A       E       D 24.88410  6.108631         A         E         D  47.32549  35.22782
18       B       E       D 12.73711  8.151424         B         E         D  33.95569  36.70167
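The one-liner above covers only the three-way grouping. One way to extend it to every non-empty combination of cat1, cat2, cat3 is to loop over the groupings with lapply(). This sketch swaps the formula interface for aggregate()'s data-frame method (x plus by); the names agg_one, cat_cols, combos and agg_results are hypothetical. The swap matters for missing values: the formula method's default na.action = na.omit drops any row where either con1 or con2 is NA, whereas the data-frame method summarises each column separately, matching the per-column na.rm = TRUE subsets in the question. Also, R 3.6.0 changed the default algorithm behind sample(), so set.seed(33) may not reproduce the table above in current R.

```r
# Sketch only: agg_one(), cat_cols, combos and agg_results are hypothetical names.
set.seed(33)
df <- data.frame(cat1 = sample(LETTERS[1:2], 100, replace = TRUE),
                 cat2 = sample(LETTERS[3:5], 100, replace = TRUE),
                 cat3 = sample(LETTERS[2:4], 100, replace = TRUE),
                 con1 = runif(100, 0, 100),
                 con2 = runif(100, 23, 45))
df$con1[c(23, 53, 92)] <- NA
df$con2[c(33, 46)] <- NA
funs <- list(sd = sd, mean = mean)

# One aggregate() call per function, bound side by side as in the answer.
# The data-frame method summarises each value column separately, so
# na.rm = TRUE drops NAs per column, like the subsets in the question.
agg_one <- function(groups) {
  do.call(cbind, lapply(funs, function(f)
    aggregate(df[c("con1", "con2")], by = df[groups], FUN = f, na.rm = TRUE)))
}

# All non-empty combinations of the categorical columns.
cat_cols <- c("cat1", "cat2", "cat3")
combos <- unlist(lapply(seq_along(cat_cols),
                        function(k) combn(cat_cols, k, simplify = FALSE)),
                 recursive = FALSE)
names(combos) <- vapply(combos, paste, character(1), collapse = "+")

agg_results <- lapply(combos, agg_one)
agg_results[["cat1"]]
```

Each element is a data frame with the same sd.* / mean.* column layout as the output above. The overall (ungrouped) summary is not an aggregate() case; the question's first sapply() call already covers it.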
Source: https://stackoverflow.com/questions/31133492/apply-list-of-functions-on-list-of-columns-based-on-different-combinations