Question
I have a data frame which looks like this:
structure(list(ab = c(0, 1, 1, 1, 1, 0, 0, 0, 1, 1), bc = c(1,
1, 1, 1, 0, 0, 0, 1, 0, 1), de = c(0, 0, 1, 1, 1, 0, 1, 1, 0,
1), cl = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 2)), .Names = c("ab", "bc",
"de", "cl"), row.names = c(NA, -10L), class = "data.frame")
The column cl indicates cluster membership, and the variables ab, bc and de hold binary answers, where 1 means yes and 0 means no.
I am trying to create a table that cross-tabulates the cluster against all the other columns in the data frame (ab, bc and de), with the clusters as the column variables and each cell holding the number of yes answers. The desired output is:
1 2 3
ab 1 3 2
bc 2 3 1
de 2 3 1
I tried the following code:
with(newdf, tapply(newdf[,c(3)], cl, sum))
This only gives me the sums for one column at a time. My data frame has 1600+ columns plus one cluster column. Can someone help?
Answer 1:
Your data is in a half-long, half-wide format, and you want it in a fully wide format. This is easiest if we first convert it to a fully long format:
library(reshape2)
df_long = melt(df, id.vars = "cl")
head(df_long)
# cl variable value
# 1 1 ab 0
# 2 2 ab 1
# 3 3 ab 1
# 4 1 ab 1
# 5 2 ab 1
# 6 3 ab 0
Then we can turn it into a wide format, using sum as the aggregating function:
dcast(df_long, variable ~ cl, fun.aggregate = sum)
# variable 1 2 3
# 1 ab 1 3 2
# 2 bc 2 3 1
# 3 de 2 3 1
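If reshape2 is not available, the same long-then-wide idea can be sketched in base R: `stack()` plays the role of `melt()`, and `tapply()` with `sum` plays the role of `dcast()`. This is a minimal sketch that assumes the data frame is named `df` and the cluster column is named `cl`, as in the code above.

```r
# Example data from the question, assigned to df (assumed name)
df <- structure(list(ab = c(0, 1, 1, 1, 1, 0, 0, 0, 1, 1),
                     bc = c(1, 1, 1, 1, 0, 0, 0, 1, 0, 1),
                     de = c(0, 0, 1, 1, 1, 0, 1, 1, 0, 1),
                     cl = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 2)),
                row.names = c(NA, -10L), class = "data.frame")

# Every column except the cluster id
answers <- df[names(df) != "cl"]

# Long format: stack() returns `values` and `ind` (the source column name),
# stacking the columns one after another, so cl is repeated once per column
df_long <- data.frame(cl = rep(df$cl, times = ncol(answers)), stack(answers))

# Wide format: sum the values for each (variable, cluster) pair
res <- tapply(df_long$values, list(df_long$ind, df_long$cl), sum)
res
```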
Answer 2:
One way using dplyr would be:
library(dplyr)
df %>%
  # group by the cluster variable cl
  group_by(cl) %>%
  # sum every non-grouping column within each cluster
  summarise(across(everything(), sum)) %>%
  # drop the cluster column, keeping all answer columns
  select(-cl) %>%
  # transpose so the variables become rows
  t()
Output:
[,1] [,2] [,3]
ab 1 3 2
bc 2 3 1
de 2 3 1
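A base R counterpart of the group-and-summarise pipeline can be sketched with `aggregate()` and its formula interface, where `. ~ cl` sums every other column within each cluster. Unlike the output above, this sketch keeps the cluster ids as column names. It assumes the data frame is named `df`.

```r
# Example data from the question, assigned to df (assumed name)
df <- structure(list(ab = c(0, 1, 1, 1, 1, 0, 0, 0, 1, 1),
                     bc = c(1, 1, 1, 1, 0, 0, 0, 1, 0, 1),
                     de = c(0, 0, 1, 1, 1, 0, 1, 1, 0, 1),
                     cl = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 2)),
                row.names = c(NA, -10L), class = "data.frame")

# One row per cluster, one column per answer variable (plus cl itself)
by_cluster <- aggregate(. ~ cl, data = df, FUN = sum)

# Drop cl, transpose so variables become rows, and label columns by cluster
res <- t(by_cluster[names(by_cluster) != "cl"])
colnames(res) <- by_cluster$cl
res
```

One caveat: the formula method uses `na.action = na.omit` by default, so a row with an NA in any column is dropped for every variable, which can matter with 1600+ columns.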
Answer 3:
In base R:
# apply tapply(..., sum) to every column except cl, then transpose
t(sapply(df[names(df) != "cl"], function(x) tapply(x, df$cl, sum)))
# 1 2 3
#ab 1 3 2
#bc 2 3 1
#de 2 3 1
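For very wide data (the question mentions 1600+ columns), base R's `rowsum()` computes grouped column sums for all columns in a single call, which should scale better than looping over columns one by one. A minimal sketch, assuming the data frame is named `df` and the cluster column is named `cl`:

```r
# Example data from the question, assigned to df (assumed name)
df <- structure(list(ab = c(0, 1, 1, 1, 1, 0, 0, 0, 1, 1),
                     bc = c(1, 1, 1, 1, 0, 0, 0, 1, 0, 1),
                     de = c(0, 0, 1, 1, 1, 0, 1, 1, 0, 1),
                     cl = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 2)),
                row.names = c(NA, -10L), class = "data.frame")

# rowsum() sums matrix rows within each group: one output row per cluster
# (sorted by cluster id) and one column per answer variable
sums <- rowsum(as.matrix(df[names(df) != "cl"]), group = df$cl)

# Transpose so the answer variables are rows and clusters are columns
res <- t(sums)
res
```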
Answer 4:
You can also combine tidyr::gather or reshape2::melt with xtabs to get your contingency table:
library(tidyr)
xtabs(value ~ key + cl, data = gather(df, key, value, -cl))
## cl
## key 1 2 3
## ab 1 3 2
## bc 2 3 1
## de 2 3 1
If you prefer to use a pipe:
df %>%
gather(key, value, -cl) %>%
xtabs(value ~ key + cl, data = .)
Source: https://stackoverflow.com/questions/33455504/creating-a-contingency-table-using-multiple-columns-in-a-data-frame-in-r