问题
To get the question clear, let me start with one baby example of my data frame.
ID <- c(rep("first", 2), rep("second", 4), rep("third",1), rep("fourth", 3))
Var_1 <- c(rep("A",2), rep("B", 2), rep("A",3), rep("B", 2), "A")
Var_2 <- c(rep("C",2), rep("D",3) , rep("C",2), rep("E",2), "D")
DF <- data.frame(ID, Var_1, Var_2)
> DF
ID Var_1 Var_2
1 first A C
2 first A C
3 second B D
4 second B D
5 second A D
6 second A C
7 third A C
8 fourth B E
9 fourth B E
10 fourth A D
There is one ID
factor variable and two factor variables Var_1
with R=2
factor levels and Var_2
with C=3
factor levels.
I would like to get a new data frame with (RxC)+1=(2x3)+1
Variables with the frequencies of all combinations of factor levels - separately for each level in ID Variable, that would look like this:
ID A.C A.D A.E B.C B.D B.E
1 first 2 0 0 0 0 0
2 second 1 1 0 0 2 0
3 third 1 0 0 0 0 0
4 fourth 0 1 0 0 0 2
I tried a couple of functions, but results were not even close to this, so they are not even worth of mentioning. In original data frame I should get (6x9)+1=55 Variables.
EDIT: There are solutions for counting factor levels for one or many variables separatly, but I couldn´t figure it out how to make a common counts for combinations of factor levels for two (or more) variables. Implementig the solution to others seems easy now when I got the answers, but I could not get there by myself.
回答1:
Using the dcast
function from the reshape package (or data.table which has an enhanced implementation of the dcast
function):
library(reshape2)
dcast(DF, ID ~ paste(Var_1,Var_2,sep="."), fun.aggregate = length)
which gives:
ID A.C A.D B.D B.E
1 first 2 0 0 0
2 fourth 0 1 0 2
3 second 1 1 2 0
4 third 1 0 0 0
回答2:
We could use paste
to create a variable combining Var_1 and Var_2 and then produce a contingency table with ID and the new variable:
table(DF$ID,paste(DF$Var_1,DF$Var_2,sep="."))
output
A.C A.D B.D B.E
first 2 0 0 0
fourth 0 1 0 2
second 1 1 2 0
third 1 0 0 0
To order the table rows, we would need to factor(DF$ID,levels=c("first","second","third","fourth"))
beforehand.
回答3:
Try
library(tidyr)
library(dplyr)
DF %>%
unite(Var, Var_1, Var_2, sep = ".") %>%
count(ID, Var) %>%
spread(Var, n, fill = 0)
Which gives:
#Source: local data frame [4 x 5]
#
# ID A.C A.D B.D B.E
# (fctr) (dbl) (dbl) (dbl) (dbl)
#1 first 2 0 0 0
#2 fourth 0 1 0 2
#3 second 1 1 2 0
#4 third 1 0 0 0
来源:https://stackoverflow.com/questions/35131355/r-how-to-get-common-counts-frequency-of-levels-of-two-factor-variables-by-id