Counting existing permutations in R

無奈伤痛 2020-12-20 07:33

I have a large dataset with columns IDNum, Var1, Var2, Var3, Var4, Var5, Var6. The variables are boolean with value either 0 or 1. Each row could be one of 64 different poss

2 answers
  • 2020-12-20 08:02

    This approach lists every possible combination, whether or not it occurs in the data; combinations that do not occur get a count of 0. Example data:

    nam <- c("IDNum",paste0("Var",1:6))
    n <- 5
    set.seed(23)
    dat <- setNames(data.frame(1:n,replicate(6,sample(0:1,n,replace=TRUE))),nam)
    
    
    #  IDNum Var1 Var2 Var3 Var4 Var5 Var6
    #1     1    1    0    1    0    1    1
    #2     2    0    1    1    1    0    1
    #3     3    0    1    0    1    0    1
    #4     4    1    1    0    1    1    0
    #5     5    1    1    1    1    0    1
    

    Count them up:

    data.frame(table(dat[-1]))
    
    #   Var1 Var2 Var3 Var4 Var5 Var6 Freq
    #1     0    0    0    0    0    0    0
    #...
    #28    1    1    0    1    1    0    1
    #...
    #43    0    1    0    1    0    1    1
    #...
    #47    0    1    1    1    0    1    1
    #48    1    1    1    1    0    1    1
    #...
    #54    1    0    1    0    1    1    1
    #...
    #64    1    1    1    1    1    1    0
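
    If only the combinations that actually occur are of interest, the zero-count rows can be filtered out afterwards. The following is a minimal sketch, not part of the original answer. It also converts each variable to a factor with levels 0 and 1 first, on the assumption that every variable is strictly 0/1: table() only enumerates values it sees, so a column that happens to be constant would otherwise yield fewer than 64 rows. The names counts and present are made up for illustration.

    ```r
    # Same example data as in the answer above
    nam <- c("IDNum", paste0("Var", 1:6))
    n <- 5
    set.seed(23)
    dat <- setNames(data.frame(1:n, replicate(6, sample(0:1, n, replace = TRUE))), nam)

    # Assumption: every variable is strictly 0/1. Fixing the factor levels
    # guarantees that all 2^6 = 64 combinations appear in the table,
    # even if some column contains only one of the two values.
    dat[-1] <- lapply(dat[-1], factor, levels = 0:1)

    # Full table: one row per possible combination, Freq = number of rows in dat
    counts <- data.frame(table(dat[-1]))

    # Keep only the combinations that occur, most frequent first
    present <- counts[counts$Freq > 0, ]
    present <- present[order(-present$Freq), ]
    present
    ```

    The full counts table is still available when the absent combinations matter, for example to report which of the 64 patterns never occur.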
    
  • 2020-12-20 08:03

    aggregate can do this. Here's a shorter example:

    r <- function() rbinom(10, 1, .5)
    d <- data.frame(IDNum=1:10, Var1=r(), Var2=r())
    d
    #   IDNum Var1 Var2
    #1      1    0    1
    #2      2    0    1
    #3      3    0    0
    #4      4    1    0
    #5      5    1    1
    #6      6    0    0
    #7      7    1    1
    #8      8    1    0
    #9      9    0    1
    #10    10    0    1
    

    Now to count the number of each combination:

    aggregate(d$IDNum, d[-1], FUN=length)
    #  Var1 Var2 x
    #1    0    0 2
    #2    1    0 2
    #3    0    1 4
    #4    1    1 2
    

    The actual values in d$IDNum don't matter here; aggregate just needs some vector to split by combination. The d$IDNum values belonging to each combination are passed to length, which returns how many there are, and that is the count.
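
    To make that point concrete, here is a small sketch (not from the original answer) that rebuilds the data frame from the values printed above, so no random draw is involved, and checks that a dummy vector gives the same counts as d$IDNum. The names dummy, by_id, by_dummy and same_counts are made up for illustration.

    ```r
    # Data copied from the printed example above (fixed values, no random draw)
    d <- data.frame(
      IDNum = 1:10,
      Var1  = c(0, 0, 0, 1, 1, 0, 1, 1, 0, 0),
      Var2  = c(1, 1, 0, 0, 1, 0, 1, 0, 1, 1)
    )

    # Count rows per combination, as in the answer
    by_id <- aggregate(d$IDNum, d[-1], FUN = length)

    # Any vector with one element per row works just as well,
    # because length() only counts elements and ignores their values
    dummy <- rep(1, nrow(d))
    by_dummy <- aggregate(dummy, d[-1], FUN = length)
    same_counts <- identical(by_id$x, by_dummy$x)

    # Optional: give the count column a more descriptive name than "x"
    names(by_id)[names(by_id) == "x"] <- "Freq"
    by_id
    ```

    Unlike the table approach, aggregate only returns combinations that are present in the data.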
