Create Combinations in R by Groups

一整个雨季 2021-02-08 19:23

I want to create a list for my classroom of every possible group of 4 students. If I have 20 students, how can I create this, by group, in R, where each row is one combination?

7 Answers
  •  灰色年华
    2021-02-08 19:39

    This is now officially part of the production version of RcppAlgos* (it was previously available only in the development version).

    library(RcppAlgos)
    a <- comboGroups(10, numGroups = 2, retType = "3Darray")
    
    dim(a)
    [1] 126   5   2
    
    a[1,,]
         Grp1 Grp2
    [1,]    1    6
    [2,]    2    7
    [3,]    3    8
    [4,]    4    9
    [5,]    5   10
    
    a[126,,]
         Grp1 Grp2
    [1,]    1    2
    [2,]    7    3
    [3,]    8    4
    [4,]    9    5
    [5,]   10    6
    

    Or if you prefer matrices:

    a1 <- comboGroups(10, 2, retType = "matrix")
    
    head(a1)
         Grp1 Grp1 Grp1 Grp1 Grp1 Grp2 Grp2 Grp2 Grp2 Grp2
    [1,]    1    2    3    4    5    6    7    8    9   10
    [2,]    1    2    3    4    6    5    7    8    9   10
    [3,]    1    2    3    4    7    5    6    8    9   10
    [4,]    1    2    3    4    8    5    6    7    9   10
    [5,]    1    2    3    4    9    5    6    7    8   10
    [6,]    1    2    3    4   10    5    6    7    8    9
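
    To make concrete what each row above represents, here is a minimal base R sketch that enumerates the same kind of groupings by brute force. The helper name partition_groups is hypothetical and is not part of RcppAlgos; it only illustrates the idea (always place the smallest remaining item in the next group, so the same split is never counted twice) and makes no claim about how comboGroups is implemented.

```r
# Hypothetical helper, not part of RcppAlgos: enumerate every way to split
# the vector `v` into unordered groups of size `k` (base R only).
partition_groups <- function(v, k) {
  stopifnot(k >= 2, length(v) %% k == 0)
  if (length(v) == 0) return(list(list()))  # one way to split nothing
  first <- v[1]
  rest <- v[-1]
  out <- list()
  # Choose the k - 1 companions of the smallest remaining item.
  for (i in combn(length(rest), k - 1, simplify = FALSE)) {
    grp <- c(first, rest[i])
    remaining <- rest[-i]
    # Recurse on whatever is left and prepend the group just formed.
    for (tail_groups in partition_groups(remaining, k)) {
      out[[length(out) + 1]] <- c(list(grp), tail_groups)
    }
  }
  out
}

parts <- partition_groups(1:4, 2)
length(parts)  # 3: {1,2}{3,4}, {1,3}{2,4}, {1,4}{2,3}
```

    This is only practical for tiny inputs; for anything classroom-sized the package functions are the sensible choice.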
    

    It is also really fast. You can even generate in parallel with nThreads or Parallel = TRUE (the latter uses one fewer than the system's maximum number of threads) for greater efficiency gains:

    comboGroupsCount(16, 4)
    [1] 2627625
    
    system.time(comboGroups(16, 4, "matrix"))
     user  system elapsed 
    0.107   0.030   0.137
    
    system.time(comboGroups(16, 4, "matrix", nThreads = 4))
     user  system elapsed 
    0.124   0.067   0.055
                                    ## 7 threads on my machine
    system.time(comboGroups(16, 4, "matrix", Parallel = TRUE))
     user  system elapsed 
    0.142   0.126   0.047
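
    The counts reported in this answer can be cross-checked in base R. The sketch below assumes equal-sized, unlabeled groups: pick each group in turn with choose(), then divide by the number of orderings of the groups. The helper name count_equal_groups is hypothetical and not part of RcppAlgos.

```r
# Hypothetical helper, base R only: number of ways to split n items into
# g unordered groups of equal size.
count_equal_groups <- function(n, g) {
  stopifnot(n %% g == 0)
  k <- n / g                        # size of each group
  remaining <- seq(n, k, by = -k)   # items left before each group is chosen
  # Ordered choices of the groups, divided by g! orderings of the groups.
  prod(choose(remaining, k)) / prod(seq_len(g))
}

count_equal_groups(10, 2)  # 126
count_equal_groups(16, 4)  # 2627625
count_equal_groups(20, 5)  # 2546168625
```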
    

    A really nice feature is the ability to generate samples or specific lexicographical combination groups, especially when the number of results is large.

    comboGroupsCount(factor(state.abb), numGroups = 10)
    Big Integer ('bigz') :
    [1] 13536281554808237495608549953475109376
    
    mySamp <- comboGroupsSample(factor(state.abb), 
                                numGroups = 10, "3Darray", n = 5, seed = 42)
    
    mySamp[1,,]
         Grp1 Grp2 Grp3 Grp4 Grp5 Grp6 Grp7 Grp8 Grp9 Grp10
    [1,] AL   AK   AR   CA   CO   CT   DE   FL   LA   MD   
    [2,] IA   AZ   ME   ID   GA   OR   IL   IN   MS   NM   
    [3,] KY   ND   MO   MI   HI   PA   MN   KS   MT   OH   
    [4,] TX   RI   SC   NH   NV   WI   NE   MA   NY   TN  
    [5,] VA   VT   UT   OK   NJ   WY   WA   NC   SD   WV   
    50 Levels: AK AL AR AZ CA CO CT DE FL GA HI IA ID IL IN KS KY LA MA MD ME MI MN MO MS MT NC ND NE NH NJ NM NV NY OH ... WY
    
    firstAndLast <- comboGroupsSample(state.abb, 10, "3Darray",
                                      sampleVec = c("1",
                                                    "13536281554808237495608549953475109376"))
    
    firstAndLast[1,,]
         Grp1 Grp2 Grp3 Grp4 Grp5 Grp6 Grp7 Grp8 Grp9 Grp10
    [1,] "AL" "CO" "HI" "KS" "MA" "MT" "NM" "OK" "SD" "VA" 
    [2,] "AK" "CT" "ID" "KY" "MI" "NE" "NY" "OR" "TN" "WA" 
    [3,] "AZ" "DE" "IL" "LA" "MN" "NV" "NC" "PA" "TX" "WV" 
    [4,] "AR" "FL" "IN" "ME" "MS" "NH" "ND" "RI" "UT" "WI" 
    [5,] "CA" "GA" "IA" "MD" "MO" "NJ" "OH" "SC" "VT" "WY"
    
    firstAndLast[2,,]
         Grp1 Grp2 Grp3 Grp4 Grp5 Grp6 Grp7 Grp8 Grp9 Grp10
    [1,] "AL" "AK" "AZ" "AR" "CA" "CO" "CT" "DE" "FL" "GA" 
    [2,] "WA" "TX" "RI" "OH" "NM" "NE" "MN" "ME" "IA" "HI" 
    [3,] "WV" "UT" "SC" "OK" "NY" "NV" "MS" "MD" "KS" "ID" 
    [4,] "WI" "VT" "SD" "OR" "NC" "NH" "MO" "MA" "KY" "IL" 
    [5,] "WY" "VA" "TN" "PA" "ND" "NJ" "MT" "MI" "LA" "IN"
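
    If a single random grouping is all that is needed, rather than a sample indexed against the full enumeration, a base R sketch is to shuffle the items and cut the shuffled vector into equal pieces. The student names below are placeholders, and this does not reproduce the output of comboGroupsSample or its seed behaviour.

```r
set.seed(42)  # fixed only so the sketch is repeatable

students <- paste0("S", 1:20)            # placeholder names
group_id <- rep(seq_len(5), each = 4)    # 5 groups of 4

# Shuffle the students, then assign consecutive runs of 4 to each group.
one_grouping <- split(sample(students), group_id)
one_grouping
```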
    

    And finally, generating all 2,546,168,625 combination groups of 20 people split into 5 groups of 4 (what the OP asked for) can be achieved in under a minute using the lower and upper arguments:

    system.time(aPar <- parallel::mclapply(seq(1, 2546168625, 969969), function(x) {
         combs <- comboGroups(20, 5, "3Darray", lower = x, upper = x + 969968)
         ### do something
         dim(combs)
    }, mc.cores = 6))
       user  system elapsed 
    217.667  22.932  48.482
    
    sum(sapply(aPar, "[", 1))
    [1] 2546168625
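
    The chunk boundaries used above work out evenly because 969969 divides 2546168625 exactly (2625 chunks). For a chunk size that does not divide the total, the last upper bound would need clamping. A minimal base R sketch of computing such bounds follows; chunk_bounds is a hypothetical helper, not an RcppAlgos function.

```r
# Hypothetical helper: split the index range 1..total into consecutive
# chunks of at most chunk_size indices each.
chunk_bounds <- function(total, chunk_size) {
  lower <- seq(1, total, by = chunk_size)
  upper <- pmin(lower + chunk_size - 1, total)  # clamp the final chunk
  data.frame(lower = lower, upper = upper)
}

b <- chunk_bounds(2546168625, 969969)
nrow(b)                                        # 2625 chunks
sum(b$upper - b$lower + 1) == 2546168625       # every index covered once
```

    Each row of b could then supply the lower and upper values for one worker.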
    

    Although I started working on this problem over a year ago, this question was a huge inspiration for getting this formalized in a package.

    * I am the author of RcppAlgos
