Create Combinations in R by Groups

一整个雨季 2021-02-08 19:23

I want to create a list for my classroom of every possible group of 4 students. If I have 20 students, how can I create this in R, by group, so that each row is one combination?

7 Answers
  •  日久生厌
    2021-02-08 19:56

    This is a challenging problem computationally, since I believe there are 2.5 billion possibilities to enumerate. (If it's mistaken, I'd welcome any insight about where this approach goes wrong.)

    Depending on how it's stored, a table with all those groupings might require more RAM than most computers can handle. I'd be impressed to see an efficient way to create that. Even with a "create one combination at a time" approach, generating all the possibilities would still take about 42 minutes at 1,000,000 per second, or about a month at only 1,000 per second.
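
A rough back-of-the-envelope check of those estimates, as a sketch only. It assumes the full table would be stored as an integer matrix with one row per grouping and 20 columns (R integers take 4 bytes each), and it uses the count of 2,546,168,625 groupings claimed in this answer. The variable names are illustrative.

```r
# Count of groupings claimed in this answer (too large for an R integer, so it is a double)
n_groupings <- 2546168625

# Assumption: one row per grouping, 20 integer columns, 4 bytes per integer
size_gb <- n_groupings * 20 * 4 / 1e9

# Time to generate every grouping at two hypothetical generation rates
minutes_at_1e6_per_sec <- n_groupings / 1e6 / 60
days_at_1e3_per_sec <- n_groupings / 1e3 / 86400

print(c(size_gb = size_gb,
        minutes = minutes_at_1e6_per_sec,
        days = days_at_1e3_per_sec))
```

Under those assumptions this works out to roughly 204 GB, about 42 minutes, and about 29.5 days.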

    EDIT - added partial implementation at the bottom to create any desired grouping from #1 to #2,546,168,625. For some purposes, this may be almost as good as actually storing the whole sequence, which is very large.


    Let's say we are going to make 5 groups of four students each: Group A, B, C, D, and E.

    Let's define Group A as the group Student #1 is in. They can be paired with any three of the other 19 students. I believe there are 969 such combinations of other students:

    > nrow(t(combn(1:19, 3)))
    [1] 969
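
As a side note, the same count can probably be obtained without building the matrix of combinations at all, since it is just a binomial coefficient. A minimal sketch comparing the two approaches (the variable names are illustrative):

```r
# Ways to pick 3 groupmates for Student #1 from the other 19 students
n_by_enumeration <- ncol(combn(19, 3))  # builds every combination, then counts the columns
n_by_formula <- choose(19, 3)           # binomial coefficient, nothing is enumerated

print(c(n_by_enumeration, n_by_formula))
```

For counts this small the difference is negligible, but `choose()` avoids allocating the combinations when only their number is needed.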
    

    Now there are 16 students left for the other groups. Let's assign the first student not already in Group A to Group B. That might be student 2, 3, 4, or 5. It doesn't matter; all we need to know is that there are only 15 students who can be grouped with that student. There are 455 such combinations:

    > nrow(t(combn(1:15, 3)))
    [1] 455
    

    Now there are 12 students left. Again, let's assign the first ungrouped student to Group C, and we have 165 combinations left for them with the other 11 students:

    > nrow(t(combn(1:11, 3)))
    [1] 165
    

    And we have 8 students left, 7 of whom can be grouped with the first ungrouped student into Group D in 35 ways:

    > nrow(t(combn(1:7, 3)))
    [1] 35
    

    And then, once our other groups are determined, there's only one group of four students left, three of whom can be paired with the first ungrouped student:

    > nrow(t(combn(1:3, 3)))
    [1] 1
    

    That implies 2.546B combinations:

    > 969*455*165*35*1
    [1] 2546168625
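
The same counting argument can be sketched for other class and group sizes. The helper below is hypothetical (its name and arguments are not from the original answer) and assumes the class divides evenly into groups. It also cross-checks the result against the usual "choose each group in turn, then divide by the number of orderings of the groups" formula.

```r
# Hypothetical helper: number of ways to split n_students into unlabeled
# groups of group_size, following the "first ungrouped student" argument
count_groupings <- function(n_students, group_size) {
  stopifnot(n_students %% group_size == 0)
  # Students available to join the first ungrouped student at each step
  others <- seq(n_students - 1, group_size - 1, by = -group_size)
  prod(choose(others, group_size - 1))
}

total <- count_groupings(20, 4)

# Cross-check: pick 4 of 20, then 4 of 16, and so on, then divide by 5!
# because the order of the five groups does not matter
cross_check <- prod(choose(c(20, 16, 12, 8, 4), 4)) / prod(1:5)

print(format(c(total, cross_check), big.mark = ","))
```

Both calculations should agree with the 2,546,168,625 figure above. Note that this value exceeds `.Machine$integer.max`, so it has to be held as a double rather than an R integer.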
    

    Here's a work-in-progress function that produces a grouping based on any arbitrary sequence number.

    1) [in progress] Convert sequence number to a vector describing which # combination should be used for Group A, B, C, D, and E. For instance, this should convert #1 to c(1, 1, 1, 1, 1) and #2,546,168,625 to c(969, 455, 165, 35, 1).

    2) Convert the combinations to a specific output describing the students in each Group.

    groupings <- function(seq_nums) {
      students <- 20
      group_size <- 4
      n_groups <- students / group_size
      grouped <- NULL
      remaining <- seq_len(students)
      # The last group always uses the only possible combination
      seq_nums_pad <- c(seq_nums, 1)
      for (g in seq_len(n_groups)) {
        # The first remaining student anchors the group; seq_nums_pad[g] picks
        # which combination of the other remaining students joins them
        group_relative <-
          c(1, 1 + t(combn(1:(length(remaining) - 1), group_size - 1))[seq_nums_pad[g], ])
        group <- remaining[group_relative]
        print(group)
        grouped <- c(grouped, group)
        remaining <- setdiff(remaining, grouped)
      }
    }
    
    > groupings(c(1,1,1,1))
    #[1] 1 2 3 4
    #[1] 5 6 7 8
    #[1]  9 10 11 12
    #[1] 13 14 15 16
    #[1] 17 18 19 20
    > groupings(c(1,1,1,2))
    #[1] 1 2 3 4
    #[1] 5 6 7 8
    #[1]  9 10 11 12
    #[1] 13 14 15 17
    #[1] 16 18 19 20
    > groupings(c(969, 455, 165, 35))   # Uses the last possibility for each group
    #[1]  1 18 19 20
    #[1]  2 15 16 17
    #[1]  3 12 13 14
    #[1]  4  9 10 11
    #[1] 5 6 7 8
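
Step 1 above is marked as in progress. One possible way to fill it in, offered as a sketch rather than the author's implementation, is to treat the sequence number as a mixed-radix number whose digit sizes are 969, 455, 165, 35, and 1. The function name `seq_to_choices` is hypothetical, and the digit sizes are hard-coded for 20 students in groups of 4.

```r
# Hypothetical helper: convert a sequence number (1 to 2,546,168,625) into the
# combination index to use for Groups A through E
seq_to_choices <- function(n) {
  radices <- choose(c(19, 15, 11, 7, 3), 3)  # 969 455 165 35 1
  stopifnot(length(n) == 1, n >= 1, n <= prod(radices), n == floor(n))
  idx <- n - 1  # work zero-based so that #1 maps to all ones
  digits <- numeric(length(radices))
  # Peel off digits from the last group (fastest-changing) to the first
  for (g in rev(seq_along(radices))) {
    digits[g] <- idx %% radices[g]
    idx <- idx %/% radices[g]
  }
  digits + 1
}

print(seq_to_choices(1))
print(seq_to_choices(2))
print(seq_to_choices(2546168625))
```

With this ordering, Group D's choice changes fastest (Group E always has a single option), so #2 corresponds to `c(1, 1, 1, 2, 1)`. The first four entries of the result could then be passed to `groupings()`, which pads the final 1 itself. The sequence number must be supplied as a double, since the upper end of the range does not fit in an R integer.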
    
