Algorithm to generate all multiset size-n partitions

抹茶落季 2021-02-09 18:42

I've been trying to figure out a way to generate all distinct size-n partitions of a multiset, but so far have come up empty-handed. First let me show what I'm trying to archi

3 answers
  • 2021-02-09 19:03

    Here's a working solution that makes use of the next_combination function presented by Hervé Brönnimann in N2639. The comments should make it pretty self-explanatory. The "herve/combinatorics.hpp" file contains the code listed in N2639, placed inside the herve namespace. It's written in C++11/14; converting it to an older standard should be pretty trivial.

    Note that I only quickly tested the solution. Also, I extracted it from a class-based implementation just a couple of minutes ago, so some extra bugs might have crept in. A quick initial test seems to confirm it works, but there might be corner cases for which it won't.

    #include <algorithm>  // std::max, std::min
    #include <cstdint>
    #include <iterator>
    
    #include "herve/combinatorics.hpp"
    
    template <typename BidirIter>
    bool next_combination_partition (BidirIter const & startIt,
      BidirIter const & endIt, uint32_t const groupSize) {
      // Typedefs
      using tDiff = typename std::iterator_traits<BidirIter>::difference_type;
    
      // Skip the last partition, because it consists of the remaining elements.
      // Thus if there are 2 groups or fewer, the start should be at position 0.
      tDiff const totalLength = std::distance(startIt, endIt);
      uint32_t const numTotalGroups = std::max(static_cast<uint32_t>((totalLength - 1) / groupSize + 1), 2u);
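      uint32_t curBegin = (numTotalGroups - 2) * groupSize;
      // Wraps around to UINT32_MAX when curBegin is 0; that is fine because it
      // is only ever used as lastGroupBegin + 1 below.
      uint32_t const lastGroupBegin = curBegin - 1;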
      uint32_t curMid = curBegin + groupSize;
      bool atStart = (totalLength != 0);
    
      // Iterate over combinations from back of list to front. If a combination ends
      // up at its starting value, update the previous one as well.
      for (; (curMid != 0) && (atStart);
        curMid = curBegin, curBegin -= groupSize) {
        // To prevent duplicates, first element of each combination partition needs
        // to be fixed. So move start iterator to the next element. This is not true
        // for the starting (2nd to last) group though.
        uint32_t const startIndex = std::min(curBegin + 1, lastGroupBegin + 1);
        auto const iterStart = std::next(startIt, startIndex);
        auto const iterMid = std::next(startIt, curMid);
        atStart = !herve::next_combination(iterStart, iterMid, endIt);
      }
    
      return !atStart;
    }
    

    Edit: Below is my quickly thrown-together test code ("combopart.hpp" being the file containing the above function).

    #include "combopart.hpp"
    
    #include <algorithm>
    #include <cstdint>
    #include <iostream>
    #include <iterator>
    #include <vector>
    
    int main () {
      uint32_t const groupSize = 2;
    
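      // Sample inputs; only the last assignment takes effect, so reorder
      // these lines to try a different multiset.
      std::vector<uint32_t> v;
      v = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
      v = {0, 0, 0, 1, 1, 1, 2, 2, 2, 3};
      v = {1, 1, 2, 2};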
    
      // Make sure contents are sorted
      std::sort(v.begin(), v.end());
    
      uint64_t count = 0;
      do {
        ++count;
    
        std::cout << "[ ";
        uint32_t elemCount = 0;
        for (auto it = v.begin(); it != v.end(); ++it) {
          std::cout << *it << " ";
          elemCount++;
          if ((elemCount % groupSize == 0) && (it != std::prev(v.end()))) {
            std::cout << "| ";
          }
        }
        std::cout << "]" << std::endl;
      } while (next_combination_partition(v.begin(), v.end(), groupSize));
    
      std::cout << std::endl << "# elements: " << v.size() << " - group size: " <<
        groupSize << " - # combination partitions: " << count << std::endl;
    
      return 0;
    }
    

    Edit 2: Improved the algorithm. Replaced the early-exit branch with a combination of a conditional move (using std::max) and setting the atStart boolean to false. Untested though, be warned.

    Edit 3: An extra modification was needed so as not to "fix" the first element in the 2nd-to-last partition. The additional code should compile to a conditional move, so there should be no branching cost associated with it.

    P.S.: I am aware that the code to generate combinations by @Howard Hinnant (available at https://howardhinnant.github.io/combinations.html) is much faster than the one by Hervé Brönnimann. However, that code cannot handle duplicates in the input (because, as far as I can see, it never even dereferences an iterator), which my problem explicitly requires. On the other hand, if you know for sure your input won't contain duplicates, it is definitely the code you want to use with my function above.
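
Since the answer warns that corner cases may remain, a brute-force count can serve as an independent cross-check on the number printed by the test program for small inputs. The sketch below is not part of the answer: the name `count_partitions_brute_force` is made up for illustration, and it assumes that groups are unordered and that any leftover elements form one shorter last group. It walks every permutation of the input, so it is only practical for a handful of elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper (not from the answer): counts the distinct ways to split
// `items` into unordered groups of `groupSize` by trying every permutation.
std::size_t count_partitions_brute_force(std::vector<int> items,
                                         std::size_t groupSize) {
  assert(groupSize > 0);
  // std::next_permutation only visits every arrangement from a sorted start.
  std::sort(items.begin(), items.end());

  std::set<std::vector<std::vector<int>>> seen;
  do {
    // Cut the current arrangement into consecutive chunks of groupSize.
    std::vector<std::vector<int>> groups;
    for (std::size_t i = 0; i < items.size(); i += groupSize) {
      std::size_t const end = std::min(i + groupSize, items.size());
      std::vector<int> group(items.begin() + static_cast<std::ptrdiff_t>(i),
                             items.begin() + static_cast<std::ptrdiff_t>(end));
      std::sort(group.begin(), group.end());  // order inside a group is irrelevant
      groups.push_back(group);
    }
    std::sort(groups.begin(), groups.end());  // order of the groups is irrelevant
    seen.insert(groups);                      // the set drops repeated partitions
  } while (std::next_permutation(items.begin(), items.end()));
  return seen.size();
}
```

For example, `count_partitions_brute_force({1, 1, 2, 2}, 2)` returns 2, for the partitions [ 1 1 | 2 2 ] and [ 1 2 | 1 2 ].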

  • 2021-02-09 19:16

    A recursive algorithm to distribute the elements one-by-one could be based on a few simple rules:

    • Start by sorting or counting the different elements; they don't have to be in any particular order, you just want to group identical elements together. (This step will simplify some of the following steps, but could be skipped.)
       {A,B,D,C,C,D,B,A,C} -> {A,A,B,B,D,D,C,C,C}  
    
    • Start with an empty solution, and insert the elements one by one, using the following rules:
       { , , } { , , } { , , }  
    
    • Before inserting an element, find the duplicate blocks, e.g.:
       {A, , } { , , } { , , }  
                        ^dup^
    
       {A, , } {A, , } {A, , }  
                ^dup^   ^dup^
    
    • Insert the element into every non-duplicate block with available space:
       partial solution: {A, , } {A, , } { , , }  
                                  ^dup^
    
       insert element B: {A,B, } {A, , } { , , }  
                         {A, , } {A, , } {B, , }  
    
    • If an identical element is already present, don't put the new element before it:
       partial solution:  {A, , } {B, , } { , , }  
       insert another B:  {A,B, } {B, , } { , , }  <- ILLEGAL  
                          {A, , } {B,B, } { , , }  <- OK
                          {A, , } {B, , } {B, , }  <- OK
    
    • When inserting an element that still has N identical elements to be placed after it, make sure to leave N open spots after the current element:
       partial solution:  {A, , } {A, , } {B,B, }  
       insert first D:    {A,D, } {A, , } {B,B, }  <- OK  
                          {A, , } {A, , } {B,B,D}  <- ILLEGAL (NO SPACE FOR 2ND D)  
    
    • The last group of identical elements can be inserted in one go:
       partial solution:  {A,A, } {B,B,D} {D, , }  
       insert C,C,C:      {A,A,C} {B,B,D} {D,C,C}  
    

    So the algorithm would be something like this:

    // PREPARATION  
    Sort or group input.              // {A,B,D,C,C,D,B,A,C} -> {A,A,B,B,D,D,C,C,C}  
    Create empty partial solution.    // { , , } { , , } { , , }  
    Start recursion with empty partial solution and index at start of input.  
    
    // RECURSION  
    Receive partial solution, index, group size and last-used block.  
    If group size is zero:  
        Find group size of identical elements in input, starting at index.  
        Set last-used block to first block.  
    Find empty places in partial solution, starting at last-used block.  
    If index is at last group in input:  
        Fill empty spaces with elements of last group.
        Store complete solution.
        Return from recursion.
    Mark duplicate blocks in partial solution.  
    For each block in partial solution, starting at last-used block:  
        If current block is not a duplicate, and has empty places,  
        and the number of places left in the current and later blocks is not less than the group size:
            Insert element into copy of partial solution.
            Recurse with copy, index + 1, group size - 1, current block.
    

    I tested a simple JavaScript implementation of this algorithm, and it gives the correct output.
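
The rules and pseudocode above could be turned into C++ roughly as follows. This is a sketch under stated assumptions, not the JavaScript implementation mentioned in the answer: the names `PartitionGenerator`, `multiset_partitions`, `Block` and `Partition` are invented here, elements are plain `int`s, the input size is assumed to be a multiple of the block size, and the recursion backtracks in place (push, recurse, pop) rather than copying the partial solution.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

using Block = std::vector<int>;
using Partition = std::vector<Block>;

// Hypothetical class: a sketch of the recursive insertion scheme.
class PartitionGenerator {
 public:
  PartitionGenerator(std::vector<int> items, std::size_t blockSize)
      : blockSize_(blockSize) {
    assert(blockSize > 0 && items.size() % blockSize == 0);
    std::sort(items.begin(), items.end());  // groups identical elements together
    for (int item : items) {                // count each run of identical elements
      if (!runs_.empty() && runs_.back().first == item) {
        ++runs_.back().second;
      } else {
        runs_.emplace_back(item, std::size_t{1});
      }
    }
    blocks_.resize(items.size() / blockSize);  // empty partial solution
  }

  std::vector<Partition> generate() {
    results_.clear();
    if (runs_.empty()) {
      results_.push_back(blocks_);  // empty input: a single empty partition
    } else {
      place(0, runs_[0].second, 0);
    }
    return results_;
  }

 private:
  // A block is a duplicate if an earlier block has exactly the same content.
  bool isDuplicate(std::size_t j) const {
    auto const last = blocks_.begin() + static_cast<std::ptrdiff_t>(j);
    return std::find(blocks_.begin(), last, blocks_[j]) != last;
  }

  // Number of empty places in block j and all later blocks.
  std::size_t freePlacesFrom(std::size_t j) const {
    std::size_t free = 0;
    for (; j < blocks_.size(); ++j) free += blockSize_ - blocks_[j].size();
    return free;
  }

  // Places `remaining` copies of run `run`, never before block `firstBlock`.
  void place(std::size_t run, std::size_t remaining, std::size_t firstBlock) {
    int const value = runs_[run].first;
    if (run + 1 == runs_.size()) {  // last group: fill all empty places in one go
      Partition done = blocks_;
      for (Block& block : done) block.resize(blockSize_, value);
      results_.push_back(std::move(done));
      return;
    }
    for (std::size_t j = firstBlock; j < blocks_.size(); ++j) {
      if (freePlacesFrom(j) < remaining) break;  // later blocks have even less room
      if (blocks_[j].size() == blockSize_ || isDuplicate(j)) continue;
      blocks_[j].push_back(value);
      if (remaining == 1) {
        place(run + 1, runs_[run + 1].second, 0);  // next group starts at first block
      } else {
        place(run, remaining - 1, j);  // same group: stay at or after block j
      }
      blocks_[j].pop_back();  // backtrack
    }
  }

  std::size_t blockSize_;
  std::vector<std::pair<int, std::size_t>> runs_;  // (value, count) per group
  Partition blocks_;
  std::vector<Partition> results_;
};

std::vector<Partition> multiset_partitions(std::vector<int> items,
                                           std::size_t blockSize) {
  return PartitionGenerator(std::move(items), blockSize).generate();
}
```

With this sketch, `multiset_partitions({1, 1, 2, 2}, 2)` yields {1,1} {2,2} followed by {1,2} {1,2}.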

  • 2021-02-09 19:25

    Here's my pencil and paper algorithm:

    Describe the multiset in item quantities, e.g., {(1,2),(2,2)}
    
    f(multiset,result):
      if the multiset is empty:
        return result
      otherwise:
        call f again with each unique distribution of one element added to result and 
        removed from the multiset state
    
    
    Example:
    {(1,2),(2,2),(3,2)} n = 2
    
    11       -> 11 22    -> 11 22 33
                11 2  2  -> 11 23 23
    1  1     -> 12 12    -> 12 12 33
                12 1  2  -> 12 13 23
                1  1  22 -> 13 13 22
    
    
    Example:
    {(1,2),(2,2),(3,2)} n = 3
    
    11      -> 112 2   -> 112 233
               11  22  -> 113 223
    1   1   -> 122 1   -> 122 133
               12  12  -> 123 123
    

    Let's solve the problem, raised by m69 in a comment below, of avoiding duplicate distributions:

    {A,B,B,C,C,D,D,D,D}
    
    We've reached {A, , }{B, , }{B, , }, have 2 C's to distribute
    and we'd like to avoid `ac  bc  b` generated along with `ac  b   bc`.
    
    Because our generation in the level just above is ordered, the series of identical 
    counts will be contiguous. When a series of identical counts is encountered, make 
    the assignment for the whole block of identical counts (rather than each one), 
    and partition that contribution in descending parts; for example,
    
          | identical |
    ac     b      b
    ac     bc     b     // descending parts [1,0]
    
    Example of longer block:
    
          |    identical block     |  descending parts
    ac     bcccc  b      b      b    // [4,0,0,0] 
    ac     bccc   bc     b      b    // [3,1,0,0]
    ac     bcc    bcc    b      b    // [2,2,0,0]
    ...
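
The "descending parts" step can be sketched in C++ as a small recursive enumeration. Everything below is an illustrative assumption rather than the answer's own code: the names `descending_parts`, `extend_parts` and `Parts` are made up, and a `capacity` parameter is added to cap how many elements a single block in the identical series may receive.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Parts = std::vector<int>;

// Hypothetical helper: fills parts[pos..] with non-increasing counts that sum
// to `remaining`, none larger than `maxPart`, and records each complete split.
void extend_parts(int remaining, int maxPart, std::size_t pos, Parts& parts,
                  std::vector<Parts>& out) {
  if (pos == parts.size()) {
    if (remaining == 0) out.push_back(parts);
    return;
  }
  int const laterBlocks = static_cast<int>(parts.size() - pos) - 1;
  for (int part = std::min(remaining, maxPart); part >= 0; --part) {
    // Later parts may not exceed `part`; if the rest cannot fit now, it
    // cannot fit for any smaller `part` either.
    if (remaining - part > part * laterBlocks) break;
    parts[pos] = part;
    extend_parts(remaining - part, part, pos + 1, parts, out);
  }
}

// Splits `total` identical elements over `blockCount` identical blocks, at
// most `capacity` per block, listing each split once in descending order.
std::vector<Parts> descending_parts(int total, int blockCount, int capacity) {
  assert(total >= 0 && blockCount >= 0 && capacity >= 0);
  Parts parts(static_cast<std::size_t>(blockCount), 0);
  std::vector<Parts> out;
  extend_parts(total, capacity, 0, parts, out);
  return out;
}
```

`descending_parts(4, 4, 4)` produces [4,0,0,0], [3,1,0,0], [2,2,0,0], [2,1,1,0], [1,1,1,1], which continues the longer example above. Because each split is non-increasing, swapping the contents of two identical blocks can never show up as a second result.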
    