Algorithm for “consolidating” N items into K

無奈伤痛 2021-01-29 08:21

I was wondering whether there's a known algorithm for doing the following, and also wondering how it would be implemented in C#. Maybe this is a known type of problem.

2 Answers
  • 2021-01-29 08:57

    If I'm not mistaken, the problem you're describing is the "Number of k-combinations for all k".

    I found a code snippet which I believe addresses your use case, but I just can't remember where I got it from. It must have been from StackOverflow. If anyone recognizes this particular piece of code, please let me know and I'll make sure to credit it.

    So here's the extension method:

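    using System.Collections.Generic;
    using System.Linq;

    public static class ListExtensions
    {
        // Enumerates every way of giving each item one of the keys 1..count
        // (an odometer over count^N settings) and keeps only the assignments
        // that use every key at least once.
        public static List<ILookup<int, TItem>> GroupCombinations<TItem>(this List<TItem> items, int count)
        {
            var groups = new List<ILookup<int, TItem>>();
            if (items.Count == 0 || count < 1)
            {
                return groups; // avoids indices[-1] below and an endless loop when count is 0
            }

            var keys = Enumerable.Range(1, count).ToList();
            var indices = new int[items.Count];   // indices[j] = key index assigned to items[j]
            var maxIndex = items.Count - 1;
            var nextIndex = maxIndex;
            indices[maxIndex] = -1;               // so the first increment yields all zeros

            while (nextIndex >= 0)
            {
                indices[nextIndex]++;

                if (indices[nextIndex] == keys.Count)
                {
                    // This digit overflowed: reset it and carry into the previous one.
                    indices[nextIndex] = 0;
                    nextIndex--;
                    continue;
                }

                nextIndex = maxIndex;

                // Skip assignments that leave some key without any item.
                if (indices.Distinct().Count() != keys.Count)
                {
                    continue;
                }

                var group = indices.Select((keyIndex, valueIndex) =>
                                            new
                                            {
                                                Key = keys[keyIndex],
                                                Value = items[valueIndex]
                                            })
                    .ToLookup(x => x.Key, x => x.Value);

                groups.Add(group);
            }
            return groups;
        }
    }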
    

    And a little utility method that prints the output:

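    // Assumes using System.Collections.Generic, System.Diagnostics and System.Linq,
    // and the question's GoldMine class with a numeric TonsOfGold property.
    public void PrintGoldmineCombinations(int count, List<GoldMine> mines)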
    {
        Debug.WriteLine("count = " + count);
        var groupNumber = 0;
        foreach (var group in mines.GroupCombinations(count))
        {
            groupNumber++;
            Debug.WriteLine("group " + groupNumber);
            foreach (var set in group)
            {
                Debug.WriteLine(set.Key + ": " + set.Sum(m => m.TonsOfGold) + " tons of gold");
            }
        }
    }
    

    You would use it like so:

    var mines = new List<GoldMine>
    {
        new GoldMine {TonsOfGold = 10},
        new GoldMine {TonsOfGold = 12},
        new GoldMine {TonsOfGold = 5}
    };
    
    PrintGoldmineCombinations(1, mines);
    PrintGoldmineCombinations(2, mines);
    PrintGoldmineCombinations(3, mines);
    

    Which will produce the following output:

    count = 1
    group 1
    1: 27 tons of gold
    count = 2
    group 1
    1: 22 tons of gold
    2: 5 tons of gold
    group 2
    1: 15 tons of gold
    2: 12 tons of gold
    group 3
    1: 10 tons of gold
    2: 17 tons of gold
    group 4
    2: 10 tons of gold
    1: 17 tons of gold
    group 5
    2: 15 tons of gold
    1: 12 tons of gold
    group 6
    2: 22 tons of gold
    1: 5 tons of gold
    count = 3
    group 1
    1: 10 tons of gold
    2: 12 tons of gold
    3: 5 tons of gold
    group 2
    1: 10 tons of gold
    3: 12 tons of gold
    2: 5 tons of gold
    group 3
    2: 10 tons of gold
    1: 12 tons of gold
    3: 5 tons of gold
    group 4
    2: 10 tons of gold
    3: 12 tons of gold
    1: 5 tons of gold
    group 5
    3: 10 tons of gold
    1: 12 tons of gold
    2: 5 tons of gold
    group 6
    3: 10 tons of gold
    2: 12 tons of gold
    1: 5 tons of gold
    

    Note: this does not filter out groupings that are duplicates by content (the same sets of mines with the keys swapped, such as groups 1 and 6 for count = 2), and I'm not sure whether you actually want those filtered out. Is this what you need?

    EDIT

    Actually, looking at your comment, it seems you don't want the duplicates and you also want groupings that use fewer than k keys, so here is a minor modification that removes the duplicates (in a really ugly way, I apologize) and also keeps the groupings for the lower values of k:

    // Replacement for GroupCombinations inside the ListExtensions class.
    public static List<ILookup<int, TItem>> GroupCombinations<TItem>(this List<TItem> items, int count)
    {
        var groups = new List<ILookup<int, TItem>>();
        if (items.Count == 0 || count < 1)
        {
            return groups; // avoids indices[-1] below and an endless loop when count is 0
        }

        var keys = Enumerable.Range(1, count).ToList();
        var indices = new int[items.Count];
        var maxIndex = items.Count - 1;
        var nextIndex = maxIndex;
        indices[maxIndex] = -1;

        while (nextIndex >= 0)
        {
            indices[nextIndex]++;

            if (indices[nextIndex] == keys.Count)
            {
                indices[nextIndex] = 0;
                nextIndex--;
                continue;
            }

            nextIndex = maxIndex;

            // No Distinct() filter here, so groupings that use fewer keys are kept.
            var group = indices.Select((keyIndex, valueIndex) =>
                                        new
                                        {
                                            Key = keys[keyIndex],
                                            Value = items[valueIndex]
                                        })
                .ToLookup(x => x.Key, x => x.Value);

            // Skip the grouping if an earlier one already has exactly the same
            // sets of items, just under different keys.
            var isDuplicate = groups.Any(existingGroup =>
                group.All(grouping1 =>
                    existingGroup.Any(grouping2 =>
                        grouping2.Count() == grouping1.Count() &&
                        grouping2.All(item => grouping1.Contains(item)))));

            if (!isDuplicate)
            {
                groups.Add(group);
            }
        }
        return groups;
    }
    

    It produces the following output for k = 2:

    group 1
    1: 27 tons of gold
    group 2
    1: 22 tons of gold
    2: 5 tons of gold
    group 3
    1: 15 tons of gold
    2: 12 tons of gold
    group 4
    1: 10 tons of gold
    2: 17 tons of gold
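
    A possibly simpler way to detect those duplicates, offered as an untested alternative rather than as part of the answer's code: two `indices` arrays describe the same grouping exactly when they match after the keys are relabelled in order of first appearance. `PartitionKeys.CanonicalKey` is a hypothetical helper name for this sketch.

    ```csharp
    using System.Collections.Generic;
    using System.Text;

    public static class PartitionKeys
    {
        // indices[j] is the key index assigned to item j (as in the loop above).
        // Relabels the keys in order of first appearance, so two assignments that
        // put the same items together under different keys give the same string.
        public static string CanonicalKey(int[] indices)
        {
            var relabel = new Dictionary<int, int>();
            var sb = new StringBuilder();
            foreach (var keyIndex in indices)
            {
                if (!relabel.TryGetValue(keyIndex, out var label))
                {
                    label = relabel.Count;           // next unused canonical label
                    relabel[keyIndex] = label;
                }
                sb.Append(label).Append(',');
            }
            return sb.ToString();
        }
    }
    ```

    To use it, you would create a `HashSet<string>` (say `seen`) before the `while` loop and call `groups.Add(group)` only when `seen.Add(PartitionKeys.CanonicalKey(indices))` returns true. This compares item positions rather than item equality, so it agrees with the `Contains`-based check only when the list holds no repeated items.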
    
  • 2021-01-29 09:00

    This is actually the problem of enumerating all K-partitions of a set of N objects, often described as enumerating the ways to place N labelled objects into K unlabelled boxes.
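
    As a sanity check for the enumeration described here, the number of K-partitions of N labelled objects is the Stirling number of the second kind, S(N, K). It satisfies S(N, K) = K·S(N−1, K) + S(N−1, K−1): object N either joins one of the K groups of a K-partition of the other N−1 objects, or sits alone next to a (K−1)-partition of them. A minimal C# sketch; the class and method names are made up for illustration:

    ```csharp
    using System.Numerics;

    public static class PartitionCounts
    {
        // Number of ways to split n labelled objects into exactly k non-empty,
        // unlabelled groups (Stirling number of the second kind), computed with
        // the recurrence S(n, k) = k * S(n-1, k) + S(n-1, k-1).
        public static BigInteger StirlingSecond(int n, int k)
        {
            if (n < 0 || k < 0) return BigInteger.Zero;
            // row[j] holds S(i, j) for the current i; start from S(0, 0) = 1.
            var row = new BigInteger[k + 1];
            row[0] = BigInteger.One;
            for (int i = 1; i <= n; i++)
            {
                // Walk j downwards so row[j - 1] still holds S(i - 1, j - 1).
                for (int j = k; j >= 1; j--)
                    row[j] = j * row[j] + row[j - 1];
                row[0] = BigInteger.Zero;   // S(i, 0) = 0 for i >= 1
            }
            return row[k];
        }
    }
    ```

    For the three-mine example in the first answer, S(3, 2) = 3 and S(3, 1) = 1, which matches the four groups in its de-duplicated k = 2 output; the six groups it printed for count = 2 before de-duplication are the 2!·S(3, 2) = 6 labelled versions.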

    As is almost always the case, the easiest way to solve a problem involving enumeration of unlabelled or unordered alternatives is to create a canonical ordering and then figure out how to generate only canonically-ordered solutions. In this case, we assume that the objects have some total ordering so that we can refer to them by integers between 1 and N, and then we place the objects in order into the partitions, and order the partitions by the index of the first object in each one. It's pretty easy to see that this ordering cannot produce duplicates and that every partitioning must correspond to some canonical ordering.

    We can then represent a given canonical ordering by a sequence of N integers, where each integer is the number of the partition for the corresponding object. Not every sequence of N integers will work, however; we need to constrain the sequences so that the partitions are in the canonical order (sorted by the index of the first element). The constraint is simple: each element in the sequence must either be some integer which previously appeared in the sequence (an object placed into an already present partition) or it must be the index of the next partition, which is one more than the index of the last partition already present. In summary:

    • The first entry in the sequence must be 1 (because the first object can only be placed into the first partition); and
    • Each subsequent entry is at least 1 and no greater than one more than the largest entry preceding that point.
      (These two criteria could be combined if we interpret "the largest entry preceding" the first entry as 0.)
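    • That's not quite enough, since it doesn't restrict the number of partitions to exactly K. If we wanted to find all of the partitions, that would be fine, but if we want only the partitions into precisely K parts, then no entry may be greater than K, and the running maximum (the largest entry up to a given point) must reach K by the end of the sequence. Since each entry can raise the running maximum by at most one, the running maximum at the last position must be K, at the second last position at least K−1, at the third last at least K−2, and so on. Note that this constrains the running maximum, not the entry itself: with N = 3 and K = 2, the sequence 1, 2, 1 (the partition {1, 3}, {2}) is valid even though its last entry is 1.
      The running maximum at position i (counting from 1) must be in the range [max(1, K+i−N), K]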
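
    Those constraints are easy to check directly, which can help when testing a generator. A minimal C# sketch with hypothetical names, using 1-based partition numbers as in the text: `IsValid` checks that the sequence starts at 1, never goes more than one above the largest entry so far, never exceeds K, and that its largest entry ends up being exactly K; `ToGroups` shows how a sequence maps back onto the actual items.

    ```csharp
    using System.Collections.Generic;

    public static class KPartitionSequences
    {
        // Checks a 1-based partition-number sequence: each entry is between 1 and
        // one more than the running maximum, no entry exceeds k, and the running
        // maximum finishes at exactly k.
        public static bool IsValid(IReadOnlyList<int> seq, int k)
        {
            int max = 0;   // "largest entry preceding" the first entry
            foreach (var s in seq)
            {
                if (s < 1 || s > max + 1 || s > k) return false;
                if (s > max) max = s;
            }
            return max == k;
        }

        // Decodes a valid sequence: object j goes into group seq[j].
        // Groups come out ordered by their first object.
        public static List<List<T>> ToGroups<T>(IReadOnlyList<T> items, IReadOnlyList<int> seq, int k)
        {
            var groups = new List<List<T>>();
            for (int g = 0; g < k; g++) groups.Add(new List<T>());
            for (int j = 0; j < items.Count; j++)
                groups[seq[j] - 1].Add(items[j]);
            return groups;
        }
    }
    ```

    For example, `IsValid(new[] { 1, 2, 1 }, 2)` is true, and `ToGroups(new[] { "a", "b", "c" }, new[] { 1, 2, 1 }, 2)` gives the groups [a, c] and [b].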

    Generating sequences according to a simple set of constraints like the above can easily be done recursively. We start with an empty sequence, and then successively add each possible next element, calling this procedure recursively to fill in the entire sequence. As long as it is simple to produce the list of possible next elements, the recursive procedure will be straightforward. In this case, besides the prefix itself (whose length gives the current position), we need three pieces of information to produce this list: N, K, and the maximum value generated so far.

    That leads to the following pseudo-code:

    GenerateAllSequencesHelper(N, K, M, Prefix):
      if length(Prefix) is N:
         Prefix is a valid sequence; handle it
      else:
        # [See Note 1]
        Remaining = N - length(Prefix) - 1    # positions left after this one
        for i from 1 up to min(M + 1, K):
          if max(M, i) + Remaining >= K:      # K can still be reached
            Append i to Prefix
            GenerateAllSequencesHelper(N, K, max(M, i), Prefix)
            Pop i off of Prefix
    
    GenerateAllSequences(N, K):
      GenerateAllSequencesHelper(N, K, 0, [])
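
    A C# rendering of the recursive procedure, as a sketch rather than a tuned implementation: it collects the sequences into a list instead of handling each one in place, and the class and method names are illustrative. It enforces the K bound by skipping any entry after which the running maximum could no longer reach K in the remaining positions, so sequences such as 1, 2, 1 (whose last entry is not K) are still produced.

    ```csharp
    using System;
    using System.Collections.Generic;

    public static class KPartitionRecursive
    {
        // Returns every canonical sequence for n objects in exactly k partitions,
        // in lexicographic order. Entries are 1-based partition numbers.
        public static List<int[]> GenerateAllSequences(int n, int k)
        {
            var results = new List<int[]>();
            Helper(n, k, 0, new List<int>(), results);
            return results;
        }

        // m is the largest entry in prefix so far (0 for the empty prefix).
        private static void Helper(int n, int k, int m, List<int> prefix, List<int[]> results)
        {
            if (prefix.Count == n)
            {
                // A complete sequence; the m == k test only matters when n == 0.
                if (m == k) results.Add(prefix.ToArray());
                return;
            }
            int remaining = n - prefix.Count - 1;   // positions left after this one
            for (int i = 1; i <= Math.Min(m + 1, k); i++)
            {
                int newMax = Math.Max(m, i);
                if (newMax + remaining < k) continue;   // K could no longer be reached
                prefix.Add(i);
                Helper(n, k, newMax, prefix, results);
                prefix.RemoveAt(prefix.Count - 1);
            }
        }
    }
    ```

    For N = 3 and K = 2 it returns 1,1,2 then 1,2,1 then 1,2,2, i.e. {1, 2}{3}, {1, 3}{2} and {1}{2, 3}.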
    

    Since the recursion depth will be extremely limited for any practical application of this procedure, the recursive solution should be fine. However, it is also quite simple to produce an iterative solution even without using a stack. This is an instance of a standard enumeration algorithm for constrained sequences:

    • Start with the lexicographically smallest possible sequence
    • While possible:
      • Scan backwards to find the last element which could be increased. ("Could be" means that increasing that element would still result in the prefix of some valid sequence.)
      • Increment that element to the next larger possible value
      • Fill in the rest of the sequence with the smallest possible suffix.

    In the iterative algorithm, the backwards scan might involve checking O(N) elements, which would seem to make it slower than the recursive algorithm. However, in most cases they will have the same computational complexity, because in the recursive algorithm each generated sequence also incurs the cost of the recursive calls and returns required to reach it. If each (or, at least, most) recursive calls produce more than one alternative, the recursive algorithm will still be amortized O(1) per generated sequence.

    But in this case, it is likely that the iterative algorithm will also be O(1) per generated sequence, as long as the scan step can be performed in O(1); that is, as long as it can be performed without examining the entire sequence.

    In this particular case, computing the maximum value of the sequence up to a given point is not O(1), but we can produce an O(1) iterative algorithm by also maintaining the vector of cumulative maxima. (In effect, this vector corresponds to the stack of M arguments in the recursive procedure above.)

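    It's easy enough to maintain the M vector; once we have it, we can easily identify "incrementable" elements in the sequence: element i is incrementable if i>0, M[i] is equal to M[i−1] (that is, element i did not open a new partition, so it is at most M[i−1]), and element i itself is less than K. The test is on the element, not on M[i]: in 1, 2, 1 with K = 2, the last element is incrementable (giving 1, 2, 2) even though M[2] = K. [Note 2]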
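
    A sketch of the iterative version in C#, with hypothetical names and 0-based arrays. It assumes 1 ≤ K ≤ N (other inputs simply yield nothing), keeps the cumulative maxima in `m`, treats element i as incrementable when i > 0, m[i] equals m[i−1] and the element is still below K, and refills the suffix with 1s followed by whatever new partition numbers are still needed to reach K. The backwards scan is written plainly, with no attempt to bound its per-sequence cost.

    ```csharp
    using System;
    using System.Collections.Generic;

    public static class KPartitionIterative
    {
        // Yields the canonical sequences for n objects in exactly k partitions in
        // lexicographic order, without recursion. Assumes 1 <= k <= n; any other
        // input yields nothing. s holds the sequence, m[i] = max(s[0..i]).
        public static IEnumerable<int[]> Enumerate(int n, int k)
        {
            if (k < 1 || k > n) yield break;
            var s = new int[n];
            var m = new int[n];
            FillSuffix(s, m, 0, 0, k);              // smallest sequence: 1, ..., 1, 2, ..., k
            while (true)
            {
                yield return (int[])s.Clone();

                // Scan backwards for the last incrementable element: it must not
                // have opened a new partition (m[i] == m[i - 1]) and must be below k.
                int i = n - 1;
                while (i > 0 && !(m[i] == m[i - 1] && s[i] < k)) i--;
                if (i == 0) yield break;             // the first entry is always 1

                s[i]++;
                m[i] = Math.Max(m[i - 1], s[i]);
                FillSuffix(s, m, i + 1, m[i], k);
            }
        }

        // Fills s[from..] with the smallest valid suffix, given that the running
        // maximum before 'from' is baseMax: 1s, then baseMax + 1, ..., k at the end.
        private static void FillSuffix(int[] s, int[] m, int from, int baseMax, int k)
        {
            int n = s.Length;
            int firstNew = n - (k - baseMax);       // first slot forced to open a partition
            int running = baseMax;
            for (int j = from; j < n; j++)
            {
                s[j] = j < firstNew ? 1 : baseMax + 1 + (j - firstNew);
                running = Math.Max(running, s[j]);
                m[j] = running;
            }
        }
    }
    ```

    For N = 4 and K = 2 it yields the seven sequences 1112, 1121, 1122, 1211, 1212, 1221, 1222 in that order.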

    Notes

    1. If we wanted to produce all partitions, we would replace the for loop above with the rather simpler:

       for i from 1 to M+1:
      
    2. This answer is largely based on this answer, but that question asked for all partitions; here, you want to generate the K-partitions. As indicated, the algorithms are very similar.
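
    Note 1's variant in C#, again as a hedged sketch with illustrative names. The `maxParts` cap is an assumption added here: it keeps the min(M+1, K) bound but drops the requirement to reach K, which gives the partitions into at most K groups (the reading the first answer's EDIT settled on); with `maxParts` at least N it is exactly Note 1's loop and yields every partition.

    ```csharp
    using System;
    using System.Collections.Generic;

    public static class BoundedPartitions
    {
        // Every entry ranges over 1..min(M + 1, maxParts); nothing forces the
        // sequence to reach maxParts, so this yields all partitions into at most
        // maxParts groups. With maxParts >= n it is the plain "1 to M+1" loop.
        public static List<int[]> Generate(int n, int maxParts)
        {
            var results = new List<int[]>();
            Helper(n, maxParts, 0, new List<int>(), results);
            return results;
        }

        private static void Helper(int n, int maxParts, int m, List<int> prefix, List<int[]> results)
        {
            if (prefix.Count == n)
            {
                results.Add(prefix.ToArray());
                return;
            }
            for (int i = 1; i <= Math.Min(m + 1, maxParts); i++)
            {
                prefix.Add(i);
                Helper(n, maxParts, Math.Max(m, i), prefix, results);
                prefix.RemoveAt(prefix.Count - 1);
            }
        }
    }
    ```

    `Generate(3, 2)` returns four sequences (1,1,1; 1,1,2; 1,2,1; 1,2,2), matching the four de-duplicated groups the first answer printed for k = 2, and `Generate(4, 4)` returns all 15 partitions of four objects.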
