I was wondering whether there's a known algorithm for doing the following, and also wondering how it would be implemented in C#. Maybe this is a known type of problem.
If I'm not mistaken, the problem you're describing is "Number of k-combinations for all k".
I found a code snippet which I believe addresses your use case, but I can't remember where I got it from. It must have been from StackOverflow. If anyone recognizes this particular piece of code, please let me know and I'll make sure to credit it.
So here's the extension method:
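using System.Collections.Generic;
using System.Linq;

public static class ListExtensions
{
    // Enumerates every assignment of items to the keys 1..count in which
    // each key receives at least one item. Assumes items is non-empty.
    public static List<ILookup<int, TItem>> GroupCombinations<TItem>(this List<TItem> items, int count)
    {
        var keys = Enumerable.Range(1, count).ToList();
        // indices[i] is the position in keys of the group holding items[i].
        var indices = new int[items.Count];
        var maxIndex = items.Count - 1;
        var nextIndex = maxIndex;
        indices[maxIndex] = -1; // the first increment then yields all zeros
        var groups = new List<ILookup<int, TItem>>();
        // Count through indices like an odometer in base keys.Count.
        while (nextIndex >= 0)
        {
            indices[nextIndex]++;
            if (indices[nextIndex] == keys.Count)
            {
                // This digit overflowed: reset it and carry to the left.
                indices[nextIndex] = 0;
                nextIndex--;
                continue;
            }
            nextIndex = maxIndex;
            // Skip assignments that leave some key without any item.
            if (indices.Distinct().Count() != keys.Count)
            {
                continue;
            }
            var group = indices.Select((keyIndex, valueIndex) =>
                new
                {
                    Key = keys[keyIndex],
                    Value = items[valueIndex]
                })
                .ToLookup(x => x.Key, x => x.Value);
            groups.Add(group);
        }
        return groups;
    }
}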
And a little utility method that prints the output:
public void PrintGoldmineCombinations(int count, List<GoldMine> mines)
{
    Debug.WriteLine("count = " + count);
    var groupNumber = 0;
    foreach (var group in mines.GroupCombinations(count))
    {
        groupNumber++;
        Debug.WriteLine("group " + groupNumber);
        foreach (var set in group)
        {
            Debug.WriteLine(set.Key + ": " + set.Sum(m => m.TonsOfGold) + " tons of gold");
        }
    }
}
You would use it like so:
var mines = new List<GoldMine>
{
    new GoldMine {TonsOfGold = 10},
    new GoldMine {TonsOfGold = 12},
    new GoldMine {TonsOfGold = 5}
};
PrintGoldmineCombinations(1, mines);
PrintGoldmineCombinations(2, mines);
PrintGoldmineCombinations(3, mines);
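The snippets above rely on a GoldMine type that is not shown. A minimal sketch of what it might look like follows; the class body is an assumption made only so the example compiles, and the real type may well have more members.

```csharp
// Hypothetical minimal model; the original GoldMine type is not shown.
public class GoldMine
{
    // Whole tons, matching the integer values used in the example.
    public int TonsOfGold { get; set; }
}
```

The printing helper also needs `using System.Diagnostics;` for `Debug.WriteLine`, and those calls are compiled out of release builds, so `Console.WriteLine` may be the more convenient choice when trying this in a console project.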
Which will produce the following output:
count = 1
group 1
1: 27 tons of gold
count = 2
group 1
1: 22 tons of gold
2: 5 tons of gold
group 2
1: 15 tons of gold
2: 12 tons of gold
group 3
1: 10 tons of gold
2: 17 tons of gold
group 4
2: 10 tons of gold
1: 17 tons of gold
group 5
2: 15 tons of gold
1: 12 tons of gold
group 6
2: 22 tons of gold
1: 5 tons of gold
count = 3
group 1
1: 10 tons of gold
2: 12 tons of gold
3: 5 tons of gold
group 2
1: 10 tons of gold
3: 12 tons of gold
2: 5 tons of gold
group 3
2: 10 tons of gold
1: 12 tons of gold
3: 5 tons of gold
group 4
2: 10 tons of gold
3: 12 tons of gold
1: 5 tons of gold
group 5
3: 10 tons of gold
1: 12 tons of gold
2: 5 tons of gold
group 6
3: 10 tons of gold
2: 12 tons of gold
1: 5 tons of gold
Note: this does not take into account duplicates by the contents of the sets and I'm not sure if you actually want those filtered out or not. Is this what you need?
EDIT
Actually, looking at your comment it seems you don't want the duplicates and you also want the lower values of k included, so here is a minor modification that takes out the duplicates (in a really ugly way, I apologize) and gives you the lower values of k per group:
public static List<ILookup<int, TItem>> GroupCombinations<TItem>(this List<TItem> items, int count)
{
    var keys = Enumerable.Range(1, count).ToList();
    // indices[i] is the position in keys of the group holding items[i].
    var indices = new int[items.Count];
    var maxIndex = items.Count - 1;
    var nextIndex = maxIndex;
    indices[maxIndex] = -1; // the first increment then yields all zeros
    var groups = new List<ILookup<int, TItem>>();
    while (nextIndex >= 0)
    {
        indices[nextIndex]++;
        if (indices[nextIndex] == keys.Count)
        {
            // This digit overflowed: reset it and carry to the left.
            indices[nextIndex] = 0;
            nextIndex--;
            continue;
        }
        nextIndex = maxIndex;
        // Keys may now stay unused, which is what yields the lower values of k.
        var group = indices.Select((keyIndex, valueIndex) =>
            new
            {
                Key = keys[keyIndex],
                Value = items[valueIndex]
            })
            .ToLookup(x => x.Key, x => x.Value);
        // A group is a duplicate if an earlier group contains the same sets
        // of items, regardless of which key each set was assigned to.
        var isDuplicate = groups.Any(existingGroup =>
            group.All(grouping1 =>
                existingGroup.Any(grouping2 =>
                    grouping2.Count() == grouping1.Count() &&
                    grouping2.All(item => grouping1.Contains(item)))));
        if (!isDuplicate)
        {
            groups.Add(group);
        }
    }
    return groups;
}
It produces the following output for k = 2:
group 1
1: 27 tons of gold
group 2
1: 22 tons of gold
2: 5 tons of gold
group 3
1: 15 tons of gold
2: 12 tons of gold
group 4
1: 10 tons of gold
2: 17 tons of gold
This is actually the problem of enumerating all K-partitions of a set of N objects, often described as enumerating the ways to place N labelled objects into K unlabelled boxes.
As is almost always the case, the easiest way to solve a problem involving enumeration of unlabelled or unordered alternatives is to create a canonical ordering and then figure out how to generate only canonically-ordered solutions. In this case, we assume that the objects have some total ordering so that we can refer to them by integers between 1 and N, and then we place the objects in order into the partitions, and order the partitions by the index of the first object in each one. It's pretty easy to see that this ordering cannot produce duplicates and that every partitioning must correspond to some canonical ordering.
We can then represent a given canonical ordering by a sequence of N integers, where each integer is the number of the partition for the corresponding object. Not every sequence of N integers will work, however; we need to constrain the sequences so that the partitions are in the canonical order (sorted by the index of the first element). The constraint is simple: each element in the sequence must either be some integer which previously appeared in the sequence (an object placed into an already present partition) or it must be the index of the next partition, which is one more than the index of the last partition already present. In summary: the first element must be 1; each later element must be at least 1 and at most one more than the largest element before it; and, since we want exactly K partitions, no element may exceed K and the value K must appear somewhere in the sequence.
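The constraint on canonical sequences can be checked mechanically. The following is a minimal sketch, not part of the original answer: the class and method names (CanonicalSequences, IsCanonical, ToGroups) are made up for illustration. It validates a sequence against the rules described above and turns a valid sequence back into groups of items.

```csharp
using System;
using System.Collections.Generic;

public static class CanonicalSequences
{
    // Returns true if sequence encodes, in canonical order, a partition of
    // sequence.Count objects into exactly k non-empty partitions.
    public static bool IsCanonical(IReadOnlyList<int> sequence, int k)
    {
        var max = 0; // largest partition number seen so far
        foreach (var p in sequence)
        {
            // An element either reuses a partition (1..max) or opens the
            // next one (max + 1), and it may never exceed k.
            if (p < 1 || p > max + 1 || p > k)
            {
                return false;
            }
            max = Math.Max(max, p);
        }
        return max == k;
    }

    // Converts a canonical sequence into the groups of items it describes.
    public static List<List<T>> ToGroups<T>(IReadOnlyList<int> sequence, IReadOnlyList<T> items)
    {
        var groups = new List<List<T>>();
        for (var i = 0; i < sequence.Count; i++)
        {
            // In a canonical sequence, an unseen number is always groups.Count + 1.
            if (sequence[i] > groups.Count)
            {
                groups.Add(new List<T>());
            }
            groups[sequence[i] - 1].Add(items[i]);
        }
        return groups;
    }
}
```

For example, with K = 2 the sequence 1, 2, 1 is canonical, while 2, 1, 1 is not, because the first object must always go into partition 1. Applied to the three mines from the first answer (10, 12 and 5 tons), the sequence 1, 2, 1 gives the groups {10, 5} and {12}.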
Generating sequences according to a simple set of constraints like the above can easily be done recursively. We start with an empty sequence, and then successively add each possible next element, calling this procedure recursively to fill in the entire sequence. As long as it is simple to produce the list of possible next elements, the recursive procedure will be straightforward. In this case, we need three pieces of information to produce this list: N, K, and the maximum value generated so far.
That leads to the following pseudo-code:
GenerateAllSequencesHelper(N, K, M, Prefix):
    if length(Prefix) is N:
        Prefix is a valid sequence; handle it
    else:
        # [See Note 1]
        # Start at 1 only if K can still be reached afterwards; otherwise
        # this object has to open partition M + 1.
        for i from (1 if M + N - length(Prefix) - 1 >= K else M + 1)
              up to min(M + 1, K):
            Append i to Prefix
            GenerateAllSequencesHelper(N, K, max(M, i), Prefix)
            Pop i off of Prefix

# Requires 1 <= K <= N
GenerateAllSequences(N, K):
    GenerateAllSequencesHelper(N, K, 0, [])
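Since the question asked about C#, here is one way the recursive procedure might be written. Treat it as a sketch under stated assumptions rather than a definitive implementation: the class name KPartitions and the callback parameter handle are invented for this example, and the pruning is written as an explicit check that enough positions remain to open the partitions still missing, which is my reading of what the lower bound on i has to guarantee.

```csharp
using System;
using System.Collections.Generic;

public static class KPartitions
{
    // Calls handle once for every canonical sequence that encodes a
    // partition of n objects into exactly k non-empty partitions.
    // The array passed to handle is reused; copy it if you keep it.
    public static void GenerateAllSequences(int n, int k, Action<int[]> handle)
    {
        if (k < 1 || k > n)
        {
            return; // k non-empty partitions cannot be formed
        }
        Helper(n, k, 0, 0, new int[n], handle);
    }

    // length is the number of positions already filled; max is the largest
    // partition number among them (M in the pseudo-code).
    private static void Helper(int n, int k, int max, int length, int[] prefix, Action<int[]> handle)
    {
        if (length == n)
        {
            handle(prefix);
            return;
        }
        var upper = Math.Min(max + 1, k);
        for (var i = 1; i <= upper; i++)
        {
            var newMax = Math.Max(max, i);
            // Prune: each of the n - length - 1 later positions can open at
            // most one new partition, so k must still be reachable.
            if (newMax + (n - length - 1) < k)
            {
                continue;
            }
            prefix[length] = i; // overwriting replaces the explicit pop
            Helper(n, k, newMax, length + 1, prefix, handle);
        }
    }
}
```

Calling `KPartitions.GenerateAllSequences(3, 2, s => Console.WriteLine(string.Join(" ", s)))` prints 1 1 2, then 1 2 1, then 1 2 2. For the three gold mines from the first answer these are the splits 22/5, 15/12 and 10/17.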
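Since the recursion depth will be extremely limited for any practical application of this procedure, the recursive solution should be fine. However, it is also quite simple to produce an iterative solution even without using a stack. This is an instance of a standard enumeration algorithm for constrained sequences: start with the lexicographically smallest valid sequence; then repeatedly scan backwards for the last element that can be increased to another valid value, increase it, and replace everything after it with the smallest valid suffix; stop when the scan finds no such element.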
In the iterative algorithm, the backwards scan might involve checking O(N) elements, which apparently makes it slower than the recursive algorithm. However, in most cases they will have the same computational complexity, because in the recursive algorithm each generated sequence also incurs the cost of the recursive calls and returns required to reach it. If each (or, at least, most) recursive call produces more than one alternative, the recursive algorithm will still be O(1) per generated sequence.
But in this case, it is likely that the iterative algorithm will also be O(1) per generated sequence, as long as the scan step can be performed in O(1); that is, as long as it can be performed without examining the entire sequence.
In this particular case, computing the maximum value of the sequence up to a given point is not O(1), but we can produce an O(1) iterative algorithm by also maintaining the vector of cumulative maxima. (In effect, this vector corresponds to the stack of M arguments in the recursive procedure above.)
It's easy enough to maintain the M vector; once we have it, we can easily identify "incrementable" elements in the sequence: element i is incrementable if i>0, M[i] is equal to M[i−1] (so the element does not exceed the largest value before it), and the element itself is less than K. [Note 2]
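The iterative approach with a vector of cumulative maxima could look roughly like the sketch below. The names (KPartitionsIterative, a, m, FillSmallestSuffix) are made up for this example, and the incrementability test is written as "the element does not exceed the maximum before it and is still below K", which is an assumption about the intended criterion. The sketch makes no attempt to guarantee O(1) work per sequence; it only shows the scan, increment and refill steps.

```csharp
using System;
using System.Collections.Generic;

public static class KPartitionsIterative
{
    // Lazily yields every canonical sequence for partitioning n objects
    // into exactly k non-empty partitions, in lexicographic order.
    public static IEnumerable<int[]> Sequences(int n, int k)
    {
        if (k < 1 || k > n)
        {
            yield break;
        }
        var a = new int[n]; // a[i]: partition number of object i
        var m = new int[n]; // m[i]: maximum of a[0..i] (cumulative maxima)
        FillSmallestSuffix(a, m, 0, k);
        while (true)
        {
            yield return (int[])a.Clone();
            // Backwards scan for the last incrementable element.
            var i = n - 1;
            while (i > 0 && (m[i] != m[i - 1] || a[i] == k))
            {
                i--;
            }
            if (i == 0)
            {
                yield break; // a[0] is always 1, so we are done
            }
            a[i]++;
            m[i] = Math.Max(m[i - 1], a[i]);
            FillSmallestSuffix(a, m, i + 1, k);
        }
    }

    // Fills positions start..n-1 with the smallest valid values: ones,
    // followed by just enough new partition numbers to reach k.
    private static void FillSmallestSuffix(int[] a, int[] m, int start, int k)
    {
        var n = a.Length;
        var max = start == 0 ? 0 : m[start - 1];
        for (var j = start; j < n; j++)
        {
            var remainingAfter = n - 1 - j;
            // Open a new partition only when waiting any longer would
            // leave too few positions to reach k.
            a[j] = (max + remainingAfter < k) ? max + 1 : 1;
            max = Math.Max(max, a[j]);
            m[j] = max;
        }
    }
}
```

Each yielded array is a copy, so callers can store the results; dropping the `Clone` call would avoid the allocation at the cost of handing out a buffer that changes on the next iteration.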
If we wanted to produce all partitions, we would replace the for loop above with the rather simpler:
    for i from 1 to M+1:
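For comparison, a sketch of the all-partitions variant in C# follows. The names AllPartitions, Sequences and Extend are hypothetical. The only bound on each element is M + 1, so no K appears anywhere.

```csharp
using System;
using System.Collections.Generic;

public static class AllPartitions
{
    // Returns one canonical sequence per partition of n objects,
    // with any number of non-empty partitions.
    public static List<int[]> Sequences(int n)
    {
        var results = new List<int[]>();
        if (n > 0)
        {
            Extend(new int[n], 0, 0, results);
        }
        return results;
    }

    // length is the number of positions already filled; max is the largest
    // partition number among them.
    private static void Extend(int[] prefix, int length, int max, List<int[]> results)
    {
        if (length == prefix.Length)
        {
            results.Add((int[])prefix.Clone());
            return;
        }
        // Reuse any existing partition (1..max) or open the next one.
        for (var i = 1; i <= max + 1; i++)
        {
            prefix[length] = i;
            Extend(prefix, length + 1, Math.Max(max, i), results);
        }
    }
}
```

For three objects this produces five sequences (1 1 1, 1 1 2, 1 2 1, 1 2 2 and 1 2 3), one for each way of partitioning a three-element set.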
This answer is largely based on this answer, but that question asked for all partitions; here, you want to generate the K-partitions. As indicated, the algorithms are very similar.