Question
I am using the following C# function to get a power set limited to subsets of a minimal length:
string[] PowerSet(int min_len, string set)
{
    IEnumerable<IEnumerable<string>> seed =
        new List<IEnumerable<string>>() { Enumerable.Empty<string>() };
    return set.Replace(" ", "")
              .Split(',')
              .Aggregate(seed, (a, b) => a.Concat(a.Select(x => x.Concat(new[] { b }))))
              .Where(subset => subset.Count() >= min_len)
              .Select(subset => string.Join(",", subset))
              .ToArray();
}
The problem is that when the original set is large, the algorithm has to work very hard even if the minimal length is large as well.
For example:
PowerSet(27, "1,11,12,17,22,127,128,135,240,254,277,284,292,296,399,309,322,326,333,439,440,442,447,567,580,590,692,697");
This should be very easy, but it takes far too long with the above function. I am looking for a concise modification of my function that can handle these cases efficiently.
Answer 1:
Taking a quick look at your method, one of the inefficiencies is that every possible subset is created, regardless of whether it has enough members to warrant inclusion in the limited power set.
Consider implementing the following method instead. It can skip some unnecessary subsets based on their count to avoid excess computation.
public static List<List<T>> PowerSet<T>(List<T> startingSet, int minSubsetSize)
{
    List<List<T>> subsetList = new List<List<T>>();
    //The set bits of each intermediate value represent unique
    //combinations from the startingSet.
    //We can start checking for combinations at (1<<minSubsetSize)-1 since
    //values less than that will not yield large enough subsets.
    int iLimit = 1 << startingSet.Count;
    for (int i = (1 << minSubsetSize)-1; i < iLimit; i++)
    {
        //Get the number of 1's in this 'i'
        int setBitCount = NumberOfSetBits(i);
        //Only include this subset if it will have at least minSubsetSize members.
        if (setBitCount >= minSubsetSize)
        {
            List<T> subset = new List<T>(setBitCount);
            for (int j = 0; j < startingSet.Count; j++)
            {
                //If the j'th bit in i is set,
                //then add the j'th element of the startingSet to this subset.
                if ((i & (1 << j)) != 0)
                {
                    subset.Add(startingSet[j]);
                }
            }
            subsetList.Add(subset);
        }
    }
    return subsetList;
}
The number of set bits in each successive value of i tells you how many members will be in the subset. If there are not enough set bits, there is no point in doing the work of creating the subset represented by that bit combination. NumberOfSetBits can be implemented in a number of ways. See the Stack Overflow question "How to count the number of set bits in a 32-bit integer?" for various approaches, explanations and references. Here is one example taken from that question.
public static int NumberOfSetBits(int i)
{
    i = i - ((i >> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
    return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
}
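To connect this back to the question's string-based signature, here is a minimal sketch of an adapter. The names PowerSetDemo and PowerSetStrings are hypothetical (they appear in neither the question nor the answer), and the two methods from this answer are repeated in condensed form only so that the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PowerSetDemo
{
    // Condensed copy of the answer's bit-counting helper.
    public static int NumberOfSetBits(int i)
    {
        i = i - ((i >> 1) & 0x55555555);
        i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
        return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
    }

    // Condensed copy of the answer's bitmask-based PowerSet.
    public static List<List<T>> PowerSet<T>(List<T> startingSet, int minSubsetSize)
    {
        var subsetList = new List<List<T>>();
        int iLimit = 1 << startingSet.Count;
        for (int i = (1 << minSubsetSize) - 1; i < iLimit; i++)
        {
            int setBitCount = NumberOfSetBits(i);
            if (setBitCount < minSubsetSize) continue; // too few members, skip
            var subset = new List<T>(setBitCount);
            for (int j = 0; j < startingSet.Count; j++)
            {
                if ((i & (1 << j)) != 0) subset.Add(startingSet[j]);
            }
            subsetList.Add(subset);
        }
        return subsetList;
    }

    // Hypothetical adapter: same inputs and output shape as the question's function.
    public static string[] PowerSetStrings(int minLen, string set)
    {
        List<string> items = set.Replace(" ", "").Split(',').ToList();
        return PowerSet(items, minLen)
            .Select(subset => string.Join(",", subset))
            .ToArray();
    }
}
```

For example, PowerSetDemo.PowerSetStrings(2, "a, b, c") returns the four strings "a,b", "a,c", "b,c" and "a,b,c", in that order.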
Now, while this solution works for your example, I think you will run into long runtimes and memory issues if you lower the minimum subset size too far or continue to grow the size of the startingSet. Because the bitmask is an int, the set is also limited to at most 30 elements. Without specific requirements posted in your question, I can't judge whether this solution will work for you or is safe for your range of expected input cases.
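One source of the long runtimes is that the loop in PowerSet still steps through every integer between (1 << minSubsetSize) - 1 and 1 << startingSet.Count. For the question's example that is roughly 134 million iterations, even though only 29 subsets qualify (28 of size 27 plus 1 of size 28). A possible alternative, which is not part of the original answer, is to jump directly from one bitmask with exactly k set bits to the next using the bit trick commonly known as Gosper's hack. The sketch below assumes at most 30 elements; FixedSizeSubsets and MasksWithBitCount are illustrative names.

```csharp
using System;
using System.Collections.Generic;

public static class FixedSizeSubsets
{
    // Yields every n-bit mask with exactly k bits set, in increasing order.
    // Assumes 0 <= n <= 30 so that all intermediate values fit in an int.
    public static IEnumerable<int> MasksWithBitCount(int n, int k)
    {
        if (k < 0 || k > n) yield break;
        if (k == 0) { yield return 0; yield break; }

        int limit = 1 << n;
        int mask = (1 << k) - 1;            // smallest mask with k bits set
        while (mask < limit)
        {
            yield return mask;
            int lowest = mask & -mask;      // lowest set bit of mask
            int ripple = mask + lowest;     // carry through the lowest run of 1s
            // Gosper's hack: next larger integer with the same number of set bits.
            mask = (((ripple ^ mask) >> 2) / lowest) | ripple;
        }
    }

    // Builds all subsets with at least minSubsetSize members,
    // visiting only the bitmasks that actually qualify.
    public static List<List<T>> PowerSet<T>(IReadOnlyList<T> startingSet, int minSubsetSize)
    {
        int n = startingSet.Count;
        if (n > 30)
            throw new ArgumentException("This int-based sketch supports at most 30 elements.");

        var result = new List<List<T>>();
        for (int k = Math.Max(minSubsetSize, 0); k <= n; k++)
        {
            foreach (int mask in MasksWithBitCount(n, k))
            {
                var subset = new List<T>(k);
                for (int j = 0; j < n; j++)
                {
                    if ((mask & (1 << j)) != 0) subset.Add(startingSet[j]);
                }
                result.Add(subset);
            }
        }
        return result;
    }
}
```

Note that this version returns the subsets grouped by size (all subsets of size minSubsetSize first, then the next size up), which differs from the ordering of the loop-based method. The output itself still grows combinatorially as minSubsetSize is lowered, so the memory concern remains.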
If you find that this solution is still too slow, the operations can be split up for parallel computation, perhaps using PLINQ features.
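As a rough illustration of the PLINQ idea, the sketch below filters and builds the subsets in parallel with ParallelEnumerable.Range. It is only one possible arrangement, not a tuned implementation: the class name ParallelPowerSet is made up for this example, the 30-element limit comes from using an int as the bitmask, and the result order is not guaranteed because AsOrdered() is not used.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ParallelPowerSet
{
    // Same bit-counting helper as in the answer.
    private static int NumberOfSetBits(int i)
    {
        i = i - ((i >> 1) & 0x55555555);
        i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
        return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
    }

    public static List<List<T>> PowerSet<T>(IReadOnlyList<T> startingSet, int minSubsetSize)
    {
        int n = startingSet.Count;
        if (n > 30)
            throw new ArgumentException("This int-based sketch supports at most 30 elements.");
        if (minSubsetSize < 0) minSubsetSize = 0;
        if (minSubsetSize > n) return new List<List<T>>();

        int first = (1 << minSubsetSize) - 1;   // smallest mask that can qualify
        int count = (1 << n) - first;           // Range takes (start, count)

        return ParallelEnumerable.Range(first, count)
            .Where(mask => NumberOfSetBits(mask) >= minSubsetSize)
            .Select(mask =>
            {
                // Each mask is handled independently, so no shared state is mutated.
                var subset = new List<T>();
                for (int j = 0; j < n; j++)
                {
                    if ((mask & (1 << j)) != 0) subset.Add(startingSet[j]);
                }
                return subset;
            })
            .ToList();
    }
}
```

Whether this is actually faster depends on the input sizes and the machine; the per-mask work is small, so partitioning overhead may outweigh the gain for small sets.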
Lastly, if you would like to dress up the method with LINQ, it would look like the following. However, as written, I think you will see slower performance unless you make some changes to it.
public static IEnumerable<List<T>> PowerSet<T>(List<T> startingSet, int minSubsetSize)
{
    var startingSetIndexes = Enumerable.Range(0, startingSet.Count).ToList();
    //Enumerable.Range takes (start, count), not (start, end),
    //so the count is the upper limit minus the first candidate.
    int first = (1 << minSubsetSize)-1;
    var candidates = Enumerable.Range(first, (1 << startingSet.Count) - first)
                               .Where(p => NumberOfSetBits(p) >= minSubsetSize)
                               .ToList();
    foreach (int p in candidates)
    {
        //Pick the elements whose index bit is set in p.
        yield return startingSetIndexes.Where(setInd => (p & (1 << setInd)) != 0)
                                       .Select(setInd => startingSet[setInd])
                                       .ToList();
    }
}
Source: https://stackoverflow.com/questions/9651987/efficient-powerset-algorithm-for-subsets-of-minimal-length