Efficient powerset algorithm for subsets of minimal length

Submitted by 痞子三分冷 on 2019-12-07 12:46:36

Question


I am using the following C# function to get a powerset limited to subsets of a minimal length:

string[] PowerSet(int min_len, string set)
{
    IEnumerable<IEnumerable<string>> seed = 
                    new List<IEnumerable<string>>() { Enumerable.Empty<string>() };

    return set.Replace(" ", "")
              .Split(',')
              .Aggregate(seed, (a, b) => a.Concat(a.Select(x => x.Concat(new[] { b }))))
              .Where(subset => subset.Count() >= min_len)
              .Select(subset => string.Join(",", subset))
              .ToArray();
}

The problem is that when the original set is large, the algorithm has to do a great deal of work even if the minimal length is large as well.

For example:

    PowerSet(27, "1,11,12,17,22,127,128,135,240,254,277,284,292,296,399,309,322,326,333,439,440,442,447,567,580,590,692,697");

should be very easy, but takes far too long with the above function. I am looking for a concise modification of my function that handles these cases efficiently.


Answer 1:


Taking a quick look at your method, one of the inefficiencies is that every possible subset is created, regardless of whether it has enough members to warrant inclusion in the limited power set.

Consider implementing the following method instead. It skips building subsets that are too small, based on their member count, to avoid excess computation.

public static List<List<T>> PowerSet<T>(List<T> startingSet, int minSubsetSize)
{
    List<List<T>> subsetList = new List<List<T>>();

    //The set bits of each intermediate value represent unique 
    //combinations from the startingSet.
    //We can start checking for combinations at (1<<minSubsetSize)-1 since
    //values less than that will not yield large enough subsets.
    int iLimit = 1 << startingSet.Count;
    for (int i = (1 << minSubsetSize)-1; i < iLimit; i++)
    {
        //Get the number of 1's in this 'i'
        int setBitCount = NumberOfSetBits(i);

        //Only include this subset if it will have at least minSubsetSize members.
        if (setBitCount >= minSubsetSize)
        {
            List<T> subset = new List<T>(setBitCount);

            for (int j = 0; j < startingSet.Count; j++)
            {
                //If the j'th bit in i is set, 
                //then add the j'th element of the startingSet to this subset.
                if ((i & (1 << j)) != 0)
                {
                    subset.Add(startingSet[j]);
                }
            }
            subsetList.Add(subset);
        }
    }
    return subsetList;
}

The number of set bits in each incremental i tells you how many members will be in the subset. If there are not enough set bits, then there is no point in doing the work of creating the subset represented by that bit combination. NumberOfSetBits can be implemented in a number of ways. See the Stack Overflow question "How to count the number of set bits in a 32-bit integer?" for various approaches, explanations and references. Here is one example taken from that question.

public static int NumberOfSetBits(int i)
{
    i = i - ((i >> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
    return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
}

Now, while this solution works for your example, I think you will run into long runtimes and memory issues if you lower the minimum subset size too far or continue to grow the size of the startingSet. Note also that because the bit mask is an int, the method as written only handles a startingSet of up to 30 elements. Without specific requirements posted in your question, I can't judge whether this solution will work for you and/or is safe for your range of expected input cases.

If you find that this solution is still too slow, the operations can be split up for parallel computation, perhaps using PLINQ features.
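As a rough illustration of the parallel idea, here is a minimal sketch that scans the same range of bit masks with PLINQ. The names ParallelPowerSetSketch and PowerSetParallel are hypothetical, and the sketch assumes .NET Core 3.0 or later so that System.Numerics.BitOperations.PopCount is available in place of NumberOfSetBits. It has not been benchmarked against the sequential version, and the order of the returned subsets is not guaranteed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Numerics;

public static class ParallelPowerSetSketch
{
    // Hypothetical helper: same mask scan as the loop-based method,
    // but the masks are filtered and expanded in parallel.
    public static List<List<T>> PowerSetParallel<T>(IReadOnlyList<T> startingSet, int minSubsetSize)
    {
        // An int mask can only represent up to 30 elements safely.
        if (startingSet.Count > 30)
            throw new ArgumentException("This sketch supports at most 30 elements.");

        // No subset can be larger than the set itself.
        if (minSubsetSize > startingSet.Count)
            return new List<List<T>>();

        int first = (1 << Math.Max(minSubsetSize, 0)) - 1;   // smallest mask worth checking
        int count = (1 << startingSet.Count) - first;        // number of masks to scan

        return ParallelEnumerable.Range(first, count)
            // Keep only masks with enough set bits.
            .Where(mask => BitOperations.PopCount((uint)mask) >= minSubsetSize)
            // Expand each surviving mask into the subset it represents.
            .Select(mask => Enumerable.Range(0, startingSet.Count)
                                      .Where(j => (mask & (1 << j)) != 0)
                                      .Select(j => startingSet[j])
                                      .ToList())
            .ToList();
    }
}
```

If a stable order matters, AsOrdered() can be added after the Range call, at some cost in throughput.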

Lastly, if you would like to dress up the method with LINQ, it would look like the following. However, as written, I think you will see slower performance without some changes to it.

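public static IEnumerable<List<T>> PowerSet<T>(List<T> startingSet, int minSubsetSize)
{
    var startingSetIndexes = Enumerable.Range(0, startingSet.Count).ToList();

    //Enumerable.Range takes (start, count), not (start, end), so the count
    //must be the number of values from 'first' up to (1 << Count) - 1.
    int first = (1 << minSubsetSize) - 1;
    int count = Math.Max(0, (1 << startingSet.Count) - first);

    var candidates = Enumerable.Range(first, count)
                               .Where(p => NumberOfSetBits(p) >= minSubsetSize)
                               .ToList();

    foreach (int p in candidates)
    {
        //Map each set bit of p back to the element at that index.
        yield return startingSetIndexes.Where(setInd => (p & (1 << setInd)) != 0)
                                       .Select(setInd => startingSet[setInd])
                                       .ToList();
    }
}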
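Both versions above still visit every mask between the starting value and 2^n, discarding most of them. One different technique, not part of the methods above, is to step directly from one mask with exactly k set bits to the next one (often called Gosper's hack) and repeat this for each k from the minimum size up to n. The following is a minimal sketch under that assumption. The names SubsetSketch and SubsetsOfAtLeast are hypothetical, and long masks are used so that up to 62 items fit.

```csharp
using System;
using System.Collections.Generic;

public static class SubsetSketch
{
    // Yields every subset of 'items' with at least 'minSize' members by
    // visiting only bit masks that have exactly k set bits, for k = minSize..n.
    public static IEnumerable<List<T>> SubsetsOfAtLeast<T>(IReadOnlyList<T> items, int minSize)
    {
        int n = items.Count;
        if (n > 62)
            throw new ArgumentException("This sketch supports at most 62 items.");

        long limit = 1L << n;   // first mask value that is out of range

        for (int k = Math.Max(minSize, 0); k <= n; k++)
        {
            if (k == 0)
            {
                // The empty subset has no set bits; handle it separately.
                yield return new List<T>();
                continue;
            }

            long mask = (1L << k) - 1;   // smallest mask with k set bits
            while (mask < limit)
            {
                // Build the subset represented by the current mask.
                var subset = new List<T>(k);
                for (int j = 0; j < n; j++)
                {
                    if ((mask & (1L << j)) != 0)
                    {
                        subset.Add(items[j]);
                    }
                }
                yield return subset;

                // Gosper's hack: next larger value with the same number of set bits.
                long lowest = mask & -mask;
                long ripple = mask + lowest;
                mask = (((ripple ^ mask) >> 2) / lowest) | ripple;
            }
        }
    }
}
```

For the 28-element example in the question with a minimum size of 27, this visits 28 masks for k = 27 and one mask for k = 28, so 29 subsets in total, rather than scanning on the order of 2^28 values.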


Source: https://stackoverflow.com/questions/9651987/efficient-powerset-algorithm-for-subsets-of-minimal-length
