Index into size ordered power set

忘掉有多难 2021-02-06 00:41

I would like to be able to index elements of a power set without expanding the full set into memory (à la itertools).

Furthermore, I want the index to be in cardinality order.

1 Answer
  • 2021-02-06 00:51

    I think you can do this with a two-step process. The first step is, as Mihai Maruseac described in his (now deleted) answer, to find the size of the set by iterating over the possible sizes until you find the appropriate one. Here's code for that:

    from math import comb  # Python 3.8+; any n-choose-k function will do

    def find_size(n, i):
        """Return a tuple, (k, i), where k is the size of the i'th (0-based) set
           in the cardinality-ordered powerset of {0..n-1}, and i is the remaining
           index within the combinations of that size."""
        if not 0 <= i < 2**n:
            raise ValueError('index is too large or small')
        for k in range(n+1):
            c = comb(n, k)  # number of subsets of size k
            if c > i:
                return k, i
            else:
                i -= c  # skip past all subsets of size k
    

    Once you have determined the size, you can use the combinatorial number system to find the right k-combination from the lexicographical ordering:

    def pick_set(n, i):
        """Return the i'th (0-based) set in the cardinality-ordered powerset of {0..n-1}"""
        s, i = find_size(n, i)
        result = []
        for k in range(s, 0, -1):
            # Find the smallest v with comb(v, k) > i; element v-1 is in the set.
            prev_c = 0
            for v in range(k, n+1):
                c = comb(v, k)
                if i < c:
                    result.append(v-1)
                    i -= prev_c  # prev_c == comb(v-1, k)
                    break
                prev_c = c
        return tuple(result)
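
The combinatorial number system also runs in the other direction, which gives a cheap way to sanity-check pick_set. The sketch below is not part of the approach described above: `rank_set` is a hypothetical helper name, and it assumes Python 3.8+ for `math.comb`. It maps a subset back to its index by adding the number of strictly smaller subsets to the sum of C(c_j, j) over the elements c_1 < c_2 < ... < c_k.

```python
from math import comb  # assumption: Python 3.8+


def rank_set(n, subset):
    """Index of `subset` in the cardinality-ordered powerset of {0..n-1}.

    Hypothetical inverse of pick_set; `subset` is any iterable of distinct
    integers taken from range(n).
    """
    items = sorted(subset)  # ascending: c_1 < c_2 < ... < c_k
    k = len(items)
    # Every subset with fewer than k elements comes first.
    offset = sum(comb(n, size) for size in range(k))
    # Combinatorial number system: rank among the k-subsets is sum C(c_j, j).
    within = sum(comb(c, j) for j, c in enumerate(items, start=1))
    return offset + within
```

For n = 4, `rank_set(4, (3, 1))` evaluates to 9: five subsets have fewer than two elements, and C(1, 1) + C(3, 2) = 4.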
    

    Both of those functions require a function to calculate the number of k-combinations for a set of size n, nCk (which I've called comb). This other question has several suggested solutions for finding that value, including scipy.misc.comb (scipy.special.comb in current SciPy releases), gmpy.comb and a few pure-python implementations; on Python 3.8 and later, the standard library's math.comb also works. Or, since it's called repeatedly with sequentially increasing values (e.g. comb(n, 0), comb(n, 1), etc. or comb(k, k), comb(k+1, k), etc.) you could instead use an inline calculation that takes advantage of the previously calculated value to give better performance.
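
If no library version is at hand, a pure-Python stand-in is short. This is a minimal sketch rather than any of the implementations from the linked question; it keeps the name `comb` so it can be dropped in for the functions above, and returning 0 for an out-of-range k is my own assumption about the desired behaviour.

```python
def comb(n, k):
    """Number of k-combinations of an n-element set (n choose k)."""
    if k < 0 or k > n:
        return 0  # assumption: out-of-range k counts as "no combinations"
    k = min(k, n - k)  # C(n, k) == C(n, n-k); take the shorter loop
    c = 1
    for j in range(1, k + 1):
        # After this step c == C(n - k + j, j), so the division is exact.
        c = c * (n - k + j) // j
    return c
```

Because every intermediate value is itself a binomial coefficient, the integer division never truncates and the result stays exact for arbitrarily large n.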

    Example usage (using a comb function minimally adapted from J.F. Sebastian's answer in the question linked above):

    >>> for i in range(2**4):
    ...     print(i, pick_set(4, i))
    ...
    0 ()
    1 (0,)
    2 (1,)
    3 (2,)
    4 (3,)
    5 (1, 0)
    6 (2, 0)
    7 (2, 1)
    8 (3, 0)
    9 (3, 1)
    10 (3, 2)
    11 (2, 1, 0)
    12 (3, 1, 0)
    13 (3, 2, 0)
    14 (3, 2, 1)
    15 (3, 2, 1, 0)
    

    Note that if you plan on iterating over combinations (as I did in the example), you can probably do so more efficiently than by running the full algorithm for every index, as there are cheaper algorithms for stepping to the next combination of a given size. You'll need a bit of extra logic to move up to the next larger size once you've exhausted the combinations of the current size.
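
If plain sequential iteration is all that is needed and the order within each size does not matter, the standard itertools powerset recipe already yields subsets lazily, smallest first. This is a sketch of that alternative, not of the indexing scheme above; `iter_powerset` is a hypothetical name. Within each size it produces ascending tuples in lexicographic order, which is a different order from the one pick_set uses.

```python
from itertools import chain, combinations


def iter_powerset(n):
    """Lazily yield every subset of {0..n-1}, smaller subsets first."""
    # combinations() is itself lazy, so nothing is expanded into memory.
    return chain.from_iterable(
        combinations(range(n), k) for k in range(n + 1)
    )
```

For example, `list(iter_powerset(3))` is `[(), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]`.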

    Edit: Here are implementations of some of the optimizations I mentioned briefly above:

    First off, generators that efficiently calculate combination values for ranges of n or k values:

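    def comb_n_range(start_n, stop_n, k):
        """Yield (n, comb(n, k)) for n in range(start_n, stop_n); needs start_n >= k."""
        c = comb(start_n, k)
        yield start_n, c
        for n in range(start_n+1, stop_n):
            c = c * n // (n - k)  # comb(n, k) == comb(n-1, k) * n / (n-k)
            yield n, c
    
    def comb_k_range(n, start_k, end_k):
        """Yield (k, comb(n, k)) for k in range(start_k, end_k)."""
        c = comb(n, start_k)
        yield start_k, c
        for k in range(start_k+1, end_k):
            c = c * (n - k + 1) // k  # comb(n, k) == comb(n, k-1) * (n-k+1) / k
            yield k, c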
    

    The for ... in range(...): c = comb(...); ... loops in find_size and pick_set can be adjusted to use these generators, which should be a bit faster.
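
As one illustration of that adjustment, the size search can carry the binomial coefficient along instead of recomputing it. This is a sketch under my own naming (`find_size_incremental` is hypothetical); it inlines the same update rule as comb_k_range rather than calling the generator, and is meant to return the same pairs as find_size.

```python
def find_size_incremental(n, i):
    """Return (k, remaining_index) like find_size, without calling comb."""
    if not 0 <= i < 2**n:
        raise ValueError('index is too large or small')
    c = 1  # C(n, 0)
    for k in range(n + 1):
        if i < c:
            return k, i
        i -= c                      # skip all subsets of size k
        c = c * (n - k) // (k + 1)  # C(n, k) -> C(n, k+1), exact division
```

The loop always returns for a valid index, because the counts C(n, 0) through C(n, n) sum to 2**n.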

    Next, a function that returns the next combination in lexicographical order:

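    def next_combination(n, c):
        """Return the successor of c, a descending tuple of elements of {0..n-1}."""
        # The last combination of size k is (n-1, n-2, ..., n-k); the empty
        # tuple is the only combination of size 0.
        if not c or c[-1] == n-len(c):
            raise ValueError("no more combinations")
        for i in range(len(c)-1, -1, -1):
            # Bump the rightmost element that can grow without hitting its
            # left neighbour, then reset everything after it to the minimum.
            if i == 0 or c[i] < c[i-1] - 1:
                return c[:i] + (c[i] + 1,) + tuple(range(len(c)-2-i,-1,-1))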
    

    And a generator that uses next_combination to yield a range of values from the powerset, defined by a slice object:

    def powerset_slice(n, s):
        start, stop, step = s.indices(2**n)
        if step < 1:
            raise ValueError("invalid step size (must be positive)")
    
        if start >= stop:
            return  # empty slice; also avoids calling pick_set(n, 2**n)
    
        c = pick_set(n, start)  # pick_set(n, 0) is the empty tuple
    
        for _ in range(start, stop, step):
            yield c
            for _ in range(step):
                try:
                    c = next_combination(n, c)
                except ValueError:
                    if len(c) == n:
                        return
                    # Size exhausted: start on the first combination one size up.
                    c = tuple(range(len(c), -1, -1))
    

    You could integrate this into the class you are using by making __getitem__ return the generator if it is passed a slice object, rather than an int. This would let you make __iter__ faster by simply turning its body into: return self[:].
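
A rough sketch of what such a class could look like follows. The class name `PowerSet` and the helper `_unrank` are hypothetical, since the asker's class is not shown here, and `math.comb` assumes Python 3.8+. To stay self-contained, the slice branch below unranks each index separately; in the design described above it would instead return powerset_slice(self.n, key).

```python
from math import comb  # assumption: Python 3.8+


class PowerSet:
    """Hypothetical read-only view of the powerset of {0..n-1}."""

    def __init__(self, n):
        self.n = n

    def _unrank(self, i):
        # Step 1: find the subset size k and the index within that size.
        k = 0
        while i >= comb(self.n, k):
            i -= comb(self.n, k)
            k += 1
        # Step 2: combinatorial number system, largest element first.
        result = []
        for j in range(k, 0, -1):
            v = j - 1
            while comb(v + 1, j) <= i:  # largest v with C(v, j) <= i
                v += 1
            result.append(v)
            i -= comb(v, j)
        return tuple(result)

    def __getitem__(self, key):
        size = 2 ** self.n
        if isinstance(key, slice):
            # Lazy generator over the requested indices.
            return (self._unrank(i) for i in range(*key.indices(size)))
        if key < 0:
            key += size  # support negative indices like a list
        if not 0 <= key < size:
            raise IndexError('powerset index out of range')
        return self._unrank(key)

    def __iter__(self):
        return self[:]
```

With this in place, `PowerSet(4)[10]` returns `(3, 2)` and `list(PowerSet(4)[5:8])` returns `[(1, 0), (2, 0), (2, 1)]`, consistent with the example output earlier in the answer.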
