I would like to be able to index elements of a power set without expanding the full set into memory (a la itertools). Furthermore, I want the indexes to be in cardinality order.
I think you can do this with a two-step process. The first step is, as Mihai Maruseac described in his (now deleted) answer, to find the size of the set by iterating over the possible sizes until you find the appropriate one. Here's code for that:
def find_size(n, i):
    """Return a tuple (k, i), where k is the size of the set at the given
    index in the cardinality-ordered powerset of {0..n-1} and i is the
    remaining index within the combinations of that size."""
    if not 0 <= i < 2**n:
        raise ValueError('index is too large or small')
    for k in range(n+1):
        c = comb(n, k)
        if c > i:
            return k, i
        else:
            i -= c
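For example (this assumes a suitable comb function, which is discussed below), with n = 4 the single size-0 set occupies index 0 and the four size-1 sets occupy indices 1 through 4, so index 7 lands two positions into the size-2 sets:

>>> find_size(4, 7)
(2, 2)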
Once you have determined the size, you can use the combinatorial number system to find the right k-combination from the lexicographical ordering:
def pick_set(n, i):
    """Return the set at index i in the cardinality-ordered powerset of
    {0..n-1}."""
    s, i = find_size(n, i)
    result = []
    for k in range(s, 0, -1):
        # Find the largest value v-1 with comb(v-1, k) <= i; that value is
        # the next (largest remaining) element of the combination.
        prev_c = 0
        for v in range(k, n+1):
            c = comb(v, k)
            if i < c:
                result.append(v-1)
                i -= prev_c
                break
            prev_c = c
    return tuple(result)
Both of those functions require a function to calculate the number of k-combinations of a set of size n, nCk (which I've called comb). This other question has several suggested solutions for finding that value, including scipy.misc.comb, gmpy.comb and a few pure-Python implementations. Or, since it's called repeatedly with sequentially increasing values (e.g. comb(n, 0), comb(n, 1), etc., or comb(k, k), comb(k+1, k), etc.), you could instead use an inline calculation that takes advantage of the previously calculated value to give better performance.
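For completeness, here is one minimal pure-Python stand-in (just the naive factorial formula, not the adapted version used in the example below); on Python 3.8+ you could use math.comb directly instead:

from math import factorial

def comb(n, k):
    """Return the number of k-combinations of n items (0 if k is out of range)."""
    if not 0 <= k <= n:
        return 0
    return factorial(n) // (factorial(k) * factorial(n - k))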
Example usage (using a comb function minimally adapted from J.F. Sebastian's answer in the question linked above):
>>> for i in range(2**4):
...     print(i, pick_set(4, i))
...
0 ()
1 (0,)
2 (1,)
3 (2,)
4 (3,)
5 (1, 0)
6 (2, 0)
7 (2, 1)
8 (3, 0)
9 (3, 1)
10 (3, 2)
11 (2, 1, 0)
12 (3, 1, 0)
13 (3, 2, 0)
14 (3, 2, 1)
15 (3, 2, 1, 0)
Note that if you plan on iterating over combinations (as I did in the example), you can probably do so more efficiently than by running the full algorithm for each index, since there are more efficient algorithms for finding the next combination of a given size (though you'll need a bit of extra logic to bump up to the next larger combination size when you've exhausted the current one).
Edit: Here are implementations of some of the optimizations I mentioned briefly above.
First off, generators that efficiently calculate combination values for ranges of n or k values:
def comb_n_range(start_n, stop_n, k):
    # Yield (n, comb(n, k)) for n in range(start_n, stop_n), using the
    # recurrence comb(n, k) == comb(n-1, k) * n // (n - k).
    c = comb(start_n, k)
    yield start_n, c
    for n in range(start_n+1, stop_n):
        c = c * n // (n - k)
        yield n, c

def comb_k_range(n, start_k, end_k):
    # Yield (k, comb(n, k)) for k in range(start_k, end_k), using the
    # recurrence comb(n, k) == comb(n, k-1) * (n - k + 1) // k.
    c = comb(n, start_k)
    yield start_k, c
    for k in range(start_k+1, end_k):
        c = c * (n - k + 1) // k
        yield k, c
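As a quick sanity check (using whatever comb implementation you've picked), the values produced for n = 5 match the corresponding row of Pascal's triangle:

>>> list(comb_k_range(5, 0, 6))
[(0, 1), (1, 5), (2, 10), (3, 10), (4, 5), (5, 1)]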
The for ... in range(...): c = comb(...); ... bits in the code above can be adjusted to use these, which should be a bit faster.
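For instance, find_size could be adjusted along these lines (a sketch only; find_size_incremental is just a name I'm using here, and it assumes comb_k_range from above):

def find_size_incremental(n, i):
    """Like find_size, but computes the successive comb(n, k) values incrementally."""
    if not 0 <= i < 2**n:
        raise ValueError('index is too large or small')
    for k, c in comb_k_range(n, 0, n + 1):
        if c > i:
            return k, i
        i -= c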
Next, a function that returns the next combination in lexicographical order:
def next_combination(n, c):
    # The last combination of this size is (n-1, n-2, ..., n-len(c)), and the
    # empty combination has no successor of the same size.
    if not c or c[-1] == n - len(c):
        raise ValueError("no more combinations")
    for i in range(len(c)-1, -1, -1):
        # Find the rightmost position that can be incremented, bump it, and
        # reset everything after it to the smallest possible tail.
        if i == 0 or c[i] < c[i-1] - 1:
            return c[:i] + (c[i] + 1,) + tuple(range(len(c)-2-i, -1, -1))
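For example, stepping through some of the size-2 combinations of {0..3} from the earlier listing:

>>> next_combination(4, (2, 1))
(3, 0)
>>> next_combination(4, (3, 1))
(3, 2)

Calling it on (3, 2) raises ValueError, since that is the last size-2 combination.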
And a generator that uses next_combination to yield a range of values from the powerset, defined by a slice object:
def powerset_slice(n, s):
    start, stop, step = s.indices(2**n)
    if step < 1:
        raise ValueError("invalid step size (must be positive)")
    if start == 0:
        c = ()
    else:
        c = pick_set(n, start)
    for _ in range(start, stop, step):
        yield c
        for _ in range(step):
            try:
                c = next_combination(n, c)
            except ValueError:
                # This size is exhausted: either we're completely done, or we
                # move on to the first combination of the next larger size.
                if len(c) == n:
                    return
                c = tuple(range(len(c), -1, -1))
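For example, slicing out indices 5 through 10 reproduces the size-2 sets from the earlier listing:

>>> list(powerset_slice(4, slice(5, 11)))
[(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]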
You could integrate this into the class you are using by making __getitem__ return the generator if it is passed a slice object, rather than an int. This would let you make __iter__ faster by simply turning its body into: return self[:].
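I don't know exactly what your class looks like, but as a rough sketch (the name PowerSet and its constructor are placeholders, not anything from your code), it might end up like this:

class PowerSet:
    def __init__(self, n):
        self.n = n  # models the powerset of {0..n-1}

    def __len__(self):
        return 2 ** self.n

    def __getitem__(self, index):
        if isinstance(index, slice):
            return powerset_slice(self.n, index)  # a generator over the slice
        return pick_set(self.n, index)

    def __iter__(self):
        return self[:]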