问题
I would like to be able to index elements of a power set without expanding the full set into memory (a la itertools)
Furthermore I want the index to be cardinality ordered. So index 0 should be the empty set, index 2**n - 1 should be all elements
Most literature I have found so far involves generating a power set inductively. It doesn't let you just dive in at any index. My motivation for this indexing is to slice a problem for distributed execution and it be helpful if a remote machine can just dive in anywhere without sharing an iterator reference across a cluster.
EDIT: Blckknght suggested the solution I pursued which is shown below
from scipy.misc import comb
def kcombination_to_index(combination):
index = 0
combination = sorted(combination)
for k, ck in enumerate(combination):
index += comb(ck, k+1, exact=True)
return index
def index_to_kcombination(index, k):
result = []
for k in reversed(range(1, k+1)):
n = 0
while comb(n, k, exact=True) <= index:
n +=1
result.append(n-1)
index -= comb(n-1, k, exact=True)
return result
class PowerSet:
def __init__(self, elements):
self.elements = elements
def __len__(self):
return 2 ** len(self.elements)
def __iter__(self):
for i in range(len(self)):
yield self[i]
def __getitem__(self, k):
if not isinstance(k, int):
raise TypeError
#k=0 is empty set,
#k= 1 - 1+n returns subsets of size 1
for subset_size in range(len(self.elements) + 1):
number_subsets = comb(len(self.elements), subset_size, exact=True)
if k >= number_subsets:
k -= number_subsets
else:
break
#we now want the kth element of a possible permutation of subset_size elements
indeces = index_to_kcombination(k, subset_size)
return map(lambda i: self.elements[i], indeces)
if __name__ == "__main__":
print "index of combination [8, 6, 3, 1, 0] is", kcombination_to_index([8, 6, 3, 1, 0])
print "5 combination at position 72 is", index_to_kcombination(72,5)
ps = PowerSet(["a", "b", "c", "d"])
for subset_idx in range(len(ps)):
print ps[subset_idx]
回答1:
I think you can do this with a two step process. The first step is as Mihai Maruseac described in his (now deleted) answer, to find the size of the set by iterating over the possible sizes until you find the appropriate one. Here's code for that:
def find_size(n, i):
"""Return a tuple, (k, i), where s is the size of the i-1'th set in the
cardinally-ordered powerset of {0..n-1}, and i is the remaining index
within the combinations of that size."""
if not 0 <= i < 2**n:
raise ValueError('index is too large or small')
for k in range(n+1):
c = comb(n, k)
if c > i:
return k, i
else:
i -= c
Once you have determined the size, you can use the combinatorial number system to find the right k-combination from the lexicographical ordering:
def pick_set(n, i):
"""Return the i-1'th set in the cardinally-ordered powerset of {0..n-1}"""
s, i = find_size(n, i)
result = []
for k in range(s, 0, -1):
prev_c = 0
for v in range(k, n+1):
c = comb(v, k)
if i < c:
result.append(v-1)
i -= prev_c
break
prev_c = c
return tuple(result)
Both of those functions require a function to calculate the number of k-combinations for a set of size n, nCk (which I've called comb
). This other question has several suggested solutions for finding that value, including scipy.misc.comb
, gmpy.comb
and a few pure-python implementations. Or, since it's called repeatedly with sequentially increasing values (e.g. comb(n, 0)
, comb(n, 1)
, etc. or comb(k, k)
, comb(k+1, k)
, etc.) you could instead use an inline calculation that takes advantage the previously calculated value to give better performance.
Example usage (using a comb
function minimally adapted from J.F. Sebastian's answer in the question linked above):
>>> for i in range(2**4):
print(i, pick_set(4, i))
0 ()
1 (0,)
2 (1,)
3 (2,)
4 (3,)
5 (1, 0)
6 (2, 0)
7 (2, 1)
8 (3, 0)
9 (3, 1)
10 (3, 2)
11 (2, 1, 0)
12 (3, 1, 0)
13 (3, 2, 0)
14 (3, 2, 1)
15 (3, 2, 1, 0)
Note that if you plan on iterating over combinations (as I did in the example), you can probably do so more efficiently than by running the full algorithm, as there are more efficient algorithms for finding the next combination of a given size (though you'll need a bit of extra logic to bump up to the next larger size of combinations when you've exhausted the initial size).
Edit: Here are implementations of some of the optimizations I mentioned briefly above:
First off, generators that efficiently calculate combination values for ranges of n
or k
values:
def comb_n_range(start_n, stop_n, k):
c = comb(start_n, k)
yield start_n, c
for n in range(start_n+1, stop_n):
c = c * n // (n - k)
yield n, c
def comb_k_range(n, start_k, end_k):
c = comb(n, start_k)
yield start_k, c
for k in range(start_k+1, end_k):
c = c * (n - k + 1) // k
yield k, c
The for ... in range(...): c = comb(...); ...
bits in the code above can be adjusted to use these, which should be a bit faster.
Next, a function that returns the next combination in lexicographical order:
def next_combination(n, c):
if c[-1] == n-len(c)+1:
raise ValueError("no more combinations")
for i in range(len(c)-1, -1, -1):
if i == 0 or c[i] < c[i-1] - 1:
return c[:i] + (c[i] + 1,) + tuple(range(len(c)-2-i,-1,-1))
And a generator that uses next_combination
to yield a range of values from the powerset, defined by a slice
object:
def powerset_slice(n, s):
start, stop, step = s.indices(2**n)
if step < 1:
raise ValueError("invalid step size (must be positive)")
if start == 0:
c = ()
else:
c = pick_set(n, start)
for _ in range(start, stop, step):
yield c
for _ in range(step):
try:
c = next_combination(n, c)
except ValueError:
if len(c) == n:
return
c = tuple(range(len(c), -1, -1))
You could integrate this into the class you are using by making __getitem__
return the generator if it is passed a slice
object, rather than an int
. This would let you make __iter__
faster by simply turning its body into: return self[:]
.
来源:https://stackoverflow.com/questions/18966230/index-into-size-ordered-power-set