How to get all subsets of a set? (powerset)

庸人自扰 2020-11-22 05:18

Given a set

{0, 1, 2, 3}

How can I produce the subsets:

[set(),
 {0},
 {1},
 {2},
 {3},
 {0, 1},
 {0, 2},
 {0, 3},
 {1, 2},
 {1, 3},
 {2, 3},
 {0, 1, 2},
 {0, 1, 3},
 {0, 2, 3},
 {1, 2, 3},
 {0, 1, 2, 3}]

28 Answers
  • 2020-11-22 05:58

    I hadn't come across the more_itertools.powerset function before and would recommend using it. I also recommend not relying on the default output ordering of itertools.combinations: often you instead want to minimise the distance between positions, sorting subsets whose items are closer together before subsets whose items are further apart.

    The itertools recipes page shows that the powerset recipe is built on chain.from_iterable:

    • Note that the r in the recipe below matches the standard notation for the lower part of a binomial coefficient; the size of s is usually written n in mathematics texts and on calculators (“n choose r”).
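    from itertools import chain, combinations

    def powerset(iterable):
        "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
        s = list(iterable)
        # Chain together the combinations of every size r = 0..len(s)
        return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))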
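
The recipe yields tuples, whereas the question asks for sets. A minimal sketch of the conversion, assuming a list input so the output order is predictable (the name `subsets` is only illustrative):

```python
from itertools import chain, combinations

def powerset(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

# Wrap each tuple in set() to get the list-of-sets shape from the question
subsets = [set(combo) for combo in powerset([0, 1, 2, 3])]
print(subsets[:6])
```

If the subsets themselves need to be hashable (for example, to store them in another set), frozenset could be used in place of set.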
    

    The other examples here give the powerset of [1,2,3,4] with the 2-tuples listed in "lexicographic" order (when the numbers are printed as integers). Writing the distance between the numbers (i.e. their difference) alongside each pair illustrates my point:

    12 ⇒ 1
    13 ⇒ 2
    14 ⇒ 3
    23 ⇒ 1
    24 ⇒ 2
    34 ⇒ 1
    

    For my purposes, the preferable order for subsets is the one which 'exhausts' the minimal distance first, like so:

    12 ⇒ 1
    23 ⇒ 1
    34 ⇒ 1
    13 ⇒ 2
    24 ⇒ 2
    14 ⇒ 3
    

    Using numbers here makes this ordering look 'wrong', but consider the letters ["a","b","c","d"] instead; it becomes clearer why obtaining the powerset in this order might be useful:

    ab ⇒ 1
    bc ⇒ 1
    cd ⇒ 1
    ac ⇒ 2
    bd ⇒ 2
    ad ⇒ 3
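
For the 2-element case, this ordering might be sketched with the standard library alone, sorting pairs by the gap between their positions in the input. The helper name `pairs_by_distance` is hypothetical, not something from the answer or its repo:

```python
from itertools import combinations

def pairs_by_distance(items):
    """Return all 2-element combinations of `items`, closest positions first."""
    indexed_pairs = combinations(enumerate(items), 2)
    # Sort by positional gap (j - i); sorted() is stable, so pairs with equal
    # gaps keep the left-to-right order that combinations() produced
    ordered = sorted(indexed_pairs, key=lambda pair: pair[1][0] - pair[0][0])
    return [(a, b) for (_, a), (_, b) in ordered]

for a, b in pairs_by_distance(["a", "b", "c", "d"]):
    print(a + b)
```

Sorting on positions rather than on the values themselves means the same key works for letters, numbers, or any other items.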
    

    This effect is more pronounced with more items, and for my purposes it makes the difference between being able to describe the ranges of indexes of the powerset meaningfully and not.

    (A lot has been written on Gray codes and the like for the output order of combinatorial algorithms; I don't see it as a side issue.)

    I actually just wrote a fairly involved program which used this fast integer partition code to output the values in the proper order, but then I discovered more_itertools.powerset and for most uses it's probably fine to just use that function like so:

    from more_itertools import powerset
    from numpy import ediff1d
    
    def ps_sorter(tup):
        l = len(tup)
        d = ediff1d(tup).tolist()
        return l, d
    
    ps = powerset([1,2,3,4])
    
    ps = sorted(ps, key=ps_sorter)
    
    for x in ps:
        print(x)
    

    ()
    (1,)
    (2,)
    (3,)
    (4,)
    (1, 2)
    (2, 3)
    (3, 4)
    (1, 3)
    (2, 4)
    (1, 4)
    (1, 2, 3)
    (2, 3, 4)
    (1, 2, 4)
    (1, 3, 4)
    (1, 2, 3, 4)
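
If pulling in numpy and more_itertools just for this is undesirable, the same sort key could plausibly be written with the standard library only. This is a sketch under that assumption; `gap_key` is an illustrative name, and `powerset` is the itertools recipe:

```python
from itertools import chain, combinations

def powerset(iterable):
    s = list(iterable)
    return chain.from_iterable(combinations(s, r) for r in range(len(s) + 1))

def gap_key(subset):
    """Sort key: subset length first, then the list of consecutive differences."""
    gaps = [b - a for a, b in zip(subset, subset[1:])]
    return len(subset), gaps

ordered = sorted(powerset([1, 2, 3, 4]), key=gap_key)
for subset in ordered:
    print(subset)
```

Like the numpy version, this assumes the items are numbers that can be subtracted; for other items the key would need to work on positions instead.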
    

    I wrote some more involved code which will print the powerset nicely (see the repo for the pretty-printing functions I've not included here: print_partitions and pprint_tuple; the print_partitions_by_length function below depends on pprint_tuple).

    • Repo: ordered-powerset, specifically pset_partitions.py

    This is all pretty simple, but still might be useful if you want some code that'll let you get straight to accessing the different levels of the powerset:

    from itertools import permutations as permute
    from numpy import cumsum
    
    # http://jeromekelleher.net/generating-integer-partitions.html
    # via
    # https://stackoverflow.com/questions/10035752/elegant-python-code-for-integer-partitioning#comment25080713_10036764
    
    def asc_int_partitions(n):
        a = [0 for i in range(n + 1)]
        k = 1
        y = n - 1
        while k != 0:
            x = a[k - 1] + 1
            k -= 1
            while 2 * x <= y:
                a[k] = x
                y -= x
                k += 1
            l = k + 1
            while x <= y:
                a[k] = x
                a[l] = y
                yield tuple(a[:k + 2])
                x += 1
                y -= 1
            a[k] = x + y
            y = x + y - 1
            yield tuple(a[:k + 1])
    
    # https://stackoverflow.com/a/6285330/2668831
    def uniquely_permute(iterable, enforce_sort=False, r=None):
        previous = tuple()
        if enforce_sort: # potential waste of effort (default: False)
            iterable = sorted(iterable)
        for p in permute(iterable, r):
            if p > previous:
                previous = p
                yield p
    
    def sum_min(p):
        return sum(p), min(p)
    
    def partitions_by_length(max_n, sorting=True, permuting=False):
        partition_dict = {0: ()}
        for n in range(1,max_n+1):
            partition_dict.setdefault(n, [])
            partitions = list(asc_int_partitions(n))
            for p in partitions:
                if permuting:
                    perms = uniquely_permute(p)
                    for perm in perms:
                        partition_dict.get(len(p)).append(perm)
                else:
                    partition_dict.get(len(p)).append(p)
        if not sorting:
            return partition_dict
        for k in partition_dict:
            partition_dict.update({k: sorted(partition_dict.get(k), key=sum_min)})
        return partition_dict
    
    def print_partitions_by_length(max_n, sorting=True, permuting=True):
        partition_dict = partitions_by_length(max_n, sorting=sorting, permuting=permuting)
        for k in partition_dict:
            if k == 0:
                print(tuple(partition_dict.get(k)), end="")
            for p in partition_dict.get(k):
                print(pprint_tuple(p), end=" ")
            print()
        return
    
    def generate_powerset(items, subset_handler=tuple, verbose=False):
        """
        Generate the powerset of an iterable `items`.
    
        Handling of the elements of the iterable is by whichever function is passed as
        `subset_handler`, which must be callable with no arguments to produce the
        empty set. A `string_handler` function (not defined here) would join the
        elements of the subset with the empty string (useful when `items` is an
        iterable of `str` values).
        """
        ps = {0: [subset_handler()]}
        n = len(items)
        p_dict = partitions_by_length(n-1, sorting=True, permuting=True)
        for p_len, parts in p_dict.items():
            ps.setdefault(p_len, [])
            if p_len == 0:
                # singletons
                for offset in range(n):
                    subset = subset_handler([items[offset]])
                    if verbose:
                        if offset > 0:
                            print(end=" ")
                        if offset == n - 1:
                            print(subset, end="\n")
                        else:
                            print(subset, end=",")
                    ps.get(p_len).append(subset)
            for pcount, partition in enumerate(parts):
                distance = sum(partition)
                indices = (cumsum(partition)).tolist()
                for offset in range(n - distance):
                    subset = subset_handler([items[offset]] + [items[offset:][i] for i in indices])
                    if verbose:
                        if offset > 0:
                            print(end=" ")
                        if offset == n - distance - 1:
                            print(subset, end="\n")
                        else:
                            print(subset, end=",")
                    ps.get(p_len).append(subset)
            if verbose and p_len < n-1:
                print()
        return ps
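
The core step of generate_powerset is turning one gap pattern into subsets by sliding it along the items. A minimal sketch of that step in isolation, using itertools.accumulate in place of numpy's cumsum (the helper name `subsets_for_gaps` is hypothetical):

```python
from itertools import accumulate

def subsets_for_gaps(items, gaps):
    """Slide a fixed gap pattern along `items`, yielding one subset per offset."""
    # Cumulative sums turn gaps such as (1, 2) into relative indices (1, 3)
    relative = list(accumulate(gaps))
    span = sum(gaps)
    for offset in range(len(items) - span):
        yield tuple([items[offset]] + [items[offset + i] for i in relative])

print(list(subsets_for_gaps("abcdef", (1, 2))))
```

With the gap pattern (1, 2) this gives the subsets abd, bce and cdf; an empty gap pattern gives the singletons.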
    

    As an example, I wrote a CLI demo program which takes a string as a command line argument:

    python string_powerset.py abcdef
    

    a, b, c, d, e, f
    
    ab, bc, cd, de, ef
    ac, bd, ce, df
    ad, be, cf
    ae, bf
    af
    
    abc, bcd, cde, def
    abd, bce, cdf
    acd, bde, cef
    abe, bcf
    ade, bef
    ace, bdf
    abf
    aef
    acf
    adf
    
    abcd, bcde, cdef
    abce, bcdf
    abde, bcef
    acde, bdef
    abcf
    abef
    adef
    abdf
    acdf
    acef
    
    abcde, bcdef
    abcdf
    abcef
    abdef
    acdef
    
    abcdef
    
  • 2020-11-22 05:59

    This can be done very naturally with itertools.product:

    import itertools
    
    def powerset(l):
        for sl in itertools.product(*[[[], [i]] for i in l]):
            yield {j for i in sl for j in i}
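
The product above makes an include-or-exclude choice for every element. The same idea can be sketched with integer bitmasks, where bit i of a counter decides whether the i-th item is present (`powerset_bitmask` is an illustrative name, not part of the answer):

```python
def powerset_bitmask(items):
    """Yield each subset of `items` as a set, one per integer in [0, 2**n)."""
    items = list(items)
    for mask in range(1 << len(items)):
        # Bit i of mask decides whether items[i] is included in this subset
        yield {x for i, x in enumerate(items) if (mask >> i) & 1}

for subset in powerset_bitmask([0, 1, 2]):
    print(subset)
```

As with the product version, the items must be hashable because each subset is built as a set.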
    
  • 2020-11-22 05:59

    To include the empty set, which is a subset of every set, you could use:

    from itertools import combinations

    def subsets(iterable):
        # n = 0 yields the empty tuple, i.e. the empty subset
        for n in range(len(iterable) + 1):
            yield from combinations(iterable, n)
    
  • 2020-11-22 05:59

    Surprisingly, none of these answers returns the powerset itself as an actual Python set. Here is a messy implementation that gives a powerset that really is a Python set.

    test_set = set(['yo', 'whatup', 'money'])
    def powerset( base_set ):
        """ modified from pydoc's itertools recipe shown above"""
        from itertools import chain, combinations
        base_list = list( base_set )
        combo_list = [ combinations(base_list, r) for r in range(len(base_set)+1) ]
    
        powerset = set([])
        for ll in combo_list:
            list_of_frozensets = list( map( frozenset, map( list, ll ) ) ) 
            set_of_frozensets = set( list_of_frozensets )
            powerset = powerset.union( set_of_frozensets )
    
        return powerset
    
    print(powerset(test_set))
    # Example output (set ordering may vary):
    # {frozenset({'money', 'whatup'}), frozenset({'money', 'whatup', 'yo'}),
    #  frozenset({'whatup'}), frozenset({'whatup', 'yo'}), frozenset({'yo'}),
    #  frozenset({'money', 'yo'}), frozenset({'money'}), frozenset()}
    

    I'd love to see a better implementation, though.
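
One possibly tidier variant, offered as a sketch rather than a definitive answer, builds the set of frozensets in a single comprehension over the itertools recipe (the name `powerset_as_set` is only illustrative):

```python
from itertools import chain, combinations

def powerset_as_set(base_set):
    """Return the powerset of `base_set` as a set of frozensets."""
    items = list(base_set)
    all_combos = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    # frozenset is hashable, so subsets can themselves be members of a set
    return {frozenset(combo) for combo in all_combos}

print(powerset_as_set({'yo', 'whatup', 'money'}))
```

The inner sets have to be frozensets because ordinary sets are mutable and therefore cannot be elements of another set.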
