N choose N/2 sublists of a list

Submitted by 馋奶兔 on 2020-01-11 06:36:06

Question


Is there an efficient way in Python to get all partitions of a list of size n into two subsets of size n/2? I want to get some iterative construct such that each iteration provides two non-overlapping subsets of the original list, each subset having size n/2.

For example:

A = [1,2,3,4,5,6]    # here n = 6
# some iterative construct
    # in each iteration, a pair of subsets of size n/2
    # subsets = [[1,3,4], [2,5,6]] for example for one of the iterations
    # subsets = [[1,2,5],[3,4,6]] a different iteration example

The subsets should be non-overlapping, e.g. [[1,2,3], [4,5,6]] is valid but [[1,2,3], [3,4,5]] is not. The order of the two subsets does not matter, e.g. [[1,2,3], [4,5,6]] does not count as different from [[4,5,6], [1,2,3]] and thus only one of those two should appear in an iteration. The order within each subset also does not matter, so [[1,2,3], [4,5,6]], [[1,3,2], [4,5,6]], [[3,2,1], [6,5,4]], etc. all count as the same and so only one of them should show up in whole iteration.
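
As a sanity check for any solution, the rules above imply that a list of n distinct items has C(n, n/2) / 2 valid pairs: C(n, n/2) ways to pick one half, halved because the order of the two subsets does not matter. A minimal sketch of that count follows; the helper name expected_pair_count is only illustrative and is not part of the question.

```python
from math import factorial

def expected_pair_count(n):
    """Number of unordered splits of n distinct items into two halves of size n/2."""
    if n < 2 or n % 2 != 0:
        raise ValueError("n must be a positive even number")
    half = n // 2
    # C(n, n/2) ways to choose one half ...
    n_choose_half = factorial(n) // (factorial(half) ** 2)
    # ... and each split is counted twice (once per half), so divide by 2.
    return n_choose_half // 2

print(expected_pair_count(6))  # 10
```

So for the six-element example, a correct iteration should produce exactly 10 pairs.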


Answer 1:


Here's an itertools-based generator that I think yields exactly the values you want.

import itertools

def sub_lists(sequence):
    all_but_first = set(sequence[1:])
    for item in itertools.combinations(sequence[1:], len(sequence)//2 - 1):
        yield [[sequence[0]] + list(item), list(all_but_first.difference(item))]

I avoid near-duplicate outputs in two ways, compared to the permutations-based approach in Suever's answer. First, I avoid yielding both [["a", "b"], ["c", "d"]] and [["c", "d"], ["a", "b"]] by forcing every result to have the first value of the input sequence in the first sublist. Second, I avoid yielding both [["a", "b"], ["c", "d"]] and [["a", "b"], ["d", "c"]] by building the second sublist using set subtraction.

Note that yielding nested tuples might be a little more natural than nested lists. To do that, just change the last line to:

yield (sequence[0],) + item, tuple(all_but_first.difference(item))
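
One caveat of the set-based approach is that the items must be hashable and distinct, and the order of the second sublist is not guaranteed. If that matters, a possible variant is to choose positions instead of values. This is only a sketch under that assumption; the name sub_lists_by_index is hypothetical and does not come from the answer above.

```python
from itertools import combinations

def sub_lists_by_index(sequence):
    """Yield [left, right] pairs, selecting by position rather than by value."""
    n = len(sequence)
    if n == 0 or n % 2 != 0:
        raise ValueError("sequence length must be even and non-zero")
    # Position 0 is always in the left half, which rules out mirrored pairs.
    for chosen in combinations(range(1, n), n // 2 - 1):
        left_idx = set((0,) + chosen)
        left = [sequence[i] for i in range(n) if i in left_idx]
        right = [sequence[i] for i in range(n) if i not in left_idx]
        yield [left, right]

for pair in sub_lists_by_index("abcd"):
    print(pair)
```

Because it never hashes or compares the items themselves, this version also accepts unhashable items such as nested lists, and both halves keep the input order.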



Answer 2:


You will want to use itertools.combinations to do this. Its first argument is the list you want to select items from, and its second is the number of items to select.

import itertools

result = [list(item) for item in itertools.combinations(input, len(input) // 2)]

For an input of [1,2,3,4] this yields

[[1, 2], [1, 3], [1, 4], [2, 3], [2, 4], [3, 4]]

As @ShadowRanger pointed out, if order matters in your lists and you want all permutations, you'll want to substitute itertools.permutations into the solution.

result = [list(item) for item in itertools.permutations(input, len(input) // 2)]
# [[1, 2], [1, 3], [1, 4], [2, 1], [2, 3], [2, 4], [3, 1], [3, 2], [3, 4], [4, 1], [4, 2], [4, 3]]

Edit

Upon reading your question more closely, it is unclear whether you want all n/2 permutations as I have shown, or a list of lists where each element is itself a list of the two "halves" of the permutation.

To accomplish this, you could do the following (incorporating some indexing help from @Blckknght)

result = [[list(item[::2]), list(item[1::2])] for item in itertools.permutations(input)]

In this case, the output of [1,2,3,4] would be

[[[1, 3], [2, 4]], [[1, 4], [2, 3]], [[1, 2], [3, 4]], [[1, 4], [3, 2]], [[1, 2], [4, 3]], [[1, 3], [4, 2]], [[2, 3], [1, 4]], [[2, 4], [1, 3]], [[2, 1], [3, 4]], [[2, 4], [3, 1]], [[2, 1], [4, 3]], [[2, 3], [4, 1]], [[3, 2], [1, 4]], [[3, 4], [1, 2]], [[3, 1], [2, 4]], [[3, 4], [2, 1]], [[3, 1], [4, 2]], [[3, 2], [4, 1]], [[4, 2], [1, 3]], [[4, 3], [1, 2]], [[4, 1], [2, 3]], [[4, 3], [2, 1]], [[4, 1], [3, 2]], [[4, 2], [3, 1]]]

Edit2

Since order doesn't matter but you want an approach similar to the last one (lists of lists of lists), that's a little tricky with the last approach because of the array slicing. One alternative is to use set and frozenset to construct the initial information (rather than lists), because in a set the ordering doesn't matter when checking for equality. This will automatically allow us to remove duplicates. We can then add an extra step to convert back to a list if that's what you prefer.

from itertools import permutations
tmp = set([frozenset([frozenset(k[::2]), frozenset(k[1::2])]) for k in permutations(input)])
result = [[list(el) for el in item] for item in tmp]

This will yield

[[[1, 2], [3, 4]], [[2, 3], [1, 4]], [[1, 3], [2, 4]]]
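
It may be worth seeing how much redundant work the permutations-based approach does before the frozensets collapse the duplicates. The sketch below counts the n! permutations against the unique results; unique_halvings is an illustrative name, and it assumes the items are distinct and hashable.

```python
from itertools import permutations
from math import factorial

def unique_halvings(seq):
    """Deduplicate permutation-based halves with frozensets, as in Edit2."""
    return {
        frozenset([frozenset(p[::2]), frozenset(p[1::2])])
        for p in permutations(seq)
    }

for n in (4, 6, 8):
    seq = list(range(1, n + 1))
    # n, number of permutations examined, number of distinct pairs kept
    print(n, factorial(n), len(unique_halvings(seq)))
```

For n = 8 this already walks 40320 permutations to keep 35 pairs, so the approach is best suited to short inputs.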



Answer 3:


Here's a solution which doesn't use itertools. It uses a trick called Gosper's hack to generate bit permutations. See HAKMEM Item 175 for an explanation of how it works; this hack is also mentioned in the Wikipedia article Combinatorial number system. And it features in the accepted answer to this SO question: Iterating over all subsets of a given size.

The parts function is a generator, so you can use it in a for loop, as illustrated in my test.

How it works.

To partition a list of length n into pairs of sublists of length n/2 we use a binary number bits consisting of n/2 zero bits and n/2 one bits. A one bit in a given position indicates that the corresponding list element goes into the left sublist; a zero bit indicates that it goes into the right sublist.

Initially, bits is set to 2 ** (n/2) - 1, so if n = 6, bits starts out as 000111.

The generator uses Gosper's hack to permute bits in numerical order, stopping when we get a one bit in the highest position, since that's when we start getting the reversed versions of our sublist pairs.
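
To make the stepping concrete, here is the same bit trick isolated into a small function, with a loop that lists the patterns visited for a given length. The names next_same_popcount and patterns are illustrative only; the arithmetic is the same as in the generator shown further down.

```python
def next_same_popcount(bits):
    """Return the next larger integer with the same number of one bits."""
    u = bits & -bits   # isolate the lowest set bit
    v = bits + u       # carry the lowest run of ones into the next position
    # Move the remaining ones from that run back to the bottom.
    return v | (((v ^ bits) // u) >> 2)

def patterns(n):
    """List the bit patterns visited for a list of positive even length n."""
    lim = 1 << (n - 1)          # stop once the highest bit becomes one
    bits = (1 << n // 2) - 1    # n/2 ones in the lowest positions
    out = []
    while bits < lim:
        out.append(bits)
        bits = next_same_popcount(bits)
    return out

for b in patterns(6):
    print(format(b, '06b'))
```

For n = 6 this prints ten patterns, from 000111 up to 011100; the next one would be 100011, which has the highest bit set and so ends the loop.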

The code responsible for converting the pattern in bits into the pair of sublists is:

    for i, u in enumerate(lst):
        ss[bits & (1<<i) == 0].append(u)

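The index expression bits & (1<<i) == 0 evaluates to False (0) when bit i of bits is one and to True (1) when it is zero. So if there's a one at bit position i in bits then ss[0] gets the current item from lst, otherwise it's appended to ss[1].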
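
A small sketch of that conversion for a single pattern, wrapped in a helper whose name (split_by_bits) is made up for this illustration:

```python
def split_by_bits(lst, bits):
    """Split lst into two sublists according to the bits of an integer."""
    ss = [[], []]
    for i, u in enumerate(lst):
        # (bits & (1 << i)) == 0 is False (index 0) when bit i is set,
        # and True (index 1) when bit i is clear.
        ss[bits & (1 << i) == 0].append(u)
    return ss

print(split_by_bits([1, 2, 3, 4, 5, 6], 0b001011))  # [[1, 2, 4], [3, 5, 6]]
```

Bits 0, 1 and 3 are set in 001011, so the items at those positions (1, 2 and 4) land in the first sublist. Note that in Python the & operator binds more tightly than ==, which is why no extra parentheses are needed.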

This code runs on Python 2 and Python 3.

from __future__ import print_function

def parts(lst):
    ''' Generate all pairs of equal-sized partitions of a list of even length '''

    n = len(lst)
    if n % 2 != 0:
        raise ValueError('list length MUST be even')

    lim = 1 << (n - 1)
    bits = (1 << n // 2) - 1

    while bits < lim:
        #Use bits to partition lst
        ss = [[], []]
        for i, u in enumerate(lst):
            ss[bits & (1<<i) == 0].append(u)
        yield ss

        #Calculate next bits permutation via Gosper's hack (HAKMEM #175)
        u = bits & (-bits)
        v = bits + u
        bits = v | (((v ^ bits) // u) >> 2)

# Test
lst = list(range(1, 7))
for i, t in enumerate(parts(lst), 1):
    print('{0:2d}: {1}'.format(i, t))    

output

 1: [[1, 2, 3], [4, 5, 6]]
 2: [[1, 2, 4], [3, 5, 6]]
 3: [[1, 3, 4], [2, 5, 6]]
 4: [[2, 3, 4], [1, 5, 6]]
 5: [[1, 2, 5], [3, 4, 6]]
 6: [[1, 3, 5], [2, 4, 6]]
 7: [[2, 3, 5], [1, 4, 6]]
 8: [[1, 4, 5], [2, 3, 6]]
 9: [[2, 4, 5], [1, 3, 6]]
10: [[3, 4, 5], [1, 2, 6]]

I admit that using something inscrutable like Gosper's hack isn't exactly Pythonic. :)


Here's how you capture the output of parts into a list of all the sublists. It also illustrates that parts can handle string input, although it produces the output as lists of strings.

seq = list(parts('abcd'))
print(seq)

output

[[['a', 'b'], ['c', 'd']], [['a', 'c'], ['b', 'd']], [['b', 'c'], ['a', 'd']]] 



Here's another solution, using itertools to generate the combinations. It generates the pairs in a different order from the earlier version. However, it's shorter and easier to read. More importantly, it's significantly faster: between 50 and 100 percent faster in my timeit tests, depending on the list length; the difference appears to get smaller for longer lists.

from itertools import combinations

def parts(lst):
    n = len(lst)
    if n % 2 != 0:
        raise ValueError('list length MUST be even')

    first = lst[0]
    for left in combinations(lst, n // 2):
        if left[0] != first:
            break
        right = [u for u in lst if u not in left]
        yield [list(left), right]

# Test
lst = list(range(1, 7))
for i, t in enumerate(parts(lst), 1):
    print('{0:2d}: {1}'.format(i, t))    

output

 1: [[1, 2, 3], [4, 5, 6]]
 2: [[1, 2, 4], [3, 5, 6]]
 3: [[1, 2, 5], [3, 4, 6]]
 4: [[1, 2, 6], [3, 4, 5]]
 5: [[1, 3, 4], [2, 5, 6]]
 6: [[1, 3, 5], [2, 4, 6]]
 7: [[1, 3, 6], [2, 4, 5]]
 8: [[1, 4, 5], [2, 3, 6]]
 9: [[1, 4, 6], [2, 3, 5]]
10: [[1, 5, 6], [2, 3, 4]]



Answer 4:


Since none of the orders matter, but we're making a list of lists of lists (where order inherently matters), we can impose some invariants: in every pair, the first list starts with the first element of A (here 1), and both lists in a pair keep their elements in the original (sorted) order.

A = [1,2,3,4,5,6]

from itertools import combinations
first, rest = A[0], A[1:]
result = [
            [
                list((first,) + X), 
                [x for x in rest if x not in X]
            ] 
            for X in combinations(rest, len(A) // 2 - 1)
         ]



Answer 5:


Here's some code that performs timeit tests on the various solutions to this problem.

To make the comparisons fair, I've commented out the argument-checking test in my functions.

I've also added a function parts_combo_set that combines features of my combinations-based answer with that of Blckknght. It appears to be the fastest, except for very small lists.

#!/usr/bin/env python

''' Generate all pairs of equal-sized partitions of a list of even length

    Testing the speed of the code at
    http://stackoverflow.com/q/36025609/4014959

    Written by PM 2Ring 2016.03.16
'''

from __future__ import print_function, division
from itertools import combinations
from timeit import Timer

def parts_combo(lst):
    n = len(lst)
    #if n % 2 != 0:
        #raise ValueError('list length MUST be even')

    first = lst[0]
    for left in combinations(lst, n // 2):
        if left[0] != first:
            break
        right = [u for u in lst if u not in left]
        yield [list(left), right]

def parts_combo_set(lst):
    n = len(lst)
    #if n % 2 != 0:
        #raise ValueError('list length MUST be even')

    first = lst[0]
    allset = set(lst)
    for left in combinations(lst, n // 2):
        if left[0] != first:
            break
        yield [list(left), list(allset.difference(left))]

def parts_gosper(lst):
    n = len(lst)
    #if n % 2 != 0:
        #raise ValueError('list length MUST be even')

    lim = 1 << (n - 1)
    bits = (1 << n // 2) - 1

    while bits < lim:
        #Use bits to partition lst
        ss = [[], []]
        for i, u in enumerate(lst):
            ss[bits & (1<<i) == 0].append(u)
        yield ss

        #Calculate next bits permutation via Gosper's hack (HAKMEM #175)
        u = bits & (-bits)
        v = bits + u
        bits = v | (((v ^ bits) // u) >> 2)

def sub_lists(sequence):
    all_but_first = set(sequence[1:])
    for item in combinations(sequence[1:], len(sequence)//2 - 1):
        yield [[sequence[0]] + list(item), list(all_but_first.difference(item))]

def amit(seq):
    first, rest = seq[0], seq[1:]
    return [
                [
                    list((first,) + t), 
                    [x for x in rest if x not in t]
                ]
                for t in combinations(rest, len(seq) // 2 - 1)
           ]

funcs = (
    parts_combo,
    parts_combo_set,
    parts_gosper,
    sub_lists,
    amit,
)

def rset(seq):
    fset = frozenset
    return fset([fset([fset(u),fset(v)]) for u,v in seq])

def validate():
    func = funcs[0]
    master = rset(func(lst))
    print('\nValidating against', func.__name__)
    for func in funcs[1:]:
        print(func.__name__, rset(func(lst)) == master)

def time_test(loops, reps):
    ''' Print timing stats for all the functions '''
    for func in funcs:
        fname = func.__name__
        print('\n%s' % fname)
        setup = 'from __main__ import lst,' + fname
        #cmd = 'list(%s(lst))' % fname
        cmd = 'for t in %s(lst):pass' % fname
        t = Timer(cmd, setup)
        r = t.repeat(reps, loops)
        r.sort()
        print(r)


num = 6
lst = list(range(1, num + 1))
print('num =', num)

#parts = funcs[0]
#for i, t in enumerate(parts(lst), 1):
    #print('{0:2d}: {1}'.format(i, t))

validate()
time_test(10000, 3)

outputs

time_test(10000, 3)

num = 6

Validating against parts_combo
parts_combo_set True
parts_gosper True
sub_lists True
amit True

parts_combo
[0.58100390434265137, 0.58798313140869141, 0.59674692153930664]

parts_combo_set
[0.74442911148071289, 0.77211689949035645, 0.79338312149047852]

parts_gosper
[1.0791628360748291, 1.089813232421875, 1.1191768646240234]

sub_lists
[0.77199792861938477, 0.79007697105407715, 0.81944608688354492]

amit
[0.60080099105834961, 0.60345196723937988, 0.60417318344116211]

time_test(1000, 3)

num = 8

Validating against parts_combo
parts_combo_set True
parts_gosper True
sub_lists True
amit True

parts_combo
[0.22465801239013672, 0.22501206398010254, 0.23627114295959473]

parts_combo_set
[0.29469203948974609, 0.29857206344604492, 0.30069589614868164]

parts_gosper
[0.43568992614746094, 0.44395184516906738, 0.44651198387145996]

sub_lists
[0.31375885009765625, 0.32754802703857422, 0.37077498435974121]

amit
[0.22520613670349121, 0.22674393653869629, 0.24075913429260254]

time_test(500, 3)

num = 10

parts_combo
[0.52618098258972168, 0.52645611763000488, 0.53866386413574219]

parts_combo_set
[0.40614008903503418, 0.41370606422424316, 0.41525506973266602]

parts_gosper
[1.0068988800048828, 1.026188850402832, 1.1649439334869385]

sub_lists
[0.48507213592529297, 0.50991582870483398, 0.51528096199035645]

amit
[0.48686790466308594, 0.52898812294006348, 0.68387198448181152]

time_test(100, 3)

num = 12

parts_combo
[0.47471189498901367, 0.47522807121276855, 0.4798729419708252]

parts_combo_set
[0.3045799732208252, 0.30534601211547852, 0.35700607299804688]

parts_gosper
[0.83456206321716309, 0.83824801445007324, 0.87273812294006348]

sub_lists
[0.36697721481323242, 0.36919784545898438, 0.38349604606628418]

amit
[0.40012097358703613, 0.40033888816833496, 0.40788102149963379]

time_test(50, 3)

num = 14

parts_combo
[0.97016000747680664, 0.97931098937988281, 1.2653739452362061]

parts_combo_set
[0.81669902801513672, 0.88839983940124512, 0.91469597816467285]

parts_gosper
[1.772817850112915, 1.9343690872192383, 1.9586498737335205]

sub_lists
[0.78162002563476562, 0.79451298713684082, 0.8126368522644043]

amit
[0.89046502113342285, 0.89572596549987793, 0.91031289100646973]

time_test(50, 3)

num = 16

parts_combo
[4.1981601715087891, 4.3565289974212646, 4.3795731067657471]

parts_combo_set
[2.5452880859375, 2.5757780075073242, 2.6059379577636719]

parts_gosper
[7.5856668949127197, 7.6066100597381592, 7.6397140026092529]

sub_lists
[3.677016019821167, 3.6800520420074463, 3.7420001029968262]

amit
[4.1738030910491943, 4.1768841743469238, 4.1960680484771729]

time_test(10, 3)

num = 18

parts_combo
[3.8362669944763184, 3.8807728290557861, 4.0259079933166504]

parts_combo_set
[1.9355819225311279, 1.9540839195251465, 1.9573280811309814]

parts_gosper
[6.3178229331970215, 6.6125278472900391, 7.0462160110473633]

sub_lists
[2.1632919311523438, 2.238231897354126, 2.2747220993041992]

amit
[3.6137850284576416, 3.6162960529327393, 3.6475629806518555]

time_test(10, 3)

num = 20

parts_combo
[16.874133110046387, 17.585763931274414, 19.725590944290161]

parts_combo_set
[7.5462148189544678, 7.5597691535949707, 7.8375740051269531]

parts_gosper
[27.312526941299438, 27.637516021728516, 28.016690015792847]

sub_lists
[7.7865769863128662, 7.8874318599700928, 8.5498230457305908]

amit
[15.554526805877686, 15.626868009567261, 16.224159002304077]

These tests were performed on a 2GHz single-core machine with 2 GB of RAM running Python 2.6.6.
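
If you want to rerun a comparison like this on a current Python 3, one possible shape for the harness is sketched below. It passes a callable to timeit.repeat instead of building a statement string. The helper time_one is hypothetical and not part of the script above, and the numbers it prints will not match the Python 2.6.6 figures listed here.

```python
from itertools import combinations
from timeit import repeat

def parts_combo_set(lst):
    """Same generator as above: fix lst[0] in the left half, set-difference for the right."""
    n = len(lst)
    first = lst[0]
    allset = set(lst)
    for left in combinations(lst, n // 2):
        if left[0] != first:
            break
        yield [list(left), list(allset.difference(left))]

def time_one(func, lst, loops, reps):
    """Time exhausting func(lst) `loops` times, repeated `reps` times; return sorted timings."""
    def run():
        for _ in func(lst):
            pass
    return sorted(repeat(run, number=loops, repeat=reps))

timings = time_one(parts_combo_set, list(range(1, 11)), loops=50, reps=3)
print(timings)
```

As in the original script, the generator is exhausted with a bare loop so that the cost of building a result list is not included in the measurement.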



Source: https://stackoverflow.com/questions/36025609/n-choose-n-2-sublists-of-a-list
