Union of multiple sets in python

五迷三道 提交于 2019-12-03 23:21:30

The itertools module makes short work of this problem:

>>> from itertools import chain
>>> list(set(chain.from_iterable(d)))
[1, '41', '42', '43', '40', '34', '30', '44']

Another way to do it is to unpack the list into separate arguments for union():

>>> list(set().union(*d))
[1, '41', '42', '43', '40', '34', '30', '44']

The latter way eliminates all duplicates and doesn't require that the inputs first be converted to sets. Also, it doesn't require an import.

Using the unpacking operator *:

>> list(set.union(*map(set, a)))
[1, '44', '30', '42', '43', '40', '41', '34']

(Thanks Raymond Hettinger for the comment!)

(Note that

set.union(*tup)

will unpack to

set.union(tup[0], tup[1], ... tup[n - 1])

)

I personally like the readability of reduce, paired with a simple conditional function, something like

somelists = [[1, '41', '40', '42'], [1, '42', '41', '43'], [1, '43', '42', '44'], [1, '44', '34', '43']] # your original lists
somesets = map(set,somelists) #your lists as sets

def condition(s1,s2): # condition to apply recursively to the sets
    if s1.intersection(s2):
        return s1.union(s2)
reduce( condition,somesets)
#{1, '30', '34', '40', '41', '42', '43', '44'}

Of course you can cast this result to a 2d list if you desire list([reduce(...

I will note that this is something like 3x slower than the chain.fromiterable answer.

You can use itertools to perform this action. Let us assume that your list has a variable name A

import itertools

single_list_with_all_values = list(itertools.chain(*A))
single_list_with_all_values.sort()

print set(single_list_with_all_values)
In [20]: s
Out[20]: 
[[1, '34', '44'],
 [1, '40', '30', '41'],
 [1, '41', '40', '42'],
 [1, '42', '41', '43'],
 [1, '43', '42', '44'],
 [1, '44', '34', '43']]
In [31]: list({x for _list in s for x in _list})
Out[31]: [1, '44', '30', '42', '43', '40', '41', '34']

Update:

Thanks for the comments

>>> big = [[1, '34', '44'], [1, '40', '30', '41'], [1, '41', '40', '42'], [1, '42', '41', '43'], [1, '43', '42', '44'], [1, '44', '34', '43']]
>>> set(reduce ( lambda l,a : l + a, big))
set([1, '44', '30', '42', '43', '40', '41', '34'])

And if you really want a list of a list as a final result

>>>>[list(set(reduce ( lambda l,a : l + a, big)))]
[[1, '44', '30', '42', '43', '40', '41', '34']]

And if you don't like recoding a lambda function for the list addition :

>>>>[list(set(reduce ( list.__add__, big)))]
[[1, '44', '30', '42', '43', '40', '41', '34']]

EDIT : after your recommendation about using itertools.chain instead of list.__add__ I ran a timeit for both with the original variable used by the original poster.

It seems that timeit times list.__add__ around 2.8s and itertools.chain around 3.5 seconds.

I checked on this page and yes, you were right with the itertools.chain contains a from_iterable method that grants a huge performance boost. see below with list.__add__, itertools.chain and itertools.chain.from_iterable.

>>> timeit.timeit("[list(set(reduce ( list.__add__, big)))]", setup="big = [ [10,20,30,40] for ele in range(10000)]", number=30)
16.051744650801993
>>> timeit.timeit("[list(set(reduce ( itertools.chain, big)))]", setup="big = [ [10,20,30,40] for ele in range(10000)]", number=30)
54.721315866467194
>>> timeit.timeit("list(set(itertools.chain.from_iterable(big)))", setup="big = [ [10,20,30,40] for ele in range(10000)]", number=30)
0.040056066849501804

Thank you very much for your advises :)

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!