Union of multiple sets in python

[[1, '34', '44'], [1, '40', '30', '41'], [1, '41', '40', '42'], [1, '42', '41', '43'], [1, '43', '42', '44'], [1, '44', '34', '43']]

I have a list of lists. My aim is to check whether any one sublist has anything in common with other sublists(excluding the first index object to compare). If it has anything in common then unify those sublists.

For example, for this example my final answer should be something like:

[[1, '34, '44', '40' '30', '41', '42', '43']]

I can understand that I should convert the sublists to sets and then use union() and intersection() operation. But what I am stuck with is to how to compare each set/sublist. I can't run a loop over the list and compare each sublist one by one as the contents of the list would be modified and this would lead to error.

What I want to know is there any efficient method to compare all the sublists(converted to sets) and get union of them?

The itertools module makes short work of this problem:

>>> from itertools import chain
>>> list(set(chain.from_iterable(d)))
[1, '41', '42', '43', '40', '34', '30', '44']

Another way to do it is to unpack the list into separate arguments for union():

>>> list(set().union(*d))
[1, '41', '42', '43', '40', '34', '30', '44']

The latter way eliminates all duplicates and doesn't require that the inputs first be converted to sets. Also, it doesn't require an import.

Using the unpacking operator *:

>> list(set.union(*map(set, a)))
[1, '44', '30', '42', '43', '40', '41', '34']

(Thanks Raymond Hettinger for the comment!)

(Note that

set.union(*tup)

will unpack to

set.union(tup[0], tup[1], ... tup[n - 1])

)

I personally like the readability of reduce, paired with a simple conditional function, something like

somelists = [[1, '41', '40', '42'], [1, '42', '41', '43'], [1, '43', '42', '44'], [1, '44', '34', '43']] # your original lists
somesets = map(set,somelists) #your lists as sets

def condition(s1,s2): # condition to apply recursively to the sets
    if s1.intersection(s2):
        return s1.union(s2)
reduce( condition,somesets)
#{1, '30', '34', '40', '41', '42', '43', '44'}

Of course you can cast this result to a 2d list if you desire list([reduce(...

I will note that this is something like 3x slower than the chain.fromiterable answer.

You can use itertools to perform this action. Let us assume that your list has a variable name A

import itertools

single_list_with_all_values = list(itertools.chain(*A))
single_list_with_all_values.sort()

print set(single_list_with_all_values)

In [20]: s
Out[20]: 
[[1, '34', '44'],
 [1, '40', '30', '41'],
 [1, '41', '40', '42'],
 [1, '42', '41', '43'],
 [1, '43', '42', '44'],
 [1, '44', '34', '43']]
In [31]: list({x for _list in s for x in _list})
Out[31]: [1, '44', '30', '42', '43', '40', '41', '34']

Update:

Thanks for the comments

>>> big = [[1, '34', '44'], [1, '40', '30', '41'], [1, '41', '40', '42'], [1, '42', '41', '43'], [1, '43', '42', '44'], [1, '44', '34', '43']]
>>> set(reduce ( lambda l,a : l + a, big))
set([1, '44', '30', '42', '43', '40', '41', '34'])

And if you really want a list of a list as a final result

>>>>[list(set(reduce ( lambda l,a : l + a, big)))]
[[1, '44', '30', '42', '43', '40', '41', '34']]

And if you don't like recoding a lambda function for the list addition :

>>>>[list(set(reduce ( list.__add__, big)))]
[[1, '44', '30', '42', '43', '40', '41', '34']]

EDIT : after your recommendation about using itertools.chain instead of list.__add__ I ran a timeit for both with the original variable used by the original poster.

It seems that timeit times list.__add__ around 2.8s and itertools.chain around 3.5 seconds.

I checked on this page and yes, you were right with the itertools.chain contains a from_iterable method that grants a huge performance boost. see below with list.__add__, itertools.chain and itertools.chain.from_iterable.

>>> timeit.timeit("[list(set(reduce ( list.__add__, big)))]", setup="big = [ [10,20,30,40] for ele in range(10000)]", number=30)
16.051744650801993
>>> timeit.timeit("[list(set(reduce ( itertools.chain, big)))]", setup="big = [ [10,20,30,40] for ele in range(10000)]", number=30)
54.721315866467194
>>> timeit.timeit("list(set(itertools.chain.from_iterable(big)))", setup="big = [ [10,20,30,40] for ele in range(10000)]", number=30)
0.040056066849501804

Thank you very much for your advises :)

来源：https://stackoverflow.com/questions/30773911/union-of-multiple-sets-in-python

标签

python

list

python-3.x

set-union