I have two lists say
List1 = [\'a\',\'c\',\'c\']
List2 = [\'x\',\'b\',\'a\',\'x\',\'c\',\'y\',\'c\']
Now I want to find out if all element
I can't use the subset function because I can have repeated elements in lists.
What this means is that you want to treat your lists as multisets rather than sets. The usual way to handle multisets in Python is with collections.Counter:
A
Counter
is a dict subclass for counting hashable objects. It is an unordered collection where elements are stored as dictionary keys and their counts are stored as dictionary values. Counts are allowed to be any integer value including zero or negative counts. TheCounter
class is similar to bags or multisets in other languages.
And, while you can implement subset for multisets (implemented with Counter
) by looping and comparing counts, as in poke's answer, this is unnecessary—just as you can implement subset for sets (implemented with set
or frozenset
) by looping and testing in
, but it's unnecessary.
The Counter
type already implements all the set operators extended in the obvious way for multisets.<1 So you can just write subset in terms of those operators, and it will work for both set
and Counter
out of the box.
With (multi)set difference:2
def is_subset(c1, c2):
return not c1 - c2
Or with (multi)set intersection:
def is_subset(c1, c2):
def c1 & c2 == c1
1. You may be wondering why, if Counter
implements the set operators, it doesn't just implement <
and <=
for proper subset and subset. Although I can't find the email thread, I'm pretty sure this was discussed, and the answer was that "the set operators" are defined as the specific set of operators defined in the initial version of collections.abc.Set
(which has since been expanded, IIRC…), not all operators that set
happens to include for convenience, in the exact same way that Counter
doesn't have named methods like intersection
that's friendly to other types than &
just because set
does.
2. This depends on the fact that collections in Python are expected to be falsey when empty and truthy otherwise. This is documented here for the builtin types, and the fact that bool
tests fall back to len
is explained here—but it's ultimately just a convention, so that "quasi-collections" like numpy arrays can violate it if they have a good reason. It holds for "real collections" like Counter
, OrderedDict
, etc. If you're really worried about that, you can write len(c1 - c2) == 0
, but note that this is against the spirit of PEP 8.
Be aware of the following:
>>>listA = ['a', 'a', 'b','b','b','c']
>>>listB = ['b', 'a','a','b','c','d']
>>>all(item in listB for item in listA)
True
If you read the "all" line as you would in English, This is not wrong but can be misleading, as listA has a third 'b' but listB does not.
This also has the same issue:
def list1InList2(list1, list2):
for item in list1:
if item not in list2:
return False
return True
Just a note. The following does not work:
>>>tupA = (1,2,3,4,5,6,7,8,9)
>>>tupB = (1,2,3,4,5,6,6,7,8,9)
>>>set(tupA) < set(TupB)
False
If you convert the tuples to lists it still does not work. I don't know why strings work but ints do not.
Works but has same issue of not keeping count of element occurances:
>>>set(tupA).issubset(set(tupB))
True
Using sets is not a comprehensive solution for multi-occurrance element matching.
But here is a one-liner solution/adaption to shantanoo's answer without try/except:
all(True if sequenceA.count(item) <= sequenceB.count(item) else False for item in sequenceA)
A builtin function wrapping a list comprehension using a ternary conditional operator. Python is awesome! Note that the "<=" should not be "==".
With this solution sequence A and B can be type tuple and list and other "sequences" with "count" methods. The elements in both sequences can be most types. I would not use this with dicts as it is now, hence the use "sequence" instead of "iterable".
A solution using Counter and the builtin intersection method (note that -
is proper multiset difference, not an element-wise subtraction):
from collections import Counter
def is_subset(l1, l2):
c1, c2 = Counter(l1), Counter(l2)
return not c1 - c2
Test:
>>> List1 = ['a','c','c']
>>> List2 = ['x','b','a','x','c','y','c']
>>> is_subset(List1, List2)
True
def check_subset(list1, list2):
try:
[list2.remove(x) for x in list1]
return 'all elements in list1 are in list2'
except:
return 'some elements in list1 are not in list2'
When number of occurrences doesn't matter, you can still use the subset functionality, by creating a set on the fly:
>>> list1 = ['a', 'c', 'c']
>>> list2 = ['x', 'b', 'a', 'x', 'c', 'y', 'c']
>>> set(list1).issubset(list2)
True
If you need to check if each element shows up at least as many times in the second list as in the first list, you can make use of the Counter type and define your own subset relation:
>>> from collections import Counter
>>> def counterSubset(list1, list2):
c1, c2 = Counter(list1), Counter(list2)
for k, n in c1.items():
if n > c2[k]:
return False
return True
>>> counterSubset(list1, list2)
True
>>> counterSubset(list1 + ['a'], list2)
False
>>> counterSubset(list1 + ['z'], list2)
False
If you already have counters (which might be a useful alternative to store your data anyway), you can also just write this as a single line:
>>> all(n <= c2[k] for k, n in c1.items())
True
This will return true is all the items in List1 are in List2
def list1InList2(list1, list2):
for item in list1:
if item not in list2:
return False
return True