I have two lists:
main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']
I want to count how many strings in main_list contain any of the strings in master_list.
What about this:
main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']
print(len([word for word in main_list if any(mw in word for mw in master_list)]))
This would do it:
main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']
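i = 0
for elem in main_list:
    if elem in master_list:
        # exact match: count it and move on to the next element
        i += 1
        continue
    for master_elem in master_list:
        if master_elem in elem:
            # substring match: count once, then stop checking this element
            i += 1
            break
print(i)  # i = 4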
The code above counts 'Roger-Smith' once. If you want it to count once for every master string it contains, remove the break.
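To make the difference concrete, here is a minimal sketch of both counting modes. The function name count_matches and the per_match flag are hypothetical, made up for this illustration, and the sketch drops the exact-match shortcut from the loop above, so it is not a line-for-line equivalent.

```python
def count_matches(main_list, master_list, per_match=False):
    """Count entries of main_list that contain a string from master_list.

    per_match=False: each entry counts at most once (like keeping the break).
    per_match=True:  each entry counts once per master string it contains.
    """
    total = 0
    for elem in main_list:
        # number of master strings that occur somewhere inside elem
        hits = sum(1 for master_elem in master_list if master_elem in elem)
        total += hits if per_match else min(hits, 1)
    return total


main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']

print(count_matches(main_list, master_list))                  # 4
print(count_matches(main_list, master_list, per_match=True))  # 5
```

With per_match=True, 'Roger-Smith' contributes 2 because it contains both 'Roger' and 'Smith'.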
You can do it the other way around: create a list that contains only the elements of main_list that have a substring from master_list.
temp_list = [string for string in main_list if any(substring in string for substring in master_list)]
Now temp_list looks like this:
['Smith', 'Smith', 'Roger', 'Roger-Smith']
So the length of temp_list is your answer.
A one-liner:
>>> sum(any(m in L for m in master_list) for L in main_list)
4
Iterate over main_list and check whether any of the values from master_list is in each string. This yields one bool per string. any stops at the first match, so each string adds at most one to the count. Conveniently, sum counts all the True values to give you the total.
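As a small illustration of why this works, the sketch below materializes the intermediate bools before summing them. The name flags is only for this example; the one-liner itself uses a generator and never builds the list.

```python
main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']

# One bool per string: does it contain at least one master string?
flags = [any(m in s for m in master_list) for s in main_list]
print(flags)       # [True, True, True, True, False]

# bool is a subclass of int, so True adds 1 and False adds 0.
print(sum(flags))  # 4
```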
You can use pandas (which provides fast vectorized operations) with str.contains and sum(). Note that str.contains treats the joined pattern as a regular expression by default:
import pandas as pd
main_list = pd.Series(['Smith', 'Smith', 'Roger', 'Roger-Smith', '42'])
master_list = ['Smith', 'Roger']
count = main_list.str.contains('|'.join(master_list)).sum()
If your master_list is not expected to be huge, one way to do it is with regex:
import re

def string_detection(master_list, main_list):
    count = 0
    # one alternation pattern that matches any of the master strings
    master = re.compile("|".join(master_list))
    for entry in main_list:
        if master.search(entry):
            count += 1
    return count
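One caveat with joining the strings directly: if a master string contains regex metacharacters (for example '+' or '.'), the pattern can misbehave or fail to compile. A possible variant, sketched here under the assumption that you want plain substring matching, escapes each string first. The name count_escaped is hypothetical.

```python
import re


def count_escaped(master_list, main_list):
    """Count entries containing any master string, matched literally."""
    if not master_list:
        # an empty alternation would match every entry, so return early
        return 0
    # re.escape makes characters such as '+' or '.' match themselves
    pattern = re.compile("|".join(re.escape(m) for m in master_list))
    return sum(1 for entry in main_list if pattern.search(entry))


main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
print(count_escaped(['Smith', 'Roger'], main_list))          # 4
print(count_escaped(['C++'], ['C++ primer', 'C primer']))    # 1
```

Without the escaping, a master string like 'C++' would not compile as a pattern at all.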