Python: Count the number of substrings in a list from another string list, without duplicates

小鲜肉 2021-01-13 01:03

I have two list:

main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']

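I want to count how many entries in main_list contain at least one of the strings in master_list, counting each entry only once (so 'Roger-Smith' counts as 1, not 2).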

6 answers
  • 2021-01-13 01:38

    What about this:

    main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
    master_list = ['Smith', 'Roger']
    
    print(len([word for word in main_list if any(mw in word for mw in master_list)]))
    
  • 2021-01-13 01:40

    This would do it:

    main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
    master_list = ['Smith', 'Roger']
    
    i = 0
    for elem in main_list:
        if elem in master_list:
            i += 1
            continue
        for master_elem in master_list:
            if master_elem in elem:
                i += 1
                break
    
    print(i) # i = 4
    

    The code above counts 'Roger-Smith' once. If you want it counted once for each master_list entry it contains (twice here), remove the break.
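    To make the effect of the break concrete, here is a small sketch (not part of the original answer) that counts both ways with generator expressions; the names once_per_entry and once_per_match are just illustrative.

```python
main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']

# One hit per entry: what the loop with `break` computes
once_per_entry = sum(any(m in s for m in master_list) for s in main_list)

# One hit per (entry, master) pair that matches: what the loop computes without `break`
once_per_match = sum(m in s for s in main_list for m in master_list)

print(once_per_entry, once_per_match)  # 4 5
```

    The difference of 1 comes entirely from 'Roger-Smith', which contains both master strings.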

  • 2021-01-13 01:55

    You can also do it the other way around: build a list that contains only the elements of main_list that contain a substring from master_list:

    temp_list = [ string for string in main_list if any(substring in string for substring in master_list)]
    

    Now temp_list looks like this:

    ['Smith', 'Smith', 'Roger', 'Roger-Smith']
    

    So the length of temp_list is your answer.
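    If "without duplicates" in the title is instead meant to say that repeated entries such as the two 'Smith's should count only once (an assumption, since the question is cut off), a set built from the same filtered list gives that count. A minimal sketch:

```python
main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']

# Same filter as above: keep entries containing at least one master substring
temp_list = [s for s in main_list if any(m in s for m in master_list)]

# Assumption: identical entries should only be counted once
distinct_matches = set(temp_list)

print(len(temp_list))         # 4
print(len(distinct_matches))  # 3
```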

  • 2021-01-13 01:58

    A one-liner:

    >>> sum(any(m in L for m in master_list) for L in main_list)
    4
    

    This iterates over main_list and, for each string, checks whether any of the values from master_list occur in it, producing one bool per string (lazily, through a generator rather than a list). any() stops as soon as it finds a match, so each string adds at most 1 to the count. Conveniently, sum() counts the Trues (True counts as 1) to give you the total.
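    To see the short-circuiting, here is a sketch that swaps the plain `in` test for a hypothetical helper, contains, which also logs every comparison it performs. The result is the same; the log shows which tests any() actually ran.

```python
main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
master_list = ['Smith', 'Roger']

checked = []  # every (master, entry) pair that any() actually tests

def contains(m, s):
    """Hypothetical helper: substring test that records that it was called."""
    checked.append((m, s))
    return m in s

# Each any() yields one bool per entry; sum() adds True as 1 and False as 0
total = sum(any(contains(m, s) for m in master_list) for s in main_list)

print(total)  # 4
# any() stopped at 'Smith', so 'Roger' was never tested against 'Roger-Smith'
print([m for m, s in checked if s == 'Roger-Smith'])  # ['Smith']
print(len(checked))  # 7 tests instead of 5 * 2 = 10
```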

  • 2021-01-13 01:58

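    You can use pandas' vectorized string methods: Series.str.contains to flag the matching entries, then sum() to count them:

    import re
    import pandas as pd

    main_list = pd.Series(['Smith', 'Smith', 'Roger', 'Roger-Smith', '42'])
    master_list = ['Smith', 'Roger']
    # Build one alternation regex ('Smith|Roger'); re.escape keeps any regex metacharacters literal
    count = main_list.str.contains('|'.join(map(re.escape, master_list))).sum()
    print(count)  # 4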
    
  • 2021-01-13 01:59

    If your master_list is not expected to be huge, one way to do it is with regex:

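    import re
    
    def string_detection(master_list, main_list):
        count = 0
        # One alternation pattern such as 'Smith|Roger'; re.escape keeps regex metacharacters literal
        master = re.compile("|".join(map(re.escape, master_list)))
        for entry in main_list:
            if master.search(entry):
                count += 1
        return count

    print(string_detection(['Smith', 'Roger'], ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']))  # 4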
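    Joining master_list with '|' builds a regular expression, so an entry containing metacharacters such as '.', '+' or '(' is treated as a pattern rather than as literal text unless it is passed through re.escape first. This applies to the pandas answer too, since str.contains interprets its pattern as a regex by default. A small sketch with a hypothetical master entry '4.' shows the difference:

```python
import re

main_list = ['Smith', 'Smith', 'Roger', 'Roger-Smith', '42']
# Hypothetical master entry containing the regex metacharacter '.'
master_list = ['4.']

naive = re.compile('|'.join(master_list))
escaped = re.compile('|'.join(map(re.escape, master_list)))

# Unescaped, '4.' means "a 4 followed by any character", so it matches '42'
naive_count = sum(bool(naive.search(s)) for s in main_list)
# Escaped, it only matches a literal '4.', which no entry contains
escaped_count = sum(bool(escaped.search(s)) for s in main_list)
# The plain substring answers (`in`) agree with the escaped version
substring_count = sum(any(m in s for m in master_list) for s in main_list)

print(naive_count, escaped_count, substring_count)  # 1 0 0
```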
    