Python - Remove any element from a list of strings that is a substring of another element

清酒与你 2020-12-06 10:09

So starting with a list of strings, as below

string_list = ['rest', 'resting', 'look', 'looked', 'it', 'spit']

I wan

8 Answers
  • 2020-12-06 10:50

    Here's a possible solution:

    string_list = ['rest', 'resting', 'look', 'looked', 'it', 'spit']
    def string_set(string_list):
        return set(i for i in string_list 
                   if not any(i in s for s in string_list if i != s))
    
    print(string_set(string_list))
    

    prints out:

    set(['looked', 'resting', 'spit'])
    

    Note I create a set (using a generator expression) to remove possibly duplicated words as it appears that order does not matter.
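
If order does matter, a minimal sketch of an order-preserving variant might look like the following. It assumes Python 3.7+, where dict keeps insertion order; the name `unique_superstrings` is hypothetical and not part of the answer above.

```python
def unique_superstrings(string_list):
    """Keep strings that are not substrings of any other, in first-seen order."""
    # dict.fromkeys drops duplicates while keeping insertion order (Python 3.7+).
    unique = list(dict.fromkeys(string_list))
    return [s for s in unique
            if not any(s in other for other in unique if s != other)]


print(unique_superstrings(['rest', 'resting', 'look', 'looked', 'it', 'spit', 'spit']))
# ['resting', 'looked', 'spit']
```

Like the set-based version, this still compares every pair of strings, so it only changes the ordering guarantee, not the cost.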

  • 2020-12-06 10:53

    First building block: substring.

    You can use in to check:

    >>> 'rest' in 'resting'
    True
    >>> 'sing' in 'resting'
    False
    

    Next, we're going to choose the naive method of creating a new list. We'll add items one by one into the new list, checking if they are a substring or not.

    def substringSieve(string_list):
        out = []
        for s in string_list:
            if not any([s in r for r in string_list if s != r]):
                out.append(s)
        return out
    

    You can speed it up by sorting to reduce the number of comparisons (after all, a longer string can never be a substring of a shorter/equal length string):

    def substringSieve(string_list):
        # Longest first; sorted() leaves the caller's list unmodified.
        ordered = sorted(string_list, key=len, reverse=True)
        out = []
        for s in ordered:
            if not any(s in o for o in out):
                out.append(s)
        return out
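
The length-sorted sieve returns the survivors longest-first. If the input order is needed, one possible sketch is to run the same sieve and then filter the original list against the survivors; `sieve_keep_order` is a hypothetical name used only for illustration.

```python
def sieve_keep_order(string_list):
    """Length-sorted sieve that reports survivors in their original order."""
    kept = []
    # Longest first: a string can only be a substring of something at least as long.
    for s in sorted(string_list, key=len, reverse=True):
        if not any(s in k for k in kept):
            kept.append(s)

    kept_set = set(kept)
    seen = set()
    result = []
    for s in string_list:
        # Emit each surviving string once, at its first position in the input.
        if s in kept_set and s not in seen:
            seen.add(s)
            result.append(s)
    return result


print(sieve_keep_order(['rest', 'resting', 'look', 'looked', 'it', 'spit']))
# ['resting', 'looked', 'spit']
```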
    
  • 2020-12-06 10:54

    Here's a way of doing it in place: each item is removed from the list as soon as it is found inside another item. It still compares every pair of items in the worst case, so it is not asymptotically faster than the solutions above. The outer loop iterates over a copy of the list, because removing items from a list while iterating over it makes the loop skip elements:

    string_list = ['rest', 'resting', 'look', 'looked', 'it', 'spit']
    for item in string_list[:]:  # iterate over a copy; the original is mutated below
        for item1 in string_list:
            if item in item1 and item != item1:
                string_list.remove(item)
                break  # item is gone; stop scanning so it is not removed twice
    
    print(string_list)
    

    Output:

    >>>['resting', 'looked', 'spit']
    

    Hope it helps !

  • 2020-12-06 10:55

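    Here's another way to do it. If you don't have to do the sieving in place, you can sort the list and compare each string with its successor in one pass. Note that this only reliably removes strings that are a prefix of another string (sorting puts a prefix directly before the strings that extend it); a substring in the middle or at the end of another string, such as 'it' in 'spit', is not detected:

    string_list = sorted(string_list)
    sieved = []
    for i in range(len(string_list)):
        # The last string has no successor, so it is always kept.
        if i == len(string_list) - 1 or string_list[i] not in string_list[i+1]:
            sieved.append(string_list[i])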
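
A quick check suggests what the adjacent-comparison idea can and cannot catch. After sorting, a string sits directly before any string that extends it as a prefix, but a substring in the middle or at the end of another string can land far away. The sketch below makes the prefix test explicit with `str.startswith`; `sieve_prefixes` is a hypothetical name.

```python
def sieve_prefixes(string_list):
    """Drop strings that are a prefix of another string, in one pass over the sorted list."""
    ordered = sorted(string_list)
    sieved = []
    for i, s in enumerate(ordered):
        is_last = i == len(ordered) - 1
        # In sorted order, any string that extends s as a prefix follows s immediately.
        if is_last or not ordered[i + 1].startswith(s):
            sieved.append(s)
    return sieved


print(sieve_prefixes(['rest', 'resting', 'look', 'looked', 'it', 'spit']))
# ['it', 'looked', 'resting', 'spit']
```

Here 'it' survives because it is a suffix of 'spit', not a prefix, so a full substring sieve still needs one of the pairwise approaches.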
    
  • 2020-12-06 10:59

    Here's one method:

    def find_unique(original):
        output = []
    
        for a in original:
            for b in original:
                if a == b:
                    continue     # So we don't compare a string against itself
                elif a in b:
                    break
            else:
                output.append(a) # Executed only if "break" is never hit
    
        return output
    
    if __name__ == '__main__':
        original = ['rest', 'resting', 'look', 'looked', 'it', 'split']
        print(find_unique(original))
    

    It exploits the fact that we can easily check if one string is a substring of another by using the in operator. It goes through each string, checks whether it is a substring of any other string, and appends it to the output list if it isn't.

    This prints out ['resting', 'looked', 'split']
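
The for/else construct used in this method can be unfamiliar: the else branch runs only when the loop finishes without hitting break. A small sketch of the same search pattern, with `first_container` as a hypothetical helper that is not part of the answer:

```python
def first_container(word, candidates):
    """Return the first other string that contains `word`, or None if there is none."""
    for other in candidates:
        if other != word and word in other:
            found = other
            break              # leaving via break skips the else branch
    else:
        found = None           # the loop ran to completion: nothing contains `word`
    return found


words = ['rest', 'resting', 'look', 'looked', 'it', 'split']
print(first_container('rest', words))     # resting
print(first_container('resting', words))  # None
```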

  • 2020-12-06 11:02

    Here is a one-liner that does what you want:

    list(filter(lambda x: [x for i in string_list if x in i and x != i] == [], string_list))
    

    Example (list() is needed on Python 3, where filter returns an iterator):

    >>> string_list = ['rest', 'resting', 'look', 'looked', 'it', 'spit']
    >>> list(filter(lambda x: [x for i in string_list if x in i and x != i] == [], string_list))
    ['resting', 'looked', 'spit']
    