Remove item from list based on the next item in same list

悲&欢浪女 2021-02-18 17:08

I just started learning Python. I have a sorted list of protein sequences (59,000 in total), and some of them overlap. I have made a toy list here as an example:

11 answers
  •  别那么骄傲
    2021-02-18 17:48

    # assuming list is sorted:
    pattern = ["ABCDE",
    "ABCDEFG",
    "ABCDEFGH",
    "ABCDEFGHIJKLMNO",
    "CEST",
    "DBTSFDE",
    "DBTSFDEO",
    "EOEUDNBNUW",
    "EAEUDNBNUW",
    "FG",
    "FGH"]
    
    pattern = list(reversed(pattern))
    
    def iterate_patterns():
        # The list was reversed above, so pop() returns items in the original order.
        while pattern:
            i = pattern.pop()
            # Skip i if any remaining sequence starts with it.
            if not any(p.startswith(i) for p in pattern):
                yield i
    
    print(list(iterate_patterns()))
    

    Output:

    ['ABCDEFGHIJKLMNO', 'CEST', 'DBTSFDEO', 'EOEUDNBNUW', 'EAEUDNBNUW', 'FGH']
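
With 59,000 sequences, scanning every remaining item for each popped one may be slow. If the list really is sorted, any longer sequence that starts with a given one should sit directly after it, so comparing each item with only the next one may be enough. A minimal sketch under that assumption (the names `drop_prefixes` and `toy` are made up for illustration and are not from the original post):

```python
def drop_prefixes(seqs):
    """Return the sequences that are not a prefix of the item right after them.

    Assumes seqs is sorted, so a longer sequence sharing the same
    prefix directly follows the shorter one.
    """
    kept = []
    # Pair every item with its successor; the last item has none.
    for current, following in zip(seqs, seqs[1:]):
        if not following.startswith(current):
            kept.append(current)
    # The last item has nothing after it, so it is always kept.
    if seqs:
        kept.append(seqs[-1])
    return kept


toy = ["ABCDE", "ABCDEFG", "ABCDEFGH", "ABCDEFGHIJKLMNO", "CEST",
       "DBTSFDE", "DBTSFDEO", "EOEUDNBNUW", "EAEUDNBNUW", "FG", "FGH"]

print(drop_prefixes(toy))
```

On the toy list this prints the same result as the output above. Unlike the generator version, it makes a single pass and leaves the input list unmodified.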
