Remove item from list based on the next item in same list

悲&欢浪女 2021-02-18 17:08

I just started learning Python. I have a sorted list of protein sequences (59,000 in total), and some of them overlap. Here is a toy list as an example:

11 answers
  •  青春惊慌失措
    2021-02-18 17:54

    You could use groupby() and max() to help here:

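    from itertools import groupby
    
    with open('toy.txt') as f_input:
        # Group consecutive lines that share the same first two characters
        for key, group in groupby(f_input, key=lambda line: line[:2]):
            # Print the longest line in each group, without its trailing newline
            print(max(group, key=len).strip())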
    

    This would display:

    ABCDEFGHIJKLMNO
    CEST
    DBTSFDEO
    EOEUDNBNUW
    EAEUDNBNUW
    FGH
    

    groupby() yields (key, group) pairs, where each group is an iterator over consecutive items that produce the same key, in this case lines with the same first 2 characters. The max() function, keyed on length, then returns the longest line in each group.
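
To try the same idea without a file, here is a minimal sketch that works on an in-memory list. The function name `longest_per_prefix`, the `prefix_len` parameter, and the contents of `toy` are assumptions made for illustration; the question's actual toy list is not shown, so this one was simply chosen to reproduce the output above.

```python
from itertools import groupby


def longest_per_prefix(sequences, prefix_len=2):
    """Return the longest item from each run of consecutive items
    that share the same first `prefix_len` characters."""
    return [
        max(group, key=len)  # on a tie, max() keeps the first longest item
        for _, group in groupby(sequences, key=lambda s: s[:prefix_len])
    ]


# Hypothetical toy data (not the asker's real list)
toy = [
    "ABCDE",
    "ABCDEFG",
    "ABCDEFGHIJKLMNO",
    "CEST",
    "DBTSFDE",
    "DBTSFDEO",
    "EOEUDNBNUW",
    "EAEUDNBNUW",
    "FG",
    "FGH",
]

for seq in longest_per_prefix(toy):
    print(seq)
```

Note that groupby() only merges neighbouring items, so this relies on the list already being sorted (or at least having related sequences next to each other). Two items with the same prefix that are separated by a different one end up in separate groups.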
