Remove item from list based on the next item in same list

悲&欢浪女 2021-02-18 17:08

I just started learning Python, and I have a sorted list of protein sequences (59,000 sequences in total), some of which overlap. I have made a toy list here as an example:

11 answers
  •  忘掉有多难
    2021-02-18 17:45

    This will get you where you want to be:

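    with open('toy.txt', 'r') as f:
        # splitlines() drops the trailing '\n'; with readlines() the newline
        # stays on each line and startswith() never matches a longer sequence
        lines = f.read().splitlines()
        data = set(lines)  # unique sequences only
        # keep a sequence only if the one sequence starting with it is itself
        print(sorted([i for i in lines if len([j for j in data if j.startswith(i)])==1]))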
    
    #['ABCDEFGHIJKLMNO', 'CEST', 'DBTSFDEO', 'EAEUDNBNUW', 'EOEUDNBNUW', 'FGH']
    

    I've added the set in case the same sequence occurs more than once, so that a repeated sequence is still counted only once.
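
Because the list is already sorted, every sequence that starts with a given sequence sits directly after it, so it may be enough to compare each item with the next one only. The following is a minimal sketch of that idea, assuming "overlap" means one sequence is a prefix of another (as in the code above). The function name `drop_prefixes` and the `toy` data are made up for illustration and are not from the question.

```python
def drop_prefixes(seqs):
    """Return the sequences that are not a prefix of the item that follows them."""
    ordered = sorted(seqs)  # the neighbour comparison relies on sorted order
    kept = []
    # pair every item with its successor in the sorted list
    for current, following in zip(ordered, ordered[1:]):
        if not following.startswith(current):
            kept.append(current)
    if ordered:
        kept.append(ordered[-1])  # the last item has no successor, so it stays
    return kept


# hypothetical toy data, including one duplicate
toy = ["ABC", "ABCDEF", "ABCDEF", "CEST", "FG", "FGH"]
print(drop_prefixes(toy))  # ['ABCDEF', 'CEST', 'FGH']
```

This makes one pass over the list after sorting instead of comparing every pair, which should matter with 59,000 sequences. Duplicates collapse to a single copy, since an identical next item also counts as starting with the current one.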
