Remove item from list based on the next item in same list

悲&欢浪女 2021-02-18 17:08

I just started learning Python. I have a sorted list of protein sequences (59,000 sequences in total), and some of them overlap. I have made a toy list here as an example:

11 Answers
  •  一整个雨季
    2021-02-18 17:44

    A simple way is to process the input file one line at a time, compare each line with the previous one, and keep the previous line only if it is not contained in the current one.

    Code can be as simple as:

    with open('toy.txt', 'r') as f:
        old = next(f).strip()               # keep first line after stripping EOL

        for pattern in f:
            pattern = pattern.strip()       # strip end of line...
            if old not in pattern:
                print(old)                  # keep old if it is not contained in current line
            old = pattern                   # and store current line for next iteration
        print(old)                          # do not forget last line
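
    If the sequences are already in a Python list rather than a file, the same previous-versus-current comparison can be wrapped in a function. This is only a sketch: the function name `drop_contained_in_next` and the toy data are made up for illustration and are not the asker's actual sequences.

```python
def drop_contained_in_next(sequences):
    """Keep each item unless it is a substring of the item right after it."""
    kept = []
    previous = None
    for current in sequences:
        # The previous item survives only if the current one does not contain it.
        if previous is not None and previous not in current:
            kept.append(previous)
        previous = current
    # The last item has no successor, so it is always kept.
    if previous is not None:
        kept.append(previous)
    return kept


# Hypothetical toy data, assumed sorted so overlapping items are adjacent.
toy = ["ABC", "ABCD", "ABCDE", "XY", "XYZ", "Q"]
result = drop_contained_in_next(toy)
print(result)
```

    Like the file-based loop, this makes a single pass and only compares neighbours, so it relies on the list being sorted in a way that puts overlapping sequences next to each other.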
    
