Remove item from list based on the next item in same list

悲&欢浪女 2021-02-18 17:08

I just started learning Python. I have a sorted list of protein sequences (59,000 in total), and some of them overlap. I have made a toy list here as an example:

11 answers

一整个雨季 (original poster)

2021-02-18 17:44

A simple approach is to process the input file one line at a time, compare each line with the previous one, and keep the previous line if it is not contained in the current one.

Code can be as simple as:

with open('toy.txt', 'r') as f:
    old = next(f).strip()               # keep first line after stripping EOL

    for pattern in f:
        pattern = pattern.strip()       # strip end of line...
        if old not in pattern:
            print(old)                  # keep old if it is not contained in current line
        old = pattern                   # and store current line for next iteration
    print(old)                          # do not forget last line
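
If you want the result as a list instead of printed output, the same idea can be wrapped in a generator. This is only a sketch: the name drop_contained and the toy data are made up for illustration and are not the list from the question. Like the loop above, it compares each sequence only with the one right after it, so it relies on the input being sorted. It also skips blank lines, which the loop above does not.

```python
def drop_contained(sequences):
    """Yield each sequence that is not contained in the one right after it."""
    previous = None
    for line in sequences:
        current = line.strip()
        if not current:
            continue                    # skip blank lines
        if previous is not None and previous not in current:
            yield previous              # previous is not part of current, keep it
        previous = current
    if previous is not None:
        yield previous                  # the last sequence is always kept


# Hypothetical toy data, assumed to be sorted already
toy = ["ABCDE", "ABCDEFG", "ABCDEFGH", "XYZ", "XYZ", "MNOP"]
print(list(drop_contained(toy)))
```

Since a file object iterates over its lines, list(drop_contained(f)) inside the with block should work on the file directly.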
