I just started learning Python, and I have a sorted list of protein sequences (59,000 in total), some of which overlap. I have made a toy list here as an example:
Kenny, you almost got it, but there are two problems, which @scharette pointed out:

1. A for loop and removing list items should not go together. The fix is to use a while loop and increase the index explicitly. The while loop is less efficient because it calls len() on every iteration instead of once, but that is what it takes to get the correct result.
2. IndexError. This only happens at the very last line, where pattern[i+1] does not exist. My way of dealing with this problem is to ignore the error.

With that, I modified your code to:
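with open('toy.txt', 'r') as f:
    pattern = f.read().splitlines()
print(pattern)

try:
    i = 0
    while i < len(pattern):
        # pattern[i+1] raises IndexError once i reaches the last item, which ends the loop
        if pattern[i] in pattern[i+1]:
            # Delete by position; the next item slides into index i and is checked next
            del pattern[i]
        else:
            i += 1
        print(pattern)
except IndexError:
    pass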
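
As a side note, both problems listed above (removing while looping, and the IndexError on the last item) can be sidestepped by building a new list instead of changing the one being walked. The following is only a minimal sketch of that idea, not a replacement for the code above: the function name drop_contained_in_next and the sample sequences are made up for illustration and are not the contents of toy.txt. It assumes, as in the question, that the list is sorted so that an overlapping sequence sits directly before the longer one.

```python
def drop_contained_in_next(sequences):
    """Return a new list without items that are substrings of the item after them."""
    kept = []
    # Pair every item with its successor. zip() stops at the shorter input,
    # so the last item is never paired and no index can run out of range.
    for current, following in zip(sequences, sequences[1:]):
        if current not in following:
            kept.append(current)
    # The last item has no successor to be contained in, so it is always kept.
    if sequences:
        kept.append(sequences[-1])
    return kept


# Hypothetical sample data, for illustration only
toy = ["ABCDE", "ABCDEFG", "ABCDEFGH", "CEST", "FG", "FGH"]
print(drop_contained_in_next(toy))
```

Because the input list is never modified, there is no index to keep in sync and no exception to swallow. The trade-off is one extra list in memory, which should be small for 59,000 short strings.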