I just started learning Python. I have a sorted list of protein sequences (59,000 in total), and some of them overlap. Here is a toy list as an example:
This will get you where you want to be. It keeps only the sequences that are not a prefix of some other sequence:
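with open('toy.txt', 'r') as f:
    # strip the trailing newline from each line, otherwise the prefix test fails
    lines = [line.strip() for line in f if line.strip()]
data = set(lines)
# keep a sequence only if it is the only one in data that starts with it
print(sorted([i for i in data if len([j for j in data if j.startswith(i)]) == 1]))
#['ABCDEFGHIJKLMNO', 'CEST', 'DBTSFDEO', 'EAEUDNBNUW', 'EOEUDNBNUW', 'FGH']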
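I used a set in case the same sequence occurs more than once; iterating over the set rather than the original list also means each surviving sequence is printed only once.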
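The comprehension above compares every sequence with every other one, which is roughly n² startswith checks: for 59,000 sequences that is about 3.5 billion. Since your list is already sorted, a single linear pass should be enough. In sorted order, if a sequence is a prefix of any other sequence, it is also a prefix of the very next distinct sequence. The sketch below assumes the goal is the same as in the code above (drop any sequence that is a proper prefix of another); the function name `remove_prefixes` and the sample data are made up for illustration and are not your toy list.

```python
def remove_prefixes(sequences):
    """Return the sorted, de-duplicated sequences that are not a proper prefix of another one."""
    # Deduplicate, drop empty strings, and sort so that prefixes sit right before their extensions
    unique = sorted(set(s for s in sequences if s))
    kept = []
    for idx, current in enumerate(unique):
        nxt = unique[idx + 1] if idx + 1 < len(unique) else None
        # If `current` prefixes any later sequence, it prefixes the next one
        if nxt is None or not nxt.startswith(current):
            kept.append(current)
    return kept


# Hypothetical sample data, not the asker's toy list
sample = ["ABC", "ABCDEF", "CEST", "FG", "FGH", "ABC", "XYZ", "XY"]
print(remove_prefixes(sample))
# ['ABCDEF', 'CEST', 'FGH', 'XYZ']
```

To use it on your file, pass it the stripped lines, e.g. `remove_prefixes(line.strip() for line in f)`. Sorting costs O(n log n) and the scan is O(n), so this should stay fast at 59,000 sequences.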