Can Biopython perform Seq.find() accounting for ambiguity codes?

Submitted by 情到浓时终转凉″ on 2019-12-10 13:58:11

Question


I want to be able to search a Seq object for a subsequence Seq object, accounting for ambiguity codes. For example, the following should be true:

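from Bio.Seq import Seq
# Bio.Alphabet was removed in Biopython 1.78; on newer versions drop the alphabet arguments
from Bio.Alphabet.IUPAC import IUPACAmbiguousDNA

amb = IUPACAmbiguousDNA()
s1 = Seq("GGAAAAGG", amb)
s2 = Seq("ARAA", amb)     # R = A or G
print(s1.find(s2))        # ambiguity-aware search would give 2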

If ambiguity codes were taken into account, the answer should be

>>> 2

But the result I get is that no match is found:

>>> -1

Looking at the Biopython source code, it doesn't appear that ambiguity codes are taken into account: the subsequence is converted to a string using the private _get_seq_str_and_check_alphabet method, and then the built-in string method find() is used. If that is the case, the "R" ambiguity code is treated as a literal "R", not as A or G.

I could work out how to do this with a home-made method, but it seems like something Biopython should handle for its own Seq objects. Is there something I am missing here?
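A home-made method could be as small as the following brute-force sketch. It is not Biopython API: the IUPAC_DNA table and the ambiguous_find name are illustrative assumptions (Biopython ships a comparable mapping as ambiguous_dna_values in Bio.Data.IUPACData). Each letter is expanded to its set of bases, and, as a design choice, two letters count as compatible when their expansions overlap, so codes such as N in the searched sequence also match. Like str.find, it returns the first index or -1.

```python
# Hand-written IUPAC nucleotide table (illustrative; upper-case DNA only).
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def ambiguous_find(seq, subseq):
    """Return the lowest index where subseq matches seq, or -1 (like str.find).

    Letters are compatible when their IUPAC expansions share at least one base.
    str() is applied first, so Seq objects and plain strings both work.
    """
    seq, subseq = str(seq).upper(), str(subseq).upper()
    seq_sets = [set(IUPAC_DNA[c]) for c in seq]
    sub_sets = [set(IUPAC_DNA[c]) for c in subseq]
    # Slide the motif along the sequence and test every aligned position.
    for start in range(len(seq) - len(subseq) + 1):
        if all(seq_sets[start + i] & s for i, s in enumerate(sub_sets)):
            return start
    return -1


print("GGAAAAGG".find("ARAA"))             # -1: plain find compares "R" literally
print(ambiguous_find("GGAAAAGG", "ARAA"))  # 2
```

An unknown letter raises KeyError, which is arguably preferable to silently treating it as a literal.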

Is there a way to search for sub sequence membership accounting for ambiguity codes?


Answer 1:


From what I can tell from the documentation for Seq.find here:

http://biopython.org/DIST/docs/api/Bio.Seq.Seq-class.html#find

It appears that this method works similarly to str.find in that it looks for an exact match. So, while the DNA sequence can contain ambiguity codes, Seq.find() only returns a match when the subsequence matches letter for letter.

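To do what you want, the nt_search function in Bio.SeqUtils may work. It searches the forward strand for a DNA subsequence while honouring ambiguity codes (R = A or G, N = any base, and so on).

Reference: "Search for motifs with degenerate positions"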
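As I understand Biopython's source, nt_search turns the motif into a regular expression in which each ambiguity code becomes a character class (ARAA becomes A[AG]AA). It returns a list whose first item is that pattern, followed by the match positions. Because it works on plain strings, converting with str() first, as in nt_search(str(s1), str(s2)), is the safe choice. The stdlib-only sketch below reproduces the same idea without depending on Biopython. IUPAC_DNA, iupac_to_regex and find_all_ambiguous are hypothetical helpers, not Biopython API. Only the motif is expanded, so letters in the searched sequence are matched literally. A zero-width lookahead makes overlapping hits appear too.

```python
import re

# Hand-written IUPAC nucleotide table (illustrative; upper-case DNA only).
IUPAC_DNA = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_to_regex(subseq):
    """Translate an IUPAC DNA motif into a regex, e.g. "ARAA" -> "A[AG]AA"."""
    parts = []
    for code in str(subseq).upper():
        bases = IUPAC_DNA[code]
        # Single bases stay literal; ambiguity codes become character classes.
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_all_ambiguous(seq, subseq):
    """Return every start index (overlaps included) where subseq matches seq."""
    # Wrapping the pattern in a lookahead makes each match zero-width,
    # so finditer reports overlapping occurrences as well.
    pattern = re.compile("(?=" + iupac_to_regex(subseq) + ")")
    return [m.start() for m in pattern.finditer(str(seq).upper())]


print(iupac_to_regex("ARAA"))                  # A[AG]AA
print(find_all_ambiguous("GGAAAAGG", "ARAA"))  # [2]
print(find_all_ambiguous("AAAAA", "AR"))       # [0, 1, 2, 3]
```

If only the first hit is needed, taking the first element of the returned list (or -1 when it is empty) mimics the find() convention.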



Source: https://stackoverflow.com/questions/32192933/can-biopython-perform-seq-find-accounting-for-ambiguity-codes
