Question
I have a 15-mer nucleotide motif that uses degenerate nucleotide sequences. Example: ATNTTRTCNGGHGCN.
I would like to search a set of sequences for occurrences of this motif. However, my other sequences are exact sequences, i.e. they have no ambiguity.
I have tried using a for loop over the sequences to search for this, but I have not been able to do non-exact searches. The code I use is modeled after the code in the Biopython cookbook.
for pos, seq in m.instances.search(test_seq):
    print(pos, seq)
I would like to search for all possible exact instances of the non-exact 15-mer. Is there a function available, or would I have to resort to defining my own function for that? (I'm okay doing the latter, just wanted to triple-check with the world that I'm not duplicating someone else's efforts before I go ahead. I have already browsed through what I thought were the relevant parts of the docs.)
Answer 1:
Use Biopython's nt_search. It looks for a subsequence in a DNA sequence, expanding ambiguity codes to the possible nucleotides in that position. Example:
>>> from Bio import SeqUtils
>>> pat = "ATNTTRTCNGGHGCN"
>>> SeqUtils.nt_search("CCCCCCCATCTTGTCAGGCGCTCCCCCC", pat)
['AT[GATC]TT[AG]TC[GATC]GG[ACT]GC[GATC]', 7]
It returns a list whose first item is the search pattern, expanded into a regular expression, followed by the 0-based start positions of the matches.
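If you want to see what that expansion amounts to, or need to scan a whole set of sequences at once, here is a minimal standard-library sketch of the same idea. It is not Biopython's implementation: the names `IUPAC`, `motif_to_regex` and `find_motif` are made up for illustration, and the input is assumed to be a plain dict of name to sequence string.

```python
import re

# IUPAC ambiguity codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def motif_to_regex(motif):
    """Expand a degenerate motif into a regular-expression string (hypothetical helper)."""
    parts = []
    for code in motif.upper():
        bases = IUPAC[code]
        # Unambiguous bases stay literal; ambiguous ones become a character class.
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)

def find_motif(motif, sequences):
    """Return {name: [0-based start positions]} for every sequence with a hit."""
    # Wrapping the pattern in a lookahead reports overlapping matches too.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    hits = {}
    for name, seq in sequences.items():
        positions = [m.start() for m in pattern.finditer(seq.upper())]
        if positions:
            hits[name] = positions
    return hits

sequences = {
    "seq1": "CCCCCCCATCTTGTCAGGCGCTCCCCCC",
    "seq2": "GGGGGGGGGGGGGGGGGGGG",
}
print(find_motif("ATNTTRTCNGGHGCN", sequences))  # {'seq1': [7]}
```

This only searches the given strand; to cover the reverse strand as well you would also search with the reverse complement of the motif.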
Source: https://stackoverflow.com/questions/18522093/search-for-motifs-with-degenerate-positions