Nucleotides separator in the pairwise sequence alignment bio python

Submitted by 情到浓时终转凉″ on 2019-12-12 03:24:16

Question


I have RNA sequences that contain various modified nucleotides and residues, for example N79, 8XU, SDG, and I.

I want to align them pairwise using Biopython's pairwise2.align.localms. Is it possible to pass the input as a list rather than a string, so that these modified bases are accounted for accurately?

What is the correct technique?


Answer 1:


Biopython's pairwise2 module works on strings of letters, and those letters can be any characters. For example:

>>> from Bio import pairwise2
>>> from Bio.pairwise2 import format_alignment
>>> for a in pairwise2.align.localms("ACCGTN97CT", "ACCG8DXCT", 2, -1, -.5, -.1):
...     print(format_alignment(*a))
... 
ACCG--TN97CT
||||||||||||
ACCG8DX---CT
  Score=9.7

ACCGTN97--CT
||||||||||||
ACCG---8DXCT
  Score=9.7

The four numeric arguments are the match score (2), the mismatch score (-1), the gap-open penalty (-.5) and the gap-extension penalty (-.1); you can set them according to your needs. However, this approach treats each letter as a separate element.

It was not clear from your question whether your example N79 is one modified nucleotide or three. If you want to treat N79 as a single base, that does seem to be possible. I don't think it was intentional (so I wouldn't want to depend on this behaviour), but I could trick pairwise2 into working on lists of strings:

>>> for a in pairwise2.align.localms(["A", "C", "C", "G", "T", "N97", "C", "T"], ["A", "C", "C", "G", "8DX", "C", "T"], 2, -1, -.5, -.1, gap_char=["-"]):
...     print(format_alignment(*a))
...
['A', 'C', 'C', 'G', 'T', 'N97', 'C', 'T']
||||||||
['A', 'C', 'C', 'G', '8DX', '-', 'C', 'T']
  Score=10.5

['A', 'C', 'C', 'G', 'T', 'N97', 'C', 'T']
||||||||
['A', 'C', 'C', 'G', '-', '8DX', 'C', 'T']
  Score=10.5

Notice that the default format_alignment function does not display this very well: it prints the raw lists, so the match line no longer lines up with the elements.
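One way around the display problem is to format the aligned lists yourself. The following is only a minimal sketch, not part of Biopython: format_token_alignment is a hypothetical helper, and it assumes you pass it the two aligned lists (the first two elements of each alignment returned above) with "-" as the gap token. It pads every token to the width of the longest one so that the columns line up, and marks only identical tokens with "|".

```python
def format_token_alignment(seq_a, seq_b, score=None):
    """Render two aligned token lists in fixed-width columns.

    Hypothetical helper (not a Biopython function). Both inputs must
    already be aligned, i.e. have the same length, with gaps included.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # Column width is the length of the longest token in either row.
    width = max((len(tok) for tok in list(seq_a) + list(seq_b)), default=1)
    # "|" for identical tokens, blank for a mismatch or a gap.
    marks = ["|" if a == b else " " for a, b in zip(seq_a, seq_b)]
    lines = [
        " ".join(tok.ljust(width) for tok in row).rstrip()
        for row in (seq_a, marks, seq_b)
    ]
    if score is not None:
        lines.append(f"  Score={score}")
    return "\n".join(lines)


# Aligned lists copied from the first alignment shown above.
aligned_a = ["A", "C", "C", "G", "T", "N97", "C", "T"]
aligned_b = ["A", "C", "C", "G", "8DX", "-", "C", "T"]
print(format_token_alignment(aligned_a, aligned_b, score=10.5))
```

Inside the loop above you would call it as format_token_alignment(a[0], a[1], a[2]) instead of format_alignment(*a).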




Answer 2:


Sorry for adding another answer, but my reputation is not high enough to add comments.

To elaborate on peterjc's answer: accepting lists as input is the intended behaviour of pairwise2 (and now I understand what it may be good for).

And you are right, it also depends on the gap_char argument: since you are supplying the sequences as lists, the gap character must also be given as a list (["-"]).



Source: https://stackoverflow.com/questions/36142371/nucleotides-separator-in-the-pairwise-sequence-alignment-bio-python
