Question
I'm just starting out with Biopython and I have a question about parsing results. I followed a tutorial to get started, and here is the code I used:
from Bio.Blast import NCBIXML
for record in NCBIXML.parse(open("/Users/jcastrof/blast/pruebarpsb.xml")):
    if record.alignments:
        print "Query: %s..." % record.query[:60]
        for align in record.alignments:
            for hsp in align.hsps:
                print " %s HSP,e=%f, from position %i to %i" \
                    % (align.hit_id, hsp.expect, hsp.query_start, hsp.query_end)
Part of the result obtained is:
gnl|CDD|225858 HSP,e=0.000000, from position 32 to 1118
gnl|CDD|225858 HSP,e=0.000000, from position 1775 to 2671
gnl|CDD|214836 HSP,e=0.000000, from position 37 to 458
gnl|CDD|214836 HSP,e=0.000000, from position 1775 to 2192
gnl|CDD|214838 HSP,e=0.000000, from position 567 to 850
What I want to do is sort that result by the position of the hit (Hsp_hit-from), like this:
gnl|CDD|225858 HSP,e=0.000000, from position 32 to 1118
gnl|CDD|214836 HSP,e=0.000000, from position 37 to 458
gnl|CDD|214838 HSP,e=0.000000, from position 567 to 850
gnl|CDD|225858 HSP,e=0.000000, from position 1775 to 2671
gnl|CDD|214836 HSP,e=0.000000, from position 1775 to 2192
My input file for rps-blast is a *.xml file. Any suggestions on how to proceed?
Thanks!
Answer 1:
The HSPs list is just a Python list, and can be sorted as usual. Try:
align.hsps.sort(key=lambda hsp: hsp.query_start)
However, you are dealing with a nested list (each match has a list of HSPs), and you want to sort over all of them. Here making your own list might be best - something like this:
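for record in ...:
    print "Query: %s..." % record.query[:60]
    # Flatten every HSP of every hit into one list of tuples. Tuples compare
    # element by element, so the list is ordered by query_start first.
    # The outer loop (align) must come before the inner loop (hsp).
    hits = sorted((hsp.query_start, hsp.query_end, hsp.expect, align.hit_id)
                  for align in record.alignments for hsp in align.hsps)
    for q_start, q_end, expect, hit_id in hits:
        print " %s HSP,e=%f, from position %i to %i" \
            % (hit_id, expect, q_start, q_end)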
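To try the flatten-and-sort idea without a BLAST XML file at hand, here is a minimal sketch written for Python 3 (where print is a function). The Hsp and Alignment namedtuples are hypothetical stand-ins for Biopython's objects and carry only the attributes used in this thread; sorted_hits is an illustrative helper name, not part of Biopython; the sample values mirror the output shown in the question.

```python
from collections import namedtuple

# Hypothetical stand-ins for Biopython's HSP and alignment objects,
# holding only the attributes used in this thread.
Hsp = namedtuple("Hsp", "query_start query_end expect")
Alignment = namedtuple("Alignment", "hit_id hsps")


def sorted_hits(alignments):
    """Return (query_start, query_end, expect, hit_id) tuples ordered by query_start."""
    rows = [
        (hsp.query_start, hsp.query_end, hsp.expect, align.hit_id)
        for align in alignments   # outer loop: each hit
        for hsp in align.hsps     # inner loop: each HSP of that hit
    ]
    # Sort on query_start only. Python's sort is stable, so HSPs with the
    # same start position keep the order in which they were read.
    return sorted(rows, key=lambda row: row[0])


# Sample data copied from the output in the question (e-values printed as 0).
alignments = [
    Alignment("gnl|CDD|225858", [Hsp(32, 1118, 0.0), Hsp(1775, 2671, 0.0)]),
    Alignment("gnl|CDD|214836", [Hsp(37, 458, 0.0), Hsp(1775, 2192, 0.0)]),
    Alignment("gnl|CDD|214838", [Hsp(567, 850, 0.0)]),
]

for q_start, q_end, expect, hit_id in sorted_hits(alignments):
    print(" %s HSP,e=%f, from position %i to %i" % (hit_id, expect, q_start, q_end))
```

Sorting with a key on query_start alone, rather than comparing whole tuples, is a design choice: it leaves the two HSPs that start at 1775 in their original order, which matches the listing requested in the question. With real data, record.alignments from NCBIXML.parse could be passed in place of the sample list.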
Peter
Source: https://stackoverflow.com/questions/16070195/sort-rps-blast-results-by-position-of-the-hit