Question
I need to calculate the entropy of a DNA sequence in a FASTA file, but only for the region from base 10,000 to base 11,000. Here is what I have so far; it computes the entropy of the whole sequence:
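from math import log
from Bio import SeqIO

def logent(x):
    # Contribution of one frequency to the entropy; 0 by convention when x <= 0
    if x <= 0:
        return 0
    else:
        return -x * log(x)

def entropy(lis):
    return sum([logent(elem) for elem in lis])

for i in SeqIO.parse("hsvs.fasta", "fasta"):
    # Frequencies of A, C, G, T over the whole sequence
    lisfreq1 = [i.seq.count(base) * 1.0 / len(i.seq) for base in ["A", "C", "G", "T"]]
    print(entropy(lisfreq1))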
Answer 1:
Your sequence is essentially a string, so you can simply slice it, e.g.
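# Python slices are 0-based and end-exclusive: this selects indices 10000 to 11000.
# For 1-based positions 10,000 to 11,000, use seq_start = 9999 and seq_end = 11000.
seq_start = 10000
seq_end = 11000 + 1
for i in SeqIO.parse("hsvs.fasta", "fasta"):
    sub_seq = i.seq[seq_start:seq_end]
    lisfreq1 = [sub_seq.count(base) * 1.0 / len(sub_seq) for base in ["A", "C", "G", "T"]]
    print(entropy(lisfreq1))  # entropy() as defined in the question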
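As a self-contained sketch of the same slicing idea that runs without the FASTA file: the helper below is hypothetical (window_entropy is not from the original post and is not part of Biopython). It takes 1-based, inclusive positions, converts them to a Python slice, and returns the Shannon entropy in nats. It assumes frequencies are normalized over the A/C/G/T counts only, so ambiguous bases such as N are ignored; that differs slightly from dividing by the window length.

```python
from collections import Counter
from math import log


def window_entropy(seq, first, last, alphabet="ACGT"):
    """Shannon entropy (natural log) of bases first..last, 1-based and inclusive.

    Hypothetical helper for illustration; only bases in `alphabet` are counted.
    """
    if first < 1 or last < first:
        raise ValueError("expected 1 <= first <= last")
    # Convert 1-based inclusive positions to a 0-based, end-exclusive slice.
    window = str(seq)[first - 1:last].upper()
    counts = Counter(base for base in window if base in alphabet)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    probs = [n / total for n in counts.values()]
    return sum(-p * log(p) for p in probs)


# Toy sequence: 20 bases with equal A/C/G/T counts, then four A's.
example = "ACGT" * 5 + "AAAA"
print(window_entropy(example, 1, 20))   # window with a uniform base composition
print(window_entropy(example, 21, 24))  # window made of a single repeated base
```

With Biopython, the equivalent call would presumably be window_entropy(record.seq, 10000, 11000) for each record returned by SeqIO.parse, since str() on a Seq object gives the plain sequence string.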
Source: https://stackoverflow.com/questions/37909873/how-to-calculate-the-entropy-of-a-dna-sequence-in-a-fasta-file