How to calculate the entropy of a DNA sequence in a FASTA file

Submitted by 孤人 on 2019-12-25 07:15:22

Question


I need to calculate the entropy of a DNA sequence in a FASTA file, but only for the region from the 10,000th to the 11,000th base. Here is what I have so far, which computes the entropy over the whole sequence:

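from math import log

from Bio import SeqIO  # Biopython


def logent(x):
    # Contribution of one frequency to the entropy; 0*log(0) is taken as 0
    if x <= 0:
        return 0
    else:
        return -x * log(x)


def entropy(lis):
    # Shannon entropy (natural log, so in nats) of a list of frequencies
    return sum([logent(elem) for elem in lis])


for i in SeqIO.parse("hsvs.fasta", "fasta"):
    # Frequency of each base over the whole sequence
    lisfreq1 = [i.seq.count(base) * 1.0 / len(i.seq) for base in ["A", "C", "G", "T"]]

entropy(lisfreq1)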

Answer 1:


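Your sequence behaves like a string, so you can simply slice it. Python slices are 0-based and exclude the end index, so the 10,000th through 11,000th bases (counting from 1) correspond to [9999:11000], e.g.

from Bio import SeqIO

seq_start = 10000 - 1  # 0-based index of the 10,000th base
seq_end = 11000        # slice end is exclusive, so the 11,000th base is included
for i in SeqIO.parse("hsvs.fasta", "fasta"):
    sub_seq = i.seq[seq_start:seq_end]
    lisfreq1 = [sub_seq.count(base) * 1.0 / len(sub_seq) for base in ["A", "C", "G", "T"]]
    print(entropy(lisfreq1))  # entropy() as defined in the question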
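
To see the slicing and the entropy calculation working together, here is a minimal self-contained sketch that needs only the standard library. The helper names `subsequence` and `shannon_entropy` are hypothetical and not part of the original post or of Biopython. It assumes positions are counted from 1 and are inclusive at both ends, and that only A, C, G and T are counted, with frequencies normalised over those four bases (so ambiguous symbols such as N are ignored, unlike in the question's code, which divides by the full length).

```python
from collections import Counter
from math import log


def subsequence(seq, first, last):
    """Return bases first..last, counted from 1 and inclusive at both ends."""
    # Convert the 1-based start to a 0-based index; the slice end is exclusive,
    # so `last` itself is the correct end index.
    return seq[first - 1:last]


def shannon_entropy(seq, bases="ACGT"):
    """Shannon entropy (natural log, in nats) of the base frequencies in seq.

    Assumption: only the symbols in `bases` are counted, and frequencies
    are normalised over those symbols.
    """
    counts = Counter(str(seq).upper())
    total = sum(counts[b] for b in bases)
    if total == 0:
        return 0.0  # empty region or no recognised bases
    entropy = 0.0
    for b in bases:
        p = counts[b] / total
        if p > 0:  # skip absent bases, treating 0*log(0) as 0
            entropy -= p * log(p)
    return entropy
```

With Biopython, `str(record.seq)` for each `record` yielded by `SeqIO.parse("hsvs.fasta", "fasta")` could be passed through `subsequence(..., 10000, 11000)` and then to `shannon_entropy`. Dividing the result by `log(2)` gives the entropy in bits rather than nats.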


Source: https://stackoverflow.com/questions/37909873/how-to-calculate-the-entropy-of-a-dna-sequence-in-a-fasta-file
