Search for string allowing for one mismatch in any location of the string

闹比i 2020-11-30 02:45

I am working with DNA sequences of length 25 (see examples below). I have a list of 230,000 of them and need to look for each sequence in the entire genome (Toxoplasma gondii parasi

13 Answers
  • 2020-11-30 03:23

    I think the code below is simple and convenient.

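    in_pattern = ""   # sequence to search for
    in_genome = ""    # genome to search in
    in_mistake = 1    # maximum number of mismatches allowed
    out_result = ""


    kmer = len(in_pattern)

    def FindMistake(v):
        # Return True if v differs from in_pattern in at most in_mistake positions.
        mistake = 0
        for i in range(kmer):
            if v[i] != in_pattern[i]:
                mistake += 1
                if mistake > in_mistake:
                    return False
        return True


    # Slide a window of length kmer over the genome and record each matching start index.
    for i in range(len(in_genome) - kmer + 1):
        v = in_genome[i:i+kmer]
        if FindMistake(v):
            out_result += str(i) + " "

    print(out_result)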
    

    Set in_genome and in_pattern to the genome and the sequence you want to search for, and set in_mistake to the number of mismatches you want to allow.
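
    As a rough illustration of the same sliding-window idea, here is a minimal sketch that wraps the check in a function and returns the matching start positions as a list. The function name find_with_mismatches, its parameters, and the short test strings are hypothetical and chosen only for this example; they are not from the original answer. It counts substitutions only (no insertions or deletions).

```python
def find_with_mismatches(pattern, genome, max_mismatches=1):
    """Return the start indices where pattern occurs in genome
    with at most max_mismatches substituted characters."""
    k = len(pattern)
    hits = []
    for start in range(len(genome) - k + 1):
        mismatches = 0
        # Compare the pattern against the current window, character by character.
        for a, b in zip(pattern, genome[start:start + k]):
            if a != b:
                mismatches += 1
                if mismatches > max_mismatches:
                    break  # too many differences, give up on this window
        else:
            # The loop finished without break, so the window is close enough.
            hits.append(start)
    return hits


# Hypothetical toy data: a 4-base pattern and a 12-base "genome".
positions = find_with_mismatches("ACGT", "ACGTTCGTAAGT", max_mismatches=1)
print(positions)  # [0, 4, 8]
```

    This compares every window against the pattern, so the cost grows with genome length times pattern length for each sequence. For 230,000 sequences against a full genome that is likely to be slow, and an indexed approach would probably be needed in practice.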
