Question
First half of my question: When I run my program, it loads forever and never shows any results. Could someone check my code and spot the error? The program is meant to find a start codon (ATG), keep looking until it finds a stop codon (TAA, TAG, or TGA), and then print the gene from start to stop. I'm using BlueJ.
Second half of my question: I'm supposed to write a program that follows these steps:
To find the first gene, find the start codon ATG.
Next look immediately past ATG for the first occurrence of each of the three stop codons TAG, TGA, and TAA.
If the length of the substring between ATG and any of these three stop codons is a multiple of three, then a candidate for a gene is the start codon through the end of the stop codon.
If there is more than one valid candidate, the smallest such string is the gene. The gene includes the start and stop codon.
If no start codon was found, then you are done.
If a start codon was found, but no gene was found, then start searching for another gene via the next occurrence of a start codon starting immediately after the start codon that didn't yield a gene.
If a gene was found, then start searching for the next gene immediately after this found gene.
Note that according to this algorithm, for the string "ATGCTGACCTGATAG", ATGCTGACCTGATAG could be a gene, but ATGCTGACCTGA would not be, even though it is shorter, because another instance of 'TGA' is found first that is not a multiple of three away from the start codon.
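To make the note above concrete, here is a small sketch that checks the first occurrence of each stop codon in that example string. The class name FrameCheckExample and the helper inFrame are made up for illustration; they are not part of the assignment.

```java
class FrameCheckExample {
    // True when a stop codon at stopIndex is a whole number of codons
    // (a multiple of three characters) away from the start codon.
    static boolean inFrame(int startIndex, int stopIndex) {
        return stopIndex != -1 && (stopIndex - startIndex) % 3 == 0;
    }

    public static void main(String[] args) {
        String dna = "ATGCTGACCTGATAG";
        int start = dna.indexOf("ATG");               // 0

        // Only the FIRST occurrence of each stop codon past ATG is examined.
        int firstTga = dna.indexOf("TGA", start + 3); // 4
        int firstTag = dna.indexOf("TAG", start + 3); // 12
        int firstTaa = dna.indexOf("TAA", start + 3); // -1 (not present)

        System.out.println("TGA in frame: " + inFrame(start, firstTga)); // false
        System.out.println("TAG in frame: " + inFrame(start, firstTag)); // true
        System.out.println("TAA in frame: " + inFrame(start, firstTaa)); // false
    }
}
```

The second TGA (at index 9) would be in frame, but it is never looked at because the first TGA (at index 4) is the one that counts. That leaves TAG at index 12 as the only candidate, so the gene is the whole string.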
In my assignment I'm asked to produce these methods as well:
Specifically, to implement the algorithm, you should do the following.
Write the method findStopIndex that has two parameters dna and index, where dna is a String of DNA and index is a position in the string. This method finds the first occurrence of each stop codon to the right of index. From those stop codons that are a multiple of three from index, it returns the smallest index position. It should return -1 if no stop codon was found and there is no such position. This method was discussed in one of the videos.
Write the void method printAll that has one parameter dna, a String of DNA. This method should print all the genes it finds in DNA. This method should repeatedly look for a gene, and if it finds one, print it and then look for another gene. This method should call findStopIndex. This method was also discussed in one of the videos.
Write the void method testFinder that will use the two small DNA example strings shown below. For each string, it should print the string, and then print the genes found in the string. Here is sample output that includes the two DNA strings:
Sample output is:
ATGAAATGAAAA
Gene found is:
ATGAAATGA
DNA string is:
ccatgccctaataaatgtctgtaatgtaga
Genes found are:
atgccctaa
atgtctgtaatgtag
DNA string is:
CATGTAATAGATGAATGACTGATAGATATGCTTGTATGCTATGAAAATGTGAAATGACCCA
Genes found are:
ATGTAA
ATGAATGACTGATAG
ATGCTATGA
ATGTGA
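For reference, the three methods described above could be sketched roughly as follows. This is only one reading of the assignment, not the course's official solution. It assumes that index is the position of the start codon, that the search for stop codons begins at index + 3, and that matching should ignore case (the second sample string is lower case). The class name GeneFinderSketch and the helper collectGenes are hypothetical, and the methods are static only to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

class GeneFinderSketch {
    static final String[] STOP_CODONS = {"TAG", "TGA", "TAA"};

    // Assumes dna is upper case and index is the position of a start codon.
    // Looks at the first occurrence of each stop codon past the start codon
    // and returns the smallest in-frame position, or -1 if none qualifies.
    static int findStopIndex(String dna, int index) {
        int best = -1;
        for (String codon : STOP_CODONS) {
            int stop = dna.indexOf(codon, index + 3);
            boolean inFrame = stop != -1 && (stop - index) % 3 == 0;
            if (inFrame && (best == -1 || stop < best)) {
                best = stop;
            }
        }
        return best;
    }

    // Hypothetical helper (not named in the assignment): returns the genes
    // as a list so the search loop can be checked without reading stdout.
    static List<String> collectGenes(String dna) {
        String upper = dna.toUpperCase();      // search ignoring case
        List<String> genes = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = upper.indexOf("ATG", from);
            if (start == -1) {
                break;                         // no start codon left: done
            }
            int stop = findStopIndex(upper, start);
            if (stop == -1) {
                from = start + 3;              // no gene: resume after this ATG
            } else {
                genes.add(dna.substring(start, stop + 3)); // keep original case
                from = stop + 3;               // gene found: resume after it
            }
        }
        return genes;
    }

    static void printAll(String dna) {
        for (String gene : collectGenes(dna)) {
            System.out.println(gene);
        }
    }

    static void testFinder() {
        String[] samples = {
            "ATGAAATGAAAA",
            "ccatgccctaataaatgtctgtaatgtaga",
            "CATGTAATAGATGAATGACTGATAGATATGCTTGTATGCTATGAAAATGTGAAATGACCCA"
        };
        for (String dna : samples) {
            System.out.println("DNA string is:");
            System.out.println(dna);
            System.out.println("Genes found are:");
            printAll(dna);
        }
    }

    public static void main(String[] args) {
        testFinder();
    }
}
```

Traced by hand on the three strings from the sample output, this yields the same genes as listed there. The label lines it prints are simplified ("Genes found are:" in every case).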
I've thought it through and got this code close to working order. I just need the output to match what the instructions ask for. Hopefully this isn't too messy. I'm at a loss as to how to look for a stop codon after the start codon and then how to extract the gene sequence. I'd also like to understand how to get the shortest gene by finding which of the three stop codons (tag, tga, taa) is closest to atg. I know this is a lot, but hopefully it all makes sense.
import edu.duke.*;
import java.io.*;
public class FindMultiGenes {
    public String findGenes(String dnaOri) {
        String gene = new String();
        String dna = dnaOri.toLowerCase();
        int start = -1;
        while (true) {
            start = dna.indexOf("atg", start);
            if (start == -1) {
                break;
            }
            int stop = findStopCodon(dna, start);
            if (stop > start) {
                String currGene = dnaOri.substring(start, stop + 3);
                System.out.println("From: " + start + " to " + stop + "Gene: "
                        + currGene);
            }
        }
        return gene;
    }

    private int findStopCodon(String dna, int start) {
        for (int i = start + 3; i < dna.length() - 3; i += 3) {
            String currFrameString = dna.substring(i, i + 3);
            if (currFrameString.equals("TAG")) {
                return i;
            } else if (currFrameString.equals("TGA")) {
                return i;
            } else if (currFrameString.equals("TAA")) {
                return i;
            }
        }
        return -1;
    }

    public void testing() {
        FindMultiGenes FMG = new FindMultiGenes();
        String dna =
                "CATGTAATAGATGAATGACTGATAGATATGCTTGTATGCTATGAAAATGTGAAATGACCCA";
        FMG.findGenes(dna);
        System.out.println("DNA string is: " + dna);
    }
}
Answer 1:
Change the line
start = dna.indexOf("atg", start);
to
start = dna.indexOf("atg", start + 1);
What currently happens is that you find "atg" at index k and, on the next iteration, search for the next "atg" from k onwards. Because the start position passed to indexOf is inclusive, that finds the match at the exact same location. You therefore find the same index k over and over again, and the loop never terminates.
By increasing the index by 1 you skip over the currently found index k and start searching for the next match from k+1 onwards.
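A small sketch of that loop in isolation may help show the effect of the fix. The class name IndexOfDemo and the method startCodonPositions are invented for this example. It relies on the documented behavior that String.indexOf(str, fromIndex) includes fromIndex itself in the search.

```java
import java.util.ArrayList;
import java.util.List;

class IndexOfDemo {
    // Collects every position of "atg", moving past each match by
    // searching from start + 1 as suggested above.
    static List<Integer> startCodonPositions(String dna) {
        List<Integer> positions = new ArrayList<>();
        int start = -1;                 // start + 1 == 0 on the first pass
        while (true) {
            start = dna.indexOf("atg", start + 1);
            if (start == -1) {
                break;                  // no further match: the loop ends
            }
            positions.add(start);
        }
        return positions;
    }

    public static void main(String[] args) {
        String dna = "catgtaatagatg";
        // Searching from the match itself returns the same index again.
        System.out.println(dna.indexOf("atg", 1));    // 1
        // Searching from one past the match moves on to the next one.
        System.out.println(dna.indexOf("atg", 2));    // 10
        System.out.println(startCodonPositions(dna)); // [1, 10]
    }
}
```

Separately, the question's findStopCodon compares substrings of the lowercased dna against uppercase literals such as "TAG", so those comparisons can never succeed. Lowercase literals would be needed there before any gene is printed.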
Answer 2:
This program is meant to find a start DNA codon ATG and keep looking until finding a stop codon TAA or TAG or TGA, and then print out the gene from start to stop.
Since the first search always starts from 0, you can initialize the start index to 0 and then search for the stop codon from the position just past the start codon you found. Here I do it with only one of the stop codons (tag):
public static void main(String[] args) {
    String dna = "CATGTAATAGATGAATGACTGATAGATATGCTTGTATGCTATGAAAATGTGAAATGACCCA";
    String sequence = dna.toLowerCase();
    int index = 0;
    int newIndex = 0;
    while (true) {
        index = sequence.indexOf("atg", index);
        if (index == -1)
            return;
        newIndex = sequence.indexOf("tag", index + 3);
        if (newIndex == -1) // Check needed only if a stop codon is not guaranteed for each start codon.
            return;
        System.out.println("From " + (index + 3) + " to " + newIndex + " Gene: " + sequence.substring(index + 3, newIndex));
        index = newIndex + 3;
    }
}
Output:
From 4 to 7 Gene: taa
From 13 to 22 Gene: aatgactga
Also, you can use a regex to do a lot of the work for you:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) {
    String dna = "CATGTAATAGATGAATGACTGATAGATATGCTTGTATGCTATGAAAATGTGAAATGACCCA";
    // Lazily capture everything between ATG and the nearest following TAG.
    Pattern p = Pattern.compile("ATG([ATGC]+?)TAG");
    Matcher m = p.matcher(dna);
    while (m.find())
        System.out.println("From " + m.start(1) + " to " + m.end(1) + " Gene: " + m.group(1));
}
Output:
From 4 to 7 Gene: TAA
From 13 to 22 Gene: AATGACTGA
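As a possible variation on the regex above (not part of the original answer), the pattern could consume whole codons and accept any of the three stop codons. Note that this picks the first in-frame stop codon, which is a different rule from the assignment's "first occurrence of each stop codon", so the results can differ. The class name InFrameRegexSketch and the method findGenes are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InFrameRegexSketch {
    // ATG, then as few whole codons as possible, then any stop codon.
    static final Pattern GENE = Pattern.compile(
            "ATG(?:[ACGT]{3})*?(?:TAA|TAG|TGA)", Pattern.CASE_INSENSITIVE);

    // Returns each match including its start and stop codon.
    static List<String> findGenes(String dna) {
        List<String> genes = new ArrayList<>();
        Matcher m = GENE.matcher(dna);
        while (m.find()) {
            genes.add(m.group());
        }
        return genes;
    }

    public static void main(String[] args) {
        System.out.println(findGenes("ATGAAATGAAAA"));    // [ATGAAATGA]
        // The assignment's rule would give the whole string here instead.
        System.out.println(findGenes("ATGCTGACCTGATAG")); // [ATGCTGACCTGA]
    }
}
```

The second call shows the difference: the regex stops at the in-frame TGA at index 9, whereas the assignment's algorithm rejects TGA altogether because its first occurrence (index 4) is out of frame.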
Source: https://stackoverflow.com/questions/34459060/java-program-malfunction