问题
I am having problems with CS50 pset6 DNA. It is getting all the right values and gives correct answers when I use the small.csv
file but not when I use the large one. I have been going through it with debug50 for over a week and can't figure out the problem. I assume the problem is somewhere in the loop through the samples to find the STRS but I just don't see what it is doing wrong when walking through it.
If you are unfamiliar with CS50 DNA problemset, the code is supposed to look through a dna sequence (argv[1]
) and compare it with a CSV file containing people DNA STRs to figure out which person (if any) it belongs to.
Note; My code fails within the case; (Python dna.py databases/large.csv sequences/5.txt) if this helps.
from sys import argv
from csv import reader
#ensures correct number of arguments
if (len(argv) != 3):
print("usage: python dna.py data sample")
#dict for storage
peps = {}
#storage for strands we look for.
types = []
#opens csv table
with open(argv[1],'r') as file:
data = reader(file)
line = 0
number = 0
for l in data:
if line == 0:
for col in l:
if col[2].islower() and col != 'name':
break
if col == 'name':
continue
else:
types.append(col)
line += 1
else:
row_mark = 0
for col in l:
if row_mark == 0:
peps[col] = []
row_mark += 1
else:
peps[l[0]].append(col)
#convert sample to string
samples = ""
with open(argv[2], 'r') as sample:
for c in sample:
samples = samples + c
#DNA STR GROUPS
dna = { "AGATC" : 0,
"AATG" : 0,
"TATC" : 0,
"TTTTTTCT" : 0,
"TCTAG" : 0,
"GATA" : 0,
"GAAA" : 0,
"TCTG" : 0 }
#go through all the strs in dna
for keys in dna:
#the longest run of sequnace
longest = 0
#the current run of sequances
run = 0
size = len(keys)
#look through sample for longest
i = 0
while i < len(samples):
hold = samples[i:(i + size)]
if hold == keys:
run += 1
#ensure the code does not go outside len of samples
if ((i + size) < len(samples)):
i = i + size
continue
if run > longest:
longest = run
run = 0
i += 1
dna[keys] = longest
#see who it is
positive = True
person = ''
for key in peps:
positive = True
for entry in types:
x = types.index(entry)
test = dna.get(entry)
can = int(peps.get(key)[x])
if (test != can):
positive = False
if positive == True:
person = key
break
if person != '':
print(person)
else:
print("No match")
回答1:
Problem is in this while loop. Look at this code carefully.
while i < len(samples):
hold = samples[i:(i + size)]
if hold == keys:
run += 1
#ensure the code does not go outside len of samples
if ((i + size) < len(samples)):
i = i + size
continue
if run > longest:
longest = run
run = 0
i += 1
You have a missing logic here. You are supposed to check the longest consecutive DNA sequence. So when you have a repetition of dna sequence back to back, you need to find how many times it is repeated. When it is no longer repeated, only then, you need to check if this is the longest sequence.
Solution
You need to add else
statement after if hold==keys:
statement. This would be the right fix;
while i < len(samples):
hold = samples[i:(i + size)]
if hold == keys:
run += 1
#ensure the code does not go outside len of samples
if ((i + size) < len(samples)):
i = i + size
continue
else: #only if there is no longer sequence match, check this.
if run > longest:
longest = run
run = 0
else: #if the number of sequence match is still smaller then longest, then make run zero.
run = 0
i += 1
来源:https://stackoverflow.com/questions/61444287/cs50-dna-works-for-small-csv-but-not-for-large