Question
I would like the program to take the sequences associated with a given barcode and compute the average length and standard deviation of those sequences, after removing the barcode and non-relevant text. I based my code on a similar program, but I keep getting an IndexError. The idea is that all sequences with the first barcode are processed under barcodeCounter = 0, those with the second under barcodeCounter = 1, and so on. Hopefully that is enough info; sorry if it is messy.
Input:
import sys
import math
def avsterr(x):
    ave = sum(x)/len(x)
    ssq = 0.0
    for y in x:
        ssq += (y-ave)*(y-ave)
    var = ssq / (len(x)-1)
    sdev = math.sqrt(var)
    stderr = sdev / math.sqrt(len(x))
    return (ave,stderr)
barcode = sys.argv[1]
sequence = sys.argv[2]
lengths = []
toprocess = []
b = open(barcode,"r")
barcodeCounter = 0
for barcode in b:
    barcodeCounter = barcodeCounter + 1
    barcode = barcode.strip()
    print "barcode: %s" % barcode
    handle = open(sequence, "r")
    for line in handle:
        print line
        seq = line.split(' ',1)[-1].strip()
        print "seq: %s" % seq
        potential_barcode = seq[0:len(barcode)]
        print "something"
        if potential_barcode == barcode:
            print "Checking sequences"
            outseq = seq.replace(potential_barcode, "", 1)
            outseq_length = [len(outseq)]
            # toprocess.append("")
            # toprocess[barcodeCounter] += outseq.strip
            toprocess[barcodeCounter].extend(outseq.strip) #IndexError/line40
            # toprocess[barcodeCounter] = toprocess[barcodeCounter] + outseq.strip
            print "outseq: %s" % outseq
            print "Barcodes to be processed: %s" % toprocess[barcodeCounter]
            print "BC: %i" % barcodeCounter
    handle.close()
b.close()
one = len(toprocess[0])
#two = lengths[2]
#three = lengths[3]
print one
#(av,st) = avsterr(lengths)
#print "%f +/- %f" % (av,st)
Output:
barcode: ATTAG
S01 ATTAGAAAAAAA
seq: ATTAGAAAAAAA
something
Checking sequences
Traceback (most recent call last):
  File "./FinalProject.py", line 40, in <module>
    toprocess[barcodeCounter].extend(outseq.strip)
IndexError: list index out of range
This is the code I'm basing it on.
sequenceCounter = -1
for line in handle:
    if line[0] == ">":
        sequenceCounter = sequenceCounter + 1
        # print "seqid %s\n" % line
        seqidList.append(line)
        seqList.append("")
    if line[0] != ">":
        seqList[sequenceCounter] = seqList[sequenceCounter] + line.strip()
EDIT: Added enumerate and commented out the barcodeCounter lines.
barcode = sys.argv[1]
sequence = sys.argv[2]
lengths = []
toprocess = []
b = open(barcode,"r")
#barcodeCounter = -1
for barcodeCounter, barcode in enumerate(b):
    # barcodeCounter = barcodeCounter + 1
    barcode = barcode.strip()
    print "barcode: %s" % barcode
    handle = open(sequence, "r")
    for line in handle:
        print line
        seq = line.split(' ',1)[-1].strip()
        print "seq: %s" % seq
        potential_barcode = seq[0:len(barcode)]
        print "something"
        if potential_barcode == barcode:
            print "Checking sequences"
            outseq = seq.replace(potential_barcode, "", 1)
            outseq_length = [len(outseq)]
            toprocess.append("")
            # toprocess[barcodeCounter] += outseq.strip
            toprocess[barcodeCounter].append(outseq.strip) #AttributeError line 40
            # toprocess[barcodeCounter] = toprocess[barcodeCounter] + outseq.strip
            print "outseq: %s" % outseq
            print "Barcodes to be processed: %s" % toprocess[barcodeCounter]
            print "BC: %i" % barcodeCounter
    handle.close()
b.close()
New error:
barcode: ATTAG
S01 ATTAGAAAAAAA
seq: ATTAGAAAAAAA
something
Checking sequences
Traceback (most recent call last):
  File "./FinalProject.py", line 40, in <module>
    toprocess[barcodeCounter].append(outseq.strip)
AttributeError: 'str' object has no attribute 'append'
Code without the issue:
barcode = sys.argv[1]
sequence = sys.argv[2]
lengths = []
toprocess = []
b = open(barcode,"r")
#barcodeCounter = -1
for barcodeCounter, barcode in enumerate(b):
    # barcodeCounter = barcodeCounter + 1
    barcode = barcode.strip()
    print "barcode: \n%s\n" % barcode
    handle = open(sequence, "r")
    for line in handle:
        print line
        seq = line.split(' ',1)[-1].strip()
        print "seq: %s" % seq
        potential_barcode = seq[0:len(barcode)]
        # print "something"
        if potential_barcode == barcode:
            print "Checking sequences"
            outseq = seq.replace(potential_barcode, "", 1)
            outseq_length = [len(outseq)]
            toprocess.append("")
            toprocess[barcodeCounter] = toprocess[barcodeCounter] + outseq
@abarnert You were helpful, thank you. I'm not the brightest when it comes to programming sometimes (most of the time). I also had to change the way I added the new sequences, because they are str, not list.
Answer 1:
You actually have two problems here.
First, you're counting from 1 instead of 0. You start barcodeCounter at 0, then you increment it before using it. This means that if you have, say, 3 barcodes, you're trying to access toprocess[1], then toprocess[2], then toprocess[3]. Even if toprocess had three elements, the last one would be an IndexError.
Notice that the code you based it on starts with sequenceCounter = -1 rather than 0 to avoid this problem.
However, there's an even simpler solution to the problem: use enumerate to do the counting for you:
for barcodeCounter, barcode in enumerate(b):
No need to remember whether to start at -1, 0, or 1, or where to do the incrementing, or any of that; it just automatically gets the numbers 0, 1, 2, etc., up to one less than the number of lines in b.
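A minimal sketch of that counting behavior. It uses a hypothetical in-memory list, barcode_lines, in place of the opened file b (only ATTAG comes from the question; the other barcodes are made up), and print() with parentheses so that it runs on Python 3:

```python
# Hypothetical stand-in for the lines read from the barcode file.
barcode_lines = ["ATTAG\n", "CCGTA\n", "GGATC\n"]

pairs = []
for barcodeCounter, barcode in enumerate(barcode_lines):
    barcode = barcode.strip()
    # barcodeCounter is 0 for the first barcode, 1 for the second, and so on.
    pairs.append((barcodeCounter, barcode))
    print("BC: %i barcode: %s" % (barcodeCounter, barcode))
```

Iterating over an open file with enumerate works the same way, since a file object yields one line per iteration.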
Second, even if you counted correctly, toprocess is not the same size as b. In fact, it's completely empty, so toprocess[anything] is always going to raise an exception.
To append a new value to the end of a list, you call the append method:
toprocess.append(…)
Again, notice that the code you're basing it on always does a seqList.append("") before doing a seqList[sequenceCounter] = assignment. (Notice that it's a bit tricky: sometimes it appends and increments sequenceCounter, sometimes it does neither and assigns to seqList[sequenceCounter] using the previous value of sequenceCounter.) You have to do the equivalent.
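One way to do the equivalent, sketched under some assumptions: the two input files are replaced by hypothetical in-memory lists (barcodes and sequences, with made-up data apart from the first sequence line), and each slot of toprocess holds a list of sequence lengths rather than a concatenated string, on the guess that lengths are what the averaging step needs. It is written for Python 3.

```python
# Hypothetical sample data standing in for the two input files.
barcodes = ["ATTAG", "CCGTA"]
sequences = ["S01 ATTAGAAAAAAA", "S02 CCGTATTTT", "S03 ATTAGCC"]

toprocess = []
for barcodeCounter, barcode in enumerate(barcodes):
    # Create the slot first, so toprocess[barcodeCounter] exists.
    toprocess.append([])
    for line in sequences:
        seq = line.split(' ', 1)[-1].strip()
        if seq.startswith(barcode):
            # Record the length of the sequence without its barcode.
            toprocess[barcodeCounter].append(len(seq) - len(barcode))

print(toprocess)
```

Appending exactly once per barcode, outside the inner loop, keeps the index of each slot in step with barcodeCounter even when a barcode matches several sequences or none.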
Answer 2:
The code
listVariable[indexNumber]
is used specifically to access something already existing in the list variable. The number you give it tells Python what part of the list you're looking for. Worth noting, the list starts counting from 0 and not 1. So the following code:
list = ["a","b","c","d"]
print list[0]
print list[3]
print list[1]
print list[-1]
will result in printing
a #index 0
d #index 3
b #index 1
d #index -1
(A negative index counts from the end, so -1 gives you d and -2 would give you c.)
An IndexError is what happens when you give an index that the list has nothing stored for. If I tried to access list[4], I'd get an IndexError since that element doesn't exist, just as I'd get an error if I tried to use a variable that doesn't exist.
Unlike with dictionaries, you can't set a list value by assigning to a nonexistent index. You need to use a method like append or extend, but not the way you did it, where you give an index and then call extend on the result. Strictly speaking,
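A short sketch of both cases, using a hypothetical list named letters (to avoid shadowing the built-in list) and Python 3 syntax:

```python
letters = ["a", "b", "c", "d"]

try:
    letters[4]  # valid indexes are 0..3 (or -4..-1)
    error = None
except IndexError as exc:
    error = str(exc)

try:
    [][0]  # an empty list has no valid index at all
    empty_error = None
except IndexError as exc:
    empty_error = str(exc)

print(error)
print(empty_error)
```

The second case is the one in the question: toprocess starts out empty, so any index into it fails.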
list[3].append("e")
is telling Python to take the value stored in list[3] and append an 'e' to that, not to the overall list itself.
list.append("e")
That's what would actually add e to my list.
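To make the difference concrete, here is a small sketch with hypothetical lists named letters and nested, in Python 3 syntax. Calling append on an indexed element only works when that element is itself a list; when it is a string, it fails with the same AttributeError shown in the question's edit.

```python
letters = ["a", "b", "c", "d"]
letters.append("e")  # grows the list itself

nested = [["a"], ["b"]]
nested[1].append("e")  # fine: nested[1] is a list

try:
    letters[3].append("e")  # letters[3] is the string "d"
    message = None
except AttributeError as exc:
    message = str(exc)

print(letters)
print(nested)
print(message)
```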
Source: https://stackoverflow.com/questions/29889328/indexerror-list-index-out-of-range-not-sure-why