Question
I am new to Python. For an optimization subroutine I am testing, I need to parse a CSV file containing numbers.
The CSV file has the following format:
Support load summary for anchor at node 5,
Load combination,FX (N),FY (N),FZ (N),MX (Nm),MY (Nm),MZ (Nm),,
Sustained,-3,-2679,120,2012,164,69,,
Operating1,1472,2710,-672,-4520,8743,-2047,,
Maximum,1472,2710,120,2012,8743,69,,
Minimum,-3,-2679,-672,-4520,164,-2047,,
Support load summary for anchor at node 40,
Load combination,FX (N),FY (N),FZ (N),MX (Nm),MY (Nm),MZ (Nm),,
Sustained,9,-3872,-196,-91,854,-3914,,
Operating1,-2027,-8027,3834,-7573,-9102,-6323,,
Maximum,9,-3872,3834,-91,854,-3914,,
Minimum,-2027,-8027,-196,-7573,-9102,-6323,,
Support load summary for anchor at node 125,
Load combination,FX (N),FY (N),FZ (N),MX (Nm),MY (Nm),MZ (Nm),,
Sustained,-7,-2448,76,264,83,1320,,
Operating1,556,-3771,-3162,-6948,-1367,1272,,
Maximum,556,-2448,76,264,83,1320,,
Minimum,-7,-3771,-3162,-6948,-1367,1272,,
Support load summary for Hanger at node 10,
Load combination,Load (N),,
Sustained,-3668,,
Operating1,-13876,,
Maximum,-3668,,
Minimum,-13876,,
Support load summary for Hanger at node 20B,
Load combination,Load (N),,
Sustained,-14305,,
Operating1,-13359,,
Maximum,-13359,,
Minimum,-14305,,
Support load summary for restraint at node 115B,
Load combination,FX (N),FY (N),FZ (N),,
Sustained,,-5655,,,
Operating1,3696,,
Maximum,,3696,,,
Minimum,,-5655,,,
My code mainly processes the lines starting with:
Operating1,
Maximum,
Minimum,
The job (the cost function) is to compute the algebraic sum of all the numbers following one of these keywords. As the data file above shows, sometimes there is only one number, in the 2nd or 3rd column (see the end of the data file), and sometimes there is no number at all, as on the Operating1 line of the following file fragment:
Support load summary for Hanger at node 115B,
Load combination,Load (N),,
Sustained,-5188,,
Operating1,,,
Maximum,,,
Minimum,-5188,,
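The totaling rule described above can be sketched in plain Python, without np.genfromtxt(). This is only an illustration under stated assumptions: row_total is a hypothetical helper name, blank columns are treated as absent (contributing nothing to the sum), and any field that is not a number, such as the leading keyword, is skipped.

```python
def row_total(fields):
    """Sum the numeric fields of one CSV row; blanks and labels are ignored."""
    total = 0.0
    for field in fields:
        text = field.strip()
        if not text:
            continue  # empty column, e.g. the trailing ",," padding
        try:
            total += float(text)
        except ValueError:
            continue  # non-numeric field such as the "Operating1" keyword
    return total

# Rows taken from the sample data in the question
print(row_total("Operating1,1472,2710,-672,-4520,8743,-2047,,".split(",")))
print(row_total("Operating1,,,".split(",")))
print(row_total("Sustained,,-5655,,,".split(",")))
```

With this approach a full row, a row with a single value, and a row with no values all go through the same code path, so no special case is needed for short lines.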
I am using np.genfromtxt(). It works well except on lines that have fewer than four values, or sometimes none at all.
I call sum() on the result of genfromtxt() (see the code below). When there is only one value, I use float(). When there is none, I try to detect that and assign zero to the total. I could special-case each situation, but I am wondering whether there is a more general, more abstract way of reading and totaling the numbers in these unpredictable cases.
I also tried the "missing_values" and "filling_values" arguments, but they do not seem to work. How do I count the number of non-zero columns in a file?
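A possible explanation for the behaviour of "missing_values" and "filling_values", offered as an assumption rather than something the post confirms: csv.reader has already split each row into a list of strings, and genfromtxt reads a list of strings as a sequence of lines, so an empty field looks like a blank line and is skipped instead of being filled. Counting the columns that actually hold a value can be done directly on the list that csv.reader returns. The sketch below uses only the standard library; numeric_fields is a hypothetical helper name, and "non-zero columns" is interpreted here as "non-empty columns".

```python
import csv
import io

def numeric_fields(row):
    """Return the numbers found after the label in one parsed CSV row."""
    return [float(field) for field in row[1:] if field.strip()]

# Short rows taken from the sample data in the question
sample = io.StringIO(
    "Sustained,,-5655,,,\n"
    "Operating1,3696,,\n"
    "Maximum,,,\n"
)

results = []
for row in csv.reader(sample):
    values = numeric_fields(row)
    results.append((row[0], len(values), sum(values)))
    print(row[0], "columns with a value:", len(values), "total:", sum(values))
```

Because sum() of an empty list is 0, a row with no numbers needs no separate branch.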
Here is part of the code so far:
def optimize(fn, optflag):
    modeltotals = []
    i=0
    csv1 = []
    j = 1 # line # count
    for line in csv.reader(filelist) :
        temp = repr(line)
        if "Support load summary" in temp :
            csv1.append(line) # just making another list of actionable lines for future use
            if (d): print "\n", line
            continue
        if (optflag == "ope") : # optimize on Operating loads
            if "Operating1" in temp:
                csv1.append(line)
                if (len(line) > 4):
                    modeltotals.append(sum(np.genfromtxt(line[1:], delimiter=",")))
                    if (d): print "Sum of OPE Loads:", modeltotals[i], "\n"
                elif (len(line) > 0 and len(line) <= 4):
                    if (d): print "line=", line, "length", len(line)
                    line1 = np.genfromtxt(line[1:], delimiter=",")
                    if not line1: # meaning if array is empty
                        modeltotals.append(0)
                    else:
                        modeltotals.append(np.genfromtxt(line[1:], delimiter=",", missing_values=[0,0,0,0]))
                    if (d): print "OPE Max:", modeltotals[i],"\n"
                i +=1
        elif (optflag == "minmax") : #optimize on all loads, min and max.
            #print "i=", i
            if "Maximum" in temp:
                csv1.append(line)
                if (len(line) > 4):
                    modeltotals.append(sum(np.genfromtxt(line[1:], delimiter=",")))
                    if (d): print "Sum of Maxs:", modeltotals[i]
                elif (len(line) <= 4):
                    #line1 = np.genfromtxt(line[1:], delimiter=",", filling_values = 0)
                    #modeltotals.append(sum(line1))
                    if (d): print "line=", line, "length", len(line)
                    line1 = np.genfromtxt(line[1:], delimiter=",")
                    print "line1 =", line1
                    if not line1: # meaning if array is empty
                        modeltotals.append(0)
                    else:
                        modeltotals.append(np.genfromtxt(line[1:], delimiter=",", filling_values = 0))
                    if (d): print "Max:", modeltotals[i]
                i+=1
            elif "Minimum" in temp:
                csv1.append(line)
                if (len(line) > 4):
                    #print "#", j, "line", line
                    modeltotals.append(sum(np.genfromtxt(line[1:], delimiter=",")))
                    if (d): print "Sum of Mins:", modeltotals[i]
                elif (len(line) > 0 and len(line) <= 4):
                    if (d): print "line=", line, "length", len(line)
                    line1 = np.genfromtxt(line[1:], delimiter=",")
                    if not line1: # meaning if array is empty
                        modeltotals.append(0)
                    else:
                        modeltotals.append(np.genfromtxt(line[1:], delimiter=","))
                    if (d): print "Min:", modeltotals[i]
                i +=1
        j+=1
    if len(modeltotals) > 0:
        print modeltotals
        average = float(sum(modeltotals))/len(modeltotals) #sometimes error here
    else:
        return "000" # error, seems like no file was analyzed
    if (d):
        print "Current model mean =", average
    del csv1[:]
    return abs(average)
The errors I run into with different files are all similar:
['Support load summary for restraint at node 20B', '']
Traceback (most recent call last):
  File "sor4.py", line 190, in <module>
    modelmean[filename] = optimize(filename, args.optimizeon)
  File "sor4.py", line 107, in optimize
    modeltotals.append(sum(np.genfromtxt(line[1:], delimiter=",")))
TypeError: iteration over a 0-d array
The other error is "Cannot convert to a scalar."
I understand the errors but do not know enough Python to deal with them cleanly. Sorry for the long post; I will try to present information more succinctly in the future. As another poster here said, I will gratefully accept your answers. Thank you.
Answer 1:
I reduced your problem to the following code, which filters out NaNs and handles empty input strings.
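from StringIO import StringIO  # Python 2; on Python 3 use: from io import StringIO
import numpy as np

def getnumbers(s):
    try:
        res = np.genfromtxt(s, delimiter=",")
        # keep only the entries that are not NaN (empty columns parse as NaN)
        return res[~np.isnan(res)]
    except IOError:
        # no data could be read (e.g. empty input); return a 1-d array,
        # because sum() cannot iterate over a 0-d array
        return np.array([0.])

print(sum(getnumbers(StringIO('1., 2., , '))))
print(sum(getnumbers(StringIO(''))))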
It gives the result:
3.0
0.0
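The same idea can be carried over to the whole file. The following is a minimal sketch, not the approach used in this answer: it swaps np.genfromtxt for the standard csv module, and load_totals is a hypothetical helper name. It assumes every support block starts with a "Support load summary" line, that the keyword is always the first column, and that a row with no numbers should contribute a total of zero, as the question's code intends.

```python
import csv
import io

def load_totals(lines, keywords):
    """Return (support heading, keyword, total) for each row whose label is in keywords."""
    totals = []
    support = None
    for row in csv.reader(lines):
        if not row:
            continue  # skip completely blank lines
        label = row[0].strip()
        if label.startswith("Support load summary"):
            support = label  # remember which support the next rows belong to
            continue
        if label in keywords:
            values = [float(field) for field in row[1:] if field.strip()]
            totals.append((support, label, sum(values)))
    return totals

# Fragment assembled from the sample data in the question
sample = io.StringIO(
    "Support load summary for anchor at node 5,\n"
    "Load combination,FX (N),FY (N),FZ (N),MX (Nm),MY (Nm),MZ (Nm),,\n"
    "Sustained,-3,-2679,120,2012,164,69,,\n"
    "Operating1,1472,2710,-672,-4520,8743,-2047,,\n"
    "Support load summary for Hanger at node 115B,\n"
    "Load combination,Load (N),,\n"
    "Sustained,-5188,,\n"
    "Operating1,,,\n"
)

totals = load_totals(sample, {"Operating1"})
mean = sum(total for _, _, total in totals) / len(totals) if totals else 0.0
print(totals)
print(abs(mean))
```

Passing {"Maximum", "Minimum"} as keywords would correspond to the "minmax" option in the question, and keeping the support heading alongside each total makes it easier to see which block produced an unexpected value.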
Source: https://stackoverflow.com/questions/13595945/python-genfromtxt-problems