Question
I'm trying to read this CSV file into a 2D list. My current code drops several fields on lines whose data contains quotes. The quotes are there to indicate that the comma inside them is part of the field, not a field separator. I've posted the code, example data, and example output below. Notice that the first output line is missing several fields compared with the others because of the quotes. How should I change the regular expression line? Thanks in advance for any help.
Here's a cut of the code:
import sys
import re
import time

# get the date
date = time.strftime("%x")

# function for reading in each line of file
# returns array of each line
def readIn(file):
    array = []
    for line in file:
        array.append(line)
    return array

def main():
    data = open(sys.argv[1], "r")
    template = open(sys.argv[2], "r")
    output = open(sys.argv[3], "w")
    finalL = []
    dataL = []
    dataL = readIn(data)
    templateL = []
    templateL = readIn(template)
    costY = 0
    dateStr = ""
    # split each line in the data by the comma unless there are quotes
    for i in range(0, len(dataL)):
        if '"' in dataL[i]:
            Pattern = re.compile(r'''((?:[^,"']|"[^"]*"|'[^']*')+)''')
            dataL[i] = Pattern.split(dataL[i])[1::2]
            for j in range(0, len(dataL[i])):
                dataL[i][j] = dataL[i][j].strip()
        else:
            temp = dataL[i].strip().split(",")
            dataL[i] = temp
Data example:
OrgLevel3: ATHLET ,,,,,,,,
,,,,,,,,
Name,,,Calls,,Duration,Cost ($),,
,,,,,,,,
ATHLET Direct,,,"1,312 ",,62:58:18,130.62 ,,
,,,,,,,,
Grand Total for ATHLET:,,,"1,312 ",,62:58:18,130.62 ,,
,,,,,,,,
OrgLevel3: BOOK ,,,,,,,,
,,,,,,,,
Name,,,Calls,,Duration,Cost ($),,
,,,,,,,,
BOOK Direct,,,434 ,,14:59:18,28.09 ,,
,,,,,,,,
Grand Total for BOOK:,,,434 ,,14:59:18,28.09 ,,
,,,,,,,,
OrgLevel3: CARD ,,,,,,,,
,,,,,,,,
Name,,,Calls,,Duration,Cost ($),,
,,,,,,,,
CARD Direct,,,253 ,,09:02:54,14.30 ,,
,,,,,,,,
Grand Total for CARD:,,,253 ,,09:02:54,14.30 ,,
Example output:
['Grand Total for ATHLET:', '"1,312 "', '62:58:18', '130.62', '']
['Grand Total for BOOK:', '', '', '434 ', '', '14:59:18', '28.09 ', '', '']
['Grand Total for CARD:', '', '', '253 ', '', '09:02:54', '14.30 ', '', '']
Answer 1:
If you're just trying to load a CSV file into a list, this is all the code you need:
import csv
import sys

with open(sys.argv[1]) as data:
    dataL = list(csv.reader(data))
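To see why the regular expression loses fields while the csv module does not, here is a minimal sketch that runs both on one line copied from the example data. The names `regex_fields` and `csv_fields` are only for illustration. The pattern requires at least one character per match, so an empty field between two adjacent commas is never captured, and `[1::2]` keeps only the captured pieces.

```python
import csv
import io
import re

# One line taken from the example data in the question.
line = 'Grand Total for ATHLET:,,,"1,312 ",,62:58:18,130.62 ,,\n'

# The question's approach: split on the capturing pattern and keep the
# captured pieces. Runs of commas end up in the discarded even slots,
# so the empty fields between them disappear.
pattern = re.compile(r'''((?:[^,"']|"[^"]*"|'[^']*')+)''')
regex_fields = [f.strip() for f in pattern.split(line)[1::2]]

# csv.reader accepts any iterable of lines; io.StringIO stands in for a file.
csv_fields = next(csv.reader(io.StringIO(line)))

print(regex_fields)  # 5 items, quotes still attached to "1,312 "
print(csv_fields)    # 9 items, quotes removed, embedded comma kept
```

The csv module keeps every empty field and unquotes `1,312 ` for you, which is why no regular expression is needed here.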
If your example data is your actual input, it needs some further filtering first, e.g.:
dataL = [row for row in csv.reader(data) if row[0].startswith('Grand Total for')]
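A slightly fuller sketch of that filtering step, under a couple of assumptions: the helper name `grand_totals` is hypothetical, the sample is a shortened copy of the question's data with one blank line added on purpose, and trailing whitespace in each field is assumed to be unwanted. The `row and` guard matters because csv.reader yields an empty list for a blank line, and `row[0]` would then raise an IndexError.

```python
import csv
import io

# Shortened sample based on the question's data; the blank line is an
# added assumption to show why the guard below is useful.
sample = (
    'OrgLevel3: ATHLET ,,,,,,,,\n'
    ',,,,,,,,\n'
    'ATHLET Direct,,,"1,312 ",,62:58:18,130.62 ,,\n'
    'Grand Total for ATHLET:,,,"1,312 ",,62:58:18,130.62 ,,\n'
    '\n'
    'Grand Total for BOOK:,,,434 ,,14:59:18,28.09 ,,\n'
)

def grand_totals(lines):
    """Return the 'Grand Total for' rows with whitespace stripped from each field."""
    totals = []
    for row in csv.reader(lines):
        # Skip blank lines (empty rows) before looking at the first field.
        if row and row[0].startswith('Grand Total for'):
            totals.append([field.strip() for field in row])
    return totals

rows = grand_totals(io.StringIO(sample))
for row in rows:
    print(row)
```

With a real file, you would pass the open file object instead of the `io.StringIO` wrapper, for example `grand_totals(data)` inside the `with open(...)` block.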
Source: https://stackoverflow.com/questions/20123551/separate-fields-by-comma-and-quotes-in-python