I have multiple csv files with date as filename (20080101.csv to 20111031.csv) in a folder. The csv files have common headers. The csv file looks like this:
This should do the job:
import glob

with open('output.csv', 'w') as outfile:
    outfile.write('X ; Y ; Z ; filename\n')
    # sorted() so the date-named files are processed in chronological order.
    for filename in sorted(glob.glob('*.csv')):
        if filename == 'output.csv':  # Skip the file we're writing.
            continue
        with open(filename, 'r') as infile:
            count = 0
            lineno = 0
            for line in infile:
                lineno += 1
                if lineno == 1:  # Skip the header line.
                    continue
                fields = line.split(';')
                x = int(fields[0])
                y = int(fields[1])
                z = float(fields[2])
                if x == 1 and y == 2:
                    outfile.write('%d ; %d ; %g ; %s\n' % (x, y, z, filename))
                    count += 1
            if count == 0:  # Handle the case when no lines were found.
                outfile.write('1 ; 2 ; NA ; %s\n' % filename)
Note that if you can't control or trust the file format you may want to handle exceptions thrown by the conversions to int/float.
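One way to handle those exceptions is to move the conversions into a small helper that returns None for any line it cannot parse (blank lines, a stray header, an `NA` value). This is a minimal sketch; `parse_fields` is a hypothetical name, and it assumes the semicolon-separated `X ; Y ; Z` layout used above.

```python
def parse_fields(line):
    """Return (x, y, z) from one 'X ; Y ; Z' line, or None if it is malformed.

    Hypothetical helper: assumes semicolon-separated fields as in the files above.
    int() and float() ignore the surrounding spaces, so no strip() is needed.
    """
    fields = line.split(';')
    if len(fields) < 3:
        return None  # blank or truncated line
    try:
        return int(fields[0]), int(fields[1]), float(fields[2])
    except ValueError:
        return None  # non-numeric value, e.g. 'NA' or a repeated header
```

Inside the loop you would then write `parsed = parse_fields(line)` and `continue` when it is None, instead of converting the fields directly.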
This is a well-formed question, from which the logic should be apparent. Providing finished code would defeat the purpose of the assignment. First, add a "homework" tag to the question, then think about what you want to do:
1) loop over the files (keeping track of each filename as it's opened);
2) read lines from the current file;
3) if the selection criterion (x == 1 and y == 2) is met, write the line.
To get started, try:
import csv, os

for fn in os.listdir('.'):
    if fn.endswith('.csv'):  # match the extension, not ".csv" anywhere in the name
        with open(fn, 'r', newline='') as f:
            reader = csv.reader(f, delimiter=';')
            for row in reader:
                ...
Then extend the solution to open the output file and write the selected lines using csv.writer.
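As a sketch of where that extension could end up, the function below combines the three steps with `csv.writer`. It assumes semicolon-delimited input files with one header row; `select_rows` is a hypothetical name. Note that `csv.writer` only accepts a single-character delimiter, so the output uses `;` without the surrounding spaces shown in the question.

```python
import csv
import glob
import os


def select_rows(folder, out_path):
    """Copy every row with X == 1 and Y == 2 from the *.csv files in `folder`
    to `out_path`, adding the source filename as a fourth column.

    Assumes semicolon-delimited files whose first row is a header.
    """
    with open(out_path, 'w', newline='') as out:
        writer = csv.writer(out, delimiter=';')
        writer.writerow(['X', 'Y', 'Z', 'filename'])
        for path in sorted(glob.glob(os.path.join(folder, '*.csv'))):
            if os.path.abspath(path) == os.path.abspath(out_path):
                continue  # never read the file we are writing
            with open(path, newline='') as f:
                reader = csv.reader(f, delimiter=';')
                next(reader, None)  # skip the header row
                for row in reader:
                    cells = [c.strip() for c in row]
                    if len(cells) >= 3 and cells[0] == '1' and cells[1] == '2':
                        writer.writerow(cells[:3] + [os.path.basename(path)])
```

Usage: `select_rows('.', 'output.csv')` processes the current folder.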
The following should work:
import csv

with open('output.csv', 'w') as outfile:
    outfile.write('X ; Y ; Z ; filename\n')
    fmt = '1 ; 2 ; %s ; %s\n'
    files = ['20080101.csv', '20080102.csv', '20080103.csv', '20080104.csv']
    for file in files:
        with open(file, newline='') as f:
            reader = csv.reader(f, delimiter=';')
            for row in reader:
                if len(row) > 2 and row[0].strip() == '1' and row[1].strip() == '2':
                    outfile.write(fmt % (row[2].strip(), file[:-4]))
                    break
            else:
                # for-else: runs only if the loop above never hit `break`.
                outfile.write(fmt % ('NA', file[:-4]))
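The `else` attached to the inner `for` loop is what produces the NA record: it runs only when the loop finishes without a `break`, i.e. when no matching row was found. A stripped-down illustration of the same pattern, using a hypothetical `first_z` function that takes already-split rows:

```python
def first_z(rows):
    """Return Z from the first row with X == 1 and Y == 2, else 'NA'.

    `rows` is any iterable of already-split fields (for example, what
    csv.reader yields); the function name is hypothetical.
    """
    for row in rows:
        if len(row) > 2 and row[0].strip() == '1' and row[1].strip() == '2':
            result = row[2].strip()
            break
    else:
        result = 'NA'  # the loop ran to completion without a break
    return result
```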
If you know that you have one file for each day, with no missing days, then I'd use glob('*.csv') to get the list of file names, open them one by one, and read them the way Tyler is doing.
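Because the names are `YYYYMMDD.csv`, sorting them as strings also sorts them by date, so `sorted(glob(...))` is enough to get the files in chronological order. A minimal sketch; `daily_files` is a hypothetical name, and excluding `output.csv` assumes that is where the results are written.

```python
import glob
import os


def daily_files(folder):
    """List the date-named CSV files in `folder`, oldest first.

    YYYYMMDD names sort lexicographically in chronological order, so plain
    sorted() is enough. 'output.csv' is excluded (assumed output file name).
    """
    names = (os.path.basename(p) for p in glob.glob(os.path.join(folder, '*.csv')))
    return sorted(n for n in names if n != 'output.csv')
```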
If you know that there are days where the file is missing, I'd use datetime: start with datetime.date(2008, 1, 1) and loop, incrementing by one day. Then for each day, compose the file name using .strftime() + '.csv' and try to process the file (if there is no file, just write a record with NA).
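A sketch of that date-driven approach, assuming the same `X ; Y ; Z` layout with one header line; `daily_filenames` and `collect` are hypothetical names. A missing file is caught as `FileNotFoundError` (Python 3) and, like a file with no matching row, produces an NA record.

```python
import datetime
import os


def daily_filenames(start, end):
    """Yield 'YYYYMMDD.csv' for every day from start to end, inclusive."""
    day = start
    while day <= end:
        yield day.strftime('%Y%m%d') + '.csv'
        day += datetime.timedelta(days=1)


def collect(start, end, folder='.', out_path='output.csv'):
    """Write one 'X ; Y ; Z ; filename' record per matching row, or an NA
    record for days whose file is missing or has no X == 1, Y == 2 row."""
    with open(out_path, 'w') as out:
        out.write('X ; Y ; Z ; filename\n')
        for name in daily_filenames(start, end):
            matches = []
            try:
                with open(os.path.join(folder, name)) as f:
                    next(f, None)  # skip the header line
                    for line in f:
                        fields = [s.strip() for s in line.split(';')]
                        if len(fields) >= 3 and fields[0] == '1' and fields[1] == '2':
                            matches.append(fields[2])
            except FileNotFoundError:
                pass  # missing day: fall through to the NA record
            for z in matches or ['NA']:
                out.write('1 ; 2 ; %s ; %s\n' % (z, name))
```

Usage: `collect(datetime.date(2008, 1, 1), datetime.date(2011, 10, 31))` covers the date range from the question.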
You could read in one file at a time, line by line:
files = ['20080101.csv', '20080102.csv', '20080103.csv']  # ...etc

# Open the output once instead of reopening it for every matching line.
with open('output.csv', 'w') as fout:
    for f in files:
        with open(f, 'r') as infile:
            for line in infile:
                ray = line.split(';')
                if ray[0].strip() == '1' and ray[1].strip() == '2':
                    fout.write(ray[0].strip() + ' ; ' + ray[1].strip() + ' ; '
                               + ray[2].strip() + ' ; ' + f + '\n')
Tested and works. May need some slight modifications.