extract rows and filenames from multiple csv files

后端未结

关注

 5  408

I have multiple csv files with date as filename (20080101.csv to 20111031.csv) in a folder. The csv files have common headers. The csv file looks like this:

相关标签:

5条回答

误落风尘

2021-01-17 02:38

This should do the job:

import glob
import os

outfile = open('output.csv', 'w')
outfile.write('X ; Y ; Z ; filename\n')
for filename in glob.glob('*.csv'):
  if filename == 'output.csv': # Skip the file we're writing.
    continue
  with open(filename, 'r') as infile:
    count = 0 
    lineno = 0 
    for line in infile:
      lineno += 1
      if lineno == 1: # Skip the header line.
        continue
      fields = line.split(';')
      x = int(fields[0])
      y = int(fields[1])
      z = float(fields[2])
      if x == 1 and y == 2:
        outfile.write('%d ; %d ; %g ; %s\n' % (x, y, z, filename))
        count += 1
    if count == 0: # Handle the case when no lines were found.
      outfile.write('1 ; 2 ; NA ; %s\n' % filename)
outfile.close()

Note that if you can't control or trust the file format you may want to handle exceptions thrown by the conversions to int/float.

0 讨论(0)

旧时难觅i

2021-01-17 02:40
This is a well-formed question, from which the logic should be apparent. For someone to provide finished code would defeat the purpose of the assignment. First, add a "homework" tag to the question, then think about what you want to do: 1) loop over the files (keeping track of each filename as it's opened) 2) read lines from the current file 3) if the selection criteria (x==1 and y==2) is met, then write the line.

To get started, try:
```
import csv, os

for fn in os.listdir():
    if ".csv" in fn:
        with open(fn, 'r', newline='') as f:
            reader = csv.reader(f, delimiter=";")
            for row in reader:
                ...
```
Then extend the solution to open the output file and write the selected lines using csv.writer.
0 讨论(0)
发布评论:

提交评论
- 加载中...

野趣味

2021-01-17 02:41

The following should work:

import csv
with open('output.csv', 'w') as outfile:
    outfile.write('X ; Y ; Z ; filename\n')
    fmt = '1 ; 2 ; %s ; %s\n'
    files = ['20080101.csv', '20080102.csv', '20080103.csv', '20080104.csv']
    for file in files:
        with open(file) as f:
            reader = csv.reader(f, delimiter=';')
            for row in reader:
                if len(row) > 2 and row[0].strip() == '1' and row[1].strip() == '2':
                    outfile.write(fmt % (row[2].strip(), file[:-4]))
                    break
            else:
                outfile.write(fmt % ('NA', file[:-4]))

0 讨论(0)

感情败类

2021-01-17 02:47

if you know that you have one file for each day, no missing day, then i'd use glob('*.csv') to get list of file names, open one bye one, then read like Tyler is doing

if you konw that there are days where file is missing i'd use datetime to star with datetime.date(2008,1,1) and loop incrementing by one day. then for each of day i compose file name using .strftime() + '.csv', and try process file (if no file, just write a recode with NA)

0 讨论(0)
发布评论:

提交评论
- 加载中...

慢半拍i

2021-01-17 03:03

You could read in each file at a time. Read it line by line

files = ['20080101.csv', '20080102.csv', '20080103.csv'] #...etc
for f in files:
    file = open(f, 'r')
    for line in file:
        ray = line.split(';')
        if (ray[0].strip() == '1' and ray[1].strip() == '2'):
            fout = open('output.csv', 'a')
            fout.write(ray[0].strip() + ' ; ' + ray[1].strip() + ' ; ' + ray[2].strip() + ' ; ' + f + '\n')
            fout.close()
    file.close()

Tested and works. May need some slight modifications.

0 讨论(0)