extract rows and filenames from multiple csv files

醉酒成梦 2021-01-17 02:28

I have multiple CSV files, named by date (20080101.csv to 20111031.csv), in a single folder. The files share a common header. Each file looks like this:



        
5 Answers
  • 2021-01-17 02:38

    This should do the job:

    import glob

    # Open the output once; the with-block closes it even if a read fails.
    with open('output.csv', 'w') as outfile:
        outfile.write('X ; Y ; Z ; filename\n')
        # Sort so that the date-named files are processed in date order.
        for filename in sorted(glob.glob('*.csv')):
            if filename == 'output.csv':  # Skip the file we're writing.
                continue
            with open(filename, 'r') as infile:
                count = 0
                next(infile, None)  # Skip the header line.
                for line in infile:
                    fields = line.split(';')
                    x = int(fields[0])
                    y = int(fields[1])
                    z = float(fields[2])
                    if x == 1 and y == 2:
                        outfile.write('%d ; %d ; %g ; %s\n' % (x, y, z, filename))
                        count += 1
            if count == 0:  # Handle the case when no lines were found.
                outfile.write('1 ; 2 ; NA ; %s\n' % filename)
    

    Note that if you can't control or trust the file format, you may want to handle the exceptions raised by the conversions to int/float.
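
    One way to handle those exceptions is to wrap the conversions in a small helper that returns None for any line it cannot parse. This is only a sketch: the name parse_row is hypothetical, and it assumes three ';'-separated fields per line, as in the code above.

```python
def parse_row(line, delimiter=';'):
    """Return (x, y, z) parsed from one line, or None if the line is malformed."""
    fields = line.split(delimiter)
    try:
        return int(fields[0]), int(fields[1]), float(fields[2])
    except (ValueError, IndexError):
        # ValueError: a field is not numeric (e.g. a header or a blank line).
        # IndexError: the line has fewer than three fields.
        return None
```

    The loop can then skip a line whenever parse_row(line) is None instead of crashing on it.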

  • 2021-01-17 02:40

    This is a well-formed question, from which the logic should be apparent. For someone to provide finished code would defeat the purpose of the assignment. First, add a "homework" tag to the question, then think about what you want to do: 1) loop over the files (keeping track of each filename as it's opened), 2) read lines from the current file, 3) if the selection criteria (x == 1 and y == 2) are met, write the line.

    To get started, try:

    import csv
    import os

    for fn in os.listdir():
        if fn.endswith(".csv"):  # Match the extension, not any name containing ".csv".
            with open(fn, 'r', newline='') as f:
                reader = csv.reader(f, delimiter=";")
                for row in reader:
                    ...
    

    Then extend the solution to open the output file and write the selected lines using csv.writer.
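
    As a rough sketch of that extension (not a finished solution), the version below writes the matching rows with csv.writer. The function name extract_matches and the default output name are hypothetical; it assumes ';'-delimited files with one header row and the x == 1, y == 2 criterion from the question. It does not write an NA record for files without a match.

```python
import csv
import os

def extract_matches(folder, out_name='output.csv'):
    """Copy rows with X == 1 and Y == 2 from every CSV in folder to out_name."""
    out_path = os.path.join(folder, out_name)
    # Collect the input names first and leave out the output file itself.
    names = sorted(fn for fn in os.listdir(folder)
                   if fn.endswith('.csv') and fn != out_name)
    with open(out_path, 'w', newline='') as out:
        writer = csv.writer(out, delimiter=';')
        writer.writerow(['X', 'Y', 'Z', 'filename'])
        for fn in names:
            with open(os.path.join(folder, fn), 'r', newline='') as f:
                reader = csv.reader(f, delimiter=';')
                next(reader, None)  # Skip the header row.
                for row in reader:
                    cells = [c.strip() for c in row]
                    if len(cells) >= 3 and cells[0] == '1' and cells[1] == '2':
                        writer.writerow(cells[:3] + [fn])
    return out_path
```

    Calling extract_matches('.') would process the current directory.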

  • 2021-01-17 02:41

    The following should work:

    import csv

    with open('output.csv', 'w') as outfile:
        outfile.write('X ; Y ; Z ; filename\n')
        fmt = '1 ; 2 ; %s ; %s\n'
        files = ['20080101.csv', '20080102.csv', '20080103.csv', '20080104.csv']
        for name in files:
            with open(name, newline='') as f:
                reader = csv.reader(f, delimiter=';')
                for row in reader:
                    # A non-numeric header row never matches, so it needs no special handling.
                    if len(row) > 2 and row[0].strip() == '1' and row[1].strip() == '2':
                        # name[:-4] drops the '.csv' extension.
                        outfile.write(fmt % (row[2].strip(), name[:-4]))
                        break  # Keep only the first matching row per file.
                else:
                    # The loop ended without a break: no row matched.
                    outfile.write(fmt % ('NA', name[:-4]))
    
  • 2021-01-17 02:47

    If you know that you have one file for each day, with no days missing, then I'd use glob('*.csv') to get the list of filenames, open them one by one, and read each file as Tyler does.

    If you know that there are days where the file is missing, I'd use datetime to start with datetime.date(2008, 1, 1) and loop, incrementing by one day. For each day, I'd compose the filename using .strftime() + '.csv' and try to process the file (if there is no file, just write a record with NA).
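
    The date loop could be sketched as follows. The helper names daily_filenames and missing_files are hypothetical, and the sketch assumes the YYYYMMDD.csv naming from the question.

```python
import datetime
import os

def daily_filenames(start, end):
    """Yield 'YYYYMMDD.csv' for every day from start to end, inclusive."""
    day = start
    while day <= end:
        yield day.strftime('%Y%m%d') + '.csv'
        day += datetime.timedelta(days=1)

def missing_files(folder, start, end):
    """Return the expected daily filenames that do not exist in folder."""
    return [name for name in daily_filenames(start, end)
            if not os.path.exists(os.path.join(folder, name))]
```

    For the range in the question, the start would be datetime.date(2008, 1, 1) and the end datetime.date(2011, 10, 31); each name returned by missing_files would get an NA record in the output.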

  • 2021-01-17 03:03

    You could read in one file at a time, line by line:

    files = ['20080101.csv', '20080102.csv', '20080103.csv'] #...etc
    # Open the output once, in append mode, instead of once per matching line.
    with open('output.csv', 'a') as fout:
        for f in files:
            with open(f, 'r') as infile:
                for line in infile:
                    ray = [field.strip() for field in line.split(';')]
                    # Skip blank or short lines that would raise IndexError.
                    if len(ray) > 2 and ray[0] == '1' and ray[1] == '2':
                        fout.write(' ; '.join(ray[:3] + [f]) + '\n')
    

    Tested and works. May need some slight modifications.
