How can I break down a large csv file into small files based on common records by python

前端 未结 2 1172
谎友^
谎友^ 2021-01-29 14:18

What I want to do:

What I want to do is that I have a big .csv file. I want to break down this big csv file into many small files based on the common records in BB colum

2条回答
  •  日久生厌
    2021-01-29 14:50

    For the data you have provided, the following script will produce your requested output files. It will perform this operation on ALL CSV files found in the folder:

    from itertools import groupby
    import glob
    import csv
    import os
    
    def remove_unwanted(rows):
        return [['' if col == 'NULL' else col for col in row[2:]] for row in rows]
    
    output_folder = 'temp'  # make sure this folder exists
    
    # Search for ALL CSV files in the current folder
    for csv_filename in glob.glob('*.csv'):
        with open(csv_filename) as f_input:
            basename = os.path.splitext(os.path.basename(csv_filename))[0]      # e.g. bigfile
    
            csv_input = csv.reader(f_input)
            header = next(csv_input)
            # Create a list of entries with '0' in last column
            id_list = remove_unwanted(row for row in csv_input if row[7] == '0')
            f_input.seek(0)     # Go back to the start
            header = remove_unwanted([next(csv_input)])
    
            for k, g in groupby(csv_input, key=lambda x: x[1]):
                if k == '':
                    break
    
                # Format an output file name in the form 'bigfile_53.csv'
                file_name = os.path.join(output_folder, '{}_{}.csv'.format(basename, k))
    
                with open(file_name, 'wb') as f_output:
                    csv_output = csv.writer(f_output)
                    csv_output.writerows(header)
                    csv_output.writerows(remove_unwanted(g))
                    csv_output.writerows(id_list)
    

    This will result in the files bigfile_53.csv, bigfile_59.csv and bigfile_61.csv being created in an output folder called temp. For example bigfile_53.csv will appear as follows:

    Entries containing the string 'NULL' will be converted to an empty string, and the first two columns will be removed (as per OP's comment).

    Tested in Python 2.7.9

提交回复
热议问题