How can I break down a large CSV file into small files based on common records using Python?

谎友^ 2021-01-29 14:18

What I want to do:

I have a big .csv file, and I want to break it down into many small files based on the common records in the BB column.

2 Answers
  • 2021-01-29 14:50

    For the data you have provided, the following script will produce your requested output files. It will perform this operation on ALL CSV files found in the folder:

    from itertools import groupby
    import glob
    import csv
    import os
    
    def remove_unwanted(rows):
        return [['' if col == 'NULL' else col for col in row[2:]] for row in rows]
    
    output_folder = 'temp'  # make sure this folder exists
    
    # Search for ALL CSV files in the current folder
    for csv_filename in glob.glob('*.csv'):
        with open(csv_filename) as f_input:
            basename = os.path.splitext(os.path.basename(csv_filename))[0]      # e.g. bigfile
    
            csv_input = csv.reader(f_input)
            header = next(csv_input)
            # Create a list of entries with '0' in last column
            id_list = remove_unwanted(row for row in csv_input if row[7] == '0')
            f_input.seek(0)     # Go back to the start
            header = remove_unwanted([next(csv_input)])
    
            for k, g in groupby(csv_input, key=lambda x: x[1]):
                if k == '':
                    break
    
                # Format an output file name in the form 'bigfile_53.csv'
                file_name = os.path.join(output_folder, '{}_{}.csv'.format(basename, k))
    
                # 'wb' is correct for Python 2; on Python 3 use open(file_name, 'w', newline='')
                with open(file_name, 'wb') as f_output:
                    csv_output = csv.writer(f_output)
                    csv_output.writerows(header)
                    csv_output.writerows(remove_unwanted(g))
                    csv_output.writerows(id_list)
    

    This will result in the files bigfile_53.csv, bigfile_59.csv and bigfile_61.csv being created in an output folder called temp. For example bigfile_53.csv will appear as follows:

    Entries containing the string 'NULL' will be converted to an empty string, and the first two columns will be removed (as per OP's comment).

    Tested in Python 2.7.9
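
One thing to keep in mind with the script above: `itertools.groupby` only merges adjacent items that share a key, so the approach relies on the input already being ordered by the BB column. If the same value showed up in two separate runs, the second run would reopen the same output file and overwrite the first. The sketch below uses made-up rows (not the OP's data) in which index 1 plays the role of the BB column, and shows the difference sorting makes.

```python
from itertools import groupby
from operator import itemgetter

# Made-up rows for illustration; index 1 stands in for the BB column.
rows = [
    ["a", "53"],
    ["b", "59"],
    ["c", "53"],
]

# Unsorted input: groupby starts a new group for every run of equal keys,
# so the key "53" is yielded twice.
unsorted_keys = [k for k, _ in groupby(rows, key=itemgetter(1))]

# Sorting by the same key first gives exactly one group per distinct value.
sorted_rows = sorted(rows, key=itemgetter(1))
sorted_groups = {k: list(g) for k, g in groupby(sorted_rows, key=itemgetter(1))}

print(unsorted_keys)
print(sorted_groups)
```

If the file is not guaranteed to be ordered, sorting the rows first or collecting them in a dictionary avoids the problem, at the cost of holding the rows in memory.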

  • 2021-01-29 15:13

    You should look into the csv module. You can read your input file row by row and group the rows by their value in the BB column. This is easy to do with a dictionary whose keys are the values in the BB column and whose values are lists of the rows that share that value. You can then write each list to its own CSV file using the csv module.
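
The dictionary approach described in this answer could look roughly like the following in Python 3. This is a minimal sketch, not a tested solution for the OP's file: the function name `split_csv_by_column`, the assumption that the header row contains a column literally named `BB`, and the `basename_key.csv` naming scheme are all illustrative choices. It also assumes the whole file fits in memory and that the BB values are safe to use in file names.

```python
import csv
import os
from collections import defaultdict


def split_csv_by_column(input_path, output_dir, column="BB"):
    """Write one CSV per distinct value of `column`; return the paths written.

    Hypothetical helper: assumes the first row is a header containing `column`.
    """
    os.makedirs(output_dir, exist_ok=True)
    basename = os.path.splitext(os.path.basename(input_path))[0]

    # newline='' is what the csv module documentation recommends on Python 3.
    with open(input_path, newline="") as f_input:
        reader = csv.reader(f_input)
        header = next(reader)
        key_index = header.index(column)

        # Keys are the values in the chosen column; values are lists of rows.
        groups = defaultdict(list)
        for row in reader:
            if not row:  # skip completely blank lines
                continue
            groups[row[key_index]].append(row)

    written = []
    for key, rows in groups.items():
        out_path = os.path.join(output_dir, "{}_{}.csv".format(basename, key))
        with open(out_path, "w", newline="") as f_output:
            writer = csv.writer(f_output)
            writer.writerow(header)
            writer.writerows(rows)
        written.append(out_path)
    return written
```

Calling `split_csv_by_column('bigfile.csv', 'temp')` would then write one file per BB value into `temp`. Unlike `groupby`, the dictionary does not require the rows to be ordered by the key column.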
