What I want to do:
I have a big .csv file, and I want to break it down into many small files based on the common records in the BB column.
For the data you have provided, the following script will produce your requested output files. It performs this operation on ALL CSV files found in the current folder:
```python
from itertools import groupby
import glob
import csv
import os

def remove_unwanted(rows):
    # Drop the first two columns and turn the string 'NULL' into ''
    return [['' if col == 'NULL' else col for col in row[2:]] for row in rows]

output_folder = 'temp'  # created below if it does not exist yet
if not os.path.isdir(output_folder):
    os.makedirs(output_folder)

# Search for ALL CSV files in the current folder
for csv_filename in glob.glob('*.csv'):
    with open(csv_filename) as f_input:
        basename = os.path.splitext(os.path.basename(csv_filename))[0]  # e.g. bigfile
        csv_input = csv.reader(f_input)
        header = next(csv_input)  # skip the header row

        # Create a list of entries with '0' in the last column (index 7)
        id_list = remove_unwanted(row for row in csv_input if row[7] == '0')

        f_input.seek(0)  # Go back to the start
        header = remove_unwanted([next(csv_input)])

        # groupby only merges *consecutive* rows, so rows sharing a
        # BB value (column index 1) must be adjacent in the input
        for k, g in groupby(csv_input, key=lambda x: x[1]):
            if k == '':
                break  # stop at the first row with an empty BB value

            # Format an output file name in the form 'bigfile_53.csv'
            file_name = os.path.join(output_folder, '{}_{}.csv'.format(basename, k))

            # 'wb' is correct for the csv module on Python 2 only;
            # on Python 3 use open(file_name, 'w', newline='') instead
            with open(file_name, 'wb') as f_output:
                csv_output = csv.writer(f_output)
                csv_output.writerows(header)
                csv_output.writerows(remove_unwanted(g))
                csv_output.writerows(id_list)
```
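The script relies on all rows that share a BB value being adjacent in the file, because `itertools.groupby` only merges *consecutive* equal keys. If a value reappeared later, a second group would reopen the same output file in write mode and overwrite the first one. A quick illustration with a hypothetical list of BB values (not the OP's data):

```python
from itertools import groupby

# Hypothetical BB values in file order; '53' appears in two separate runs
bb_values = ['53', '53', '59', '53', '61']

# groupby yields one group per run of equal, adjacent keys
runs = [(k, len(list(g))) for k, g in groupby(bb_values)]
print(runs)
# → [('53', 2), ('59', 1), ('53', 1), ('61', 1)]

# Sorting first brings equal keys together
runs_sorted = [(k, len(list(g))) for k, g in groupby(sorted(bb_values))]
print(runs_sorted)
# → [('53', 3), ('59', 1), ('61', 1)]
```

If the real file is not already ordered by BB, sorting the rows before grouping is one option. Note that `''` sorts before every other string, so the `break` on an empty key would then fire immediately and would need to become a `continue`.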
This will result in the files `bigfile_53.csv`, `bigfile_59.csv` and `bigfile_61.csv` being created in an output folder called `temp`. For example, `bigfile_53.csv` will appear as follows:
Entries that are exactly the string `'NULL'` are converted to an empty string, and the first two columns are removed (as per the OP's comment).
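To see what that cleaning step does to a single row, here is the same `remove_unwanted` helper applied to a made-up row (the values are hypothetical, chosen only to show both effects):

```python
def remove_unwanted(rows):
    # Same helper as in the script: drop columns 0 and 1, map 'NULL' to ''
    return [['' if col == 'NULL' else col for col in row[2:]] for row in rows]

sample = [['1', '53', 'abc', 'NULL', '7']]  # hypothetical input row
print(remove_unwanted(sample))
# → [['abc', '', '7']]
```

Because the helper iterates over `rows`, it accepts any iterable of rows, which is why the script can pass it a generator expression when building `id_list`.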
Tested in Python 2.7.9
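On Python 3 the `'wb'` mode fails, because `csv.writer` writes `str` and needs a text-mode file opened with `newline=''`. Below is a minimal Python 3 sketch of the same split, under the same assumptions as the script (BB is column index 1, the flag is column index 7). `split_by_bb` and its parameters are hypothetical names, not part of the original answer. Instead of `groupby` it collects rows in a dict keyed by BB, so the input does not need to be ordered. Unlike the original, it skips rows with an empty BB value rather than stopping at the first one, and it holds all rows in memory.

```python
import csv
import os


def clean(row):
    # Drop the first two columns and replace the exact string 'NULL' with ''
    return ['' if col == 'NULL' else col for col in row[2:]]


def split_by_bb(csv_filename, output_folder='temp', key_index=1, flag_index=7):
    """Write one CSV per distinct value in column `key_index` (BB).

    Rows whose `flag_index` column equals '0' are appended to every output
    file, mirroring `id_list` in the script. Returns the paths written.
    """
    os.makedirs(output_folder, exist_ok=True)
    basename = os.path.splitext(os.path.basename(csv_filename))[0]  # e.g. bigfile

    groups = {}  # BB value -> cleaned rows (dicts keep insertion order in 3.7+)
    id_list = []
    with open(csv_filename, newline='') as f_input:
        csv_input = csv.reader(f_input)
        header = clean(next(csv_input))
        for row in csv_input:
            if row[flag_index] == '0':
                id_list.append(clean(row))
            if row[key_index] != '':  # skip, rather than stop at, empty BB values
                groups.setdefault(row[key_index], []).append(clean(row))

    written = []
    for key, rows in groups.items():
        file_name = os.path.join(output_folder, '{}_{}.csv'.format(basename, key))
        # Text mode with newline='' is what the csv module expects on Python 3
        with open(file_name, 'w', newline='') as f_output:
            csv_output = csv.writer(f_output)
            csv_output.writerow(header)
            csv_output.writerows(rows)
            csv_output.writerows(id_list)
        written.append(file_name)
    return written
```

To process every CSV in the current folder as the original does, loop over `glob.glob('*.csv')` and call `split_by_bb` on each file. The trade-off versus `groupby` is memory: this version keeps every row of a file in the dict until it writes the outputs.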