What I want to do: I have a big .csv file, and I want to break it down into many smaller files based on the common values in the BB column (one file per distinct BB value).
For the data you have provided, the following script will produce your requested output files. It performs this operation on ALL CSV files found in the current folder:
from itertools import groupby
import glob
import csv
import os

def remove_unwanted(rows):
    # Drop the first two columns and replace 'NULL' cells with empty strings
    return [['' if col == 'NULL' else col for col in row[2:]] for row in rows]

output_folder = 'temp'

# Create the output folder if it does not exist yet
if not os.path.isdir(output_folder):
    os.makedirs(output_folder)

# Search for ALL CSV files in the current folder
for csv_filename in glob.glob('*.csv'):
    with open(csv_filename) as f_input:
        basename = os.path.splitext(os.path.basename(csv_filename))[0]    # e.g. bigfile
        csv_input = csv.reader(f_input)
        header = next(csv_input)    # skip the header row for now

        # Create a list of entries with '0' in the last column (index 7)
        id_list = remove_unwanted(row for row in csv_input if row[7] == '0')

        f_input.seek(0)    # Go back to the start
        header = remove_unwanted([next(csv_input)])

        # groupby only groups consecutive rows, so the file must be ordered by BB (index 1)
        for k, g in groupby(csv_input, key=lambda x: x[1]):
            if k == '':
                break

            # Format an output file name in the form 'bigfile_53.csv'
            file_name = os.path.join(output_folder, '{}_{}.csv'.format(basename, k))

            # 'wb' is correct for Python 2; on Python 3 use open(file_name, 'w', newline='')
            with open(file_name, 'wb') as f_output:
                csv_output = csv.writer(f_output)
                csv_output.writerows(header)
                csv_output.writerows(remove_unwanted(g))
                csv_output.writerows(id_list)
This will result in the files bigfile_53.csv, bigfile_59.csv and bigfile_61.csv being created in an output folder called temp. For example, bigfile_53.csv will appear as follows:
Entries containing the string 'NULL' will be converted to an empty string, and the first two columns will be removed (as per OP's comment).
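To make that transformation concrete, here is `remove_unwanted` from the script applied to a single made-up row (the row values are hypothetical; only the function comes from the script):

```python
def remove_unwanted(rows):
    # Drop the first two columns and turn 'NULL' cells into empty strings
    return [['' if col == 'NULL' else col for col in row[2:]] for row in rows]

# Hypothetical row: the first two values are removed, 'NULL' becomes ''
sample = [['1', '53', 'foo', 'NULL', '7']]
print(remove_unwanted(sample))  # [['foo', '', '7']]
```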
Tested in Python 2.7.9
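One caveat with the script: `itertools.groupby` only merges consecutive rows that share a key, so it assumes the file is already ordered by the BB column. If it is not, the same BB value shows up as several groups, and each later group would overwrite the file written for the earlier one. A small sketch with hypothetical rows shows the difference sorting makes:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: index 1 plays the role of BB and is NOT pre-sorted
rows = [
    ['a', '53', 'x'],
    ['b', '59', 'y'],
    ['c', '53', 'z'],
]

# On unsorted data, groupby yields '53' twice, as two separate groups
unsorted_keys = [k for k, _ in groupby(rows, key=itemgetter(1))]
print(unsorted_keys)  # ['53', '59', '53']

# Sorting by the same key first puts all rows with a given BB value together
sorted_rows = sorted(rows, key=itemgetter(1))
sorted_groups = {k: list(g) for k, g in groupby(sorted_rows, key=itemgetter(1))}
print(sorted(sorted_groups))  # ['53', '59']
print(len(sorted_groups['53']))  # 2
```

In the script, that would mean iterating over `sorted(csv_input, key=lambda x: x[1])` instead of `csv_input`, which reads the whole file into memory. Because an empty string sorts first, the `if k == '': break` check would then also need to become `continue` if any rows have an empty BB value; otherwise the loop would stop before writing anything.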
You should look into the csv module. You can read your input file line by line and group the rows according to the BB column. This is easy to do with a dictionary whose keys are the values in the BB column and whose values are lists of the rows that have that value. You can then write each list to its own CSV file using the csv module.
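A minimal sketch of that dictionary approach, written for Python 3. It assumes the first row is a header and that BB is the second column (index 1); the helper name `split_by_column` and its parameters are hypothetical, chosen for illustration:

```python
import csv
import os
from collections import defaultdict


def split_by_column(input_path, output_dir, key_index=1):
    """Hypothetical helper: write one CSV per distinct value in column key_index."""
    os.makedirs(output_dir, exist_ok=True)
    groups = defaultdict(list)  # BB value -> list of rows having that value

    with open(input_path, newline='') as f_input:
        reader = csv.reader(f_input)
        header = next(reader)
        for row in reader:
            groups[row[key_index]].append(row)

    basename = os.path.splitext(os.path.basename(input_path))[0]
    written = {}
    for key, rows in groups.items():
        # Output names follow the same 'bigfile_53.csv' pattern as the script above
        out_path = os.path.join(output_dir, '{}_{}.csv'.format(basename, key))
        with open(out_path, 'w', newline='') as f_output:
            writer = csv.writer(f_output)
            writer.writerow(header)  # repeat the header in every output file
            writer.writerows(rows)
        written[key] = out_path
    return written
```

For example, `split_by_column('bigfile.csv', 'temp')` would write one file per distinct BB value into `temp`. Unlike the `groupby` version, this does not need the input to be sorted by BB, at the cost of holding every row in memory until the end.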