I have a really huge CSV files. There are about 1700 columns and 40000 rows like below:
x,y,z,x1,x2,x3,x4,x5,x6,x7,x8,x9,...(about 1700 more)...,x1700
0,0,0,
Use a small python script like:
fin = 'file_in.csv'
fout1 = 'file_out1.csv'
fout1_fd = open(fout1,'w')
...
lines = []
with open(fin) as fin_fd:
lines = fin_fd.read().split('\n')
for l in lines:
l_arr = l.split(',')
fout1_fd.write(','.join(l_arr[0:3]))
fout1_fd.write('\n')
...
...
fout1_fd.close()
...
A one-line solution for your example data and desired output:
cut -d, -f -3 huge.csv > file1.csv
cut -d, -f 4-1004 huge.csv > file2.csv
cut -d, -f 1005- huge.csv > file3.csv
The cut
program is available on most POSIX platforms and is part of GNU Core Utilities. There is also a Windows version.
update in python, since the OP asked for a program in an acceptable language:
# python 3 (or python 2, if you must)
import csv
import fileinput
output_specifications = ( # csv file name, selector function
('file1.csv', slice(3)),
('file2.csv', slice(3, 1003)),
('file3.csv', slice(1003, 1703)),
)
output_row_writers = [
(
csv.writer(open(file_name, 'wb'), quoting=csv.QUOTE_MINIMAL).writerow,
selector,
) for file_name, selector in output_specifications
]
reader = csv.reader(fileinput.input())
for row in reader:
for row_writer, selector in output_row_writers:
row_writer(row[selector])
This works with the sample data given and can be called with the input.csv
as an argument or by piping from stdin.
You can open the file in Microsoft Excel, delete the extra columns, save as csv for file #1. Repeat the same procedure for the other 2 tables.
I usually use open office ( or microsof excel in case you are using windows) to do that without writing any program and change the file and save it. Following are two useful links showing how to do that.
https://superuser.com/questions/407082/easiest-way-to-open-csv-with-commas-in-excel
http://office.microsoft.com/en-us/excel-help/import-or-export-text-txt-or-csv-files-HP010099725.aspx