I have a set of CSV files (around 250), each with 300 to 500 records. I need to cut two or three columns from each file and store them in another one. I'm using Ubuntu.
If you used ssconvert to get the CSV, you might try:
ssconvert -O 'separator="|"' "file.xls" "file.txt"
Notice the .txt extension instead of .csv: this makes ssconvert use the Gnumeric_stf:stf_assistant exporter instead of Gnumeric_stf:stf_csv, which accepts options (the -O parameter). Otherwise you'll get a "The file saver does not take options" error. The pipe character is much less likely to occur inside fields than a comma, but you may want to check before relying on it.
Then you can rename it and do things like:
cut -d "|" -f3 file.csv | sort | uniq -c | sort -rn | head
A fuller set of exporter options looks like:
-O 'eol=unix separator=; format=preserve charset=UTF-8 locale=en_US transliterate-mode=transliterate quoting-mode=never'
If you know that the column delimiter does not occur inside the fields, you can use cut.
$ cat in.csv
foo,bar,baz
qux,quux,quuux
$ cut -d, -f2,3 < in.csv
bar,baz
quux,quuux
You can use the shell builtin for to loop over all input files.
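A minimal sketch of such a loop, assuming comma-delimited files in the current directory; the sample input and the .cut output suffix are illustrative choices:

```shell
# Sample input file (illustrative)
printf 'foo,bar,baz\nqux,quux,quuux\n' > in.csv

# Extract fields 2 and 3 from every CSV in the current directory;
# each result is written to <name>.cut alongside the original.
for f in *.csv; do
    cut -d, -f2,3 "$f" > "${f%.csv}.cut"
done
```

Since the glob is expanded once before the loop starts, the freshly written .cut files are not re-processed in the same run.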
If your fields contain commas or newlines, you can use a helper program I wrote to allow cut (and other UNIX text processing tools) to properly work with the data.
https://github.com/dbro/csvquote
This program finds special characters inside quoted fields, and temporarily replaces them with nonprinting characters which won't confuse the cut program. Then they get restored after cut is done.
lutz' solution would become:
csvquote in.csv | cut -d, -f2,3 | csvquote -u
If the fields might contain the delimiter, you ought to find a library that can parse CSV files. Typically, general purpose scripting languages will include a CSV module in their standard library.
Ruby: require 'csv'
Python: import csv
Perl: use Text::ParseWords;
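For instance, a minimal sketch with Python's csv module; the filenames, sample data, and column choice here are illustrative:

```python
import csv

# Tiny sample input (illustrative); note the quoted field containing a comma,
# which would break a naive cut -d, approach.
with open("in.csv", "w", newline="") as f:
    f.write('a,"b,1",c\nx,y,z\n')

# Copy columns 2 and 3 (1-based) from in.csv to out.csv.
# csv.reader correctly handles quoted fields with embedded delimiters.
with open("in.csv", newline="") as src, open("out.csv", "w", newline="") as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        writer.writerow(row[1:3])  # indices 1 and 2 are fields 2 and 3
```

The writer re-quotes any field that contains the delimiter, so the output remains valid CSV.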