I have a CSV file, but unlike in related questions, it has some columns containing double-quoted strings with commas, e.g.
foo,bar,baz,quux
11,"first line, seco
Here is a quick and dirty Python csvcut. The Python csv module already understands the various CSV dialects, including quoted fields that contain commas or newlines, so all you need is a thin wrapper around it.
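To illustrate what the csv module does for you, here is a small sketch. The sample data is made up (the question's own example is cut off above). It shows that a quoted field keeps its embedded comma, and even an embedded newline, as a single value:

```python
import csv
import io

# Hypothetical sample data in the spirit of the question: the second field
# is double-quoted and contains a comma; in the last record it also spans a newline.
sample = (
    'foo,bar,baz,quux\n'
    '11,"first line, second line",23,"hey"\n'
    '12,"spans\ntwo lines",24,ho\n'
)

# csv.reader accepts any iterable of lines; StringIO stands in for a file here.
rows = list(csv.reader(io.StringIO(sample)))
# rows[1] is ['11', 'first line, second line', '23', 'hey']
# rows[2] is ['12', 'spans\ntwo lines', '24', 'ho']

# Column 2 (1-based), i.e. index 1, from every record:
second_column = [row[1] for row in rows]
print(second_column)
```

A plain line.split(",") would instead break the second record into five pieces and the third record into two separate lines.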
The first argument is the (1-based) number of the field you wish to extract, followed by one or more file names, so
csvcut 3 sample.csv
extracts the third column from the (possibly quoted) CSV file sample.csv.
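#!/usr/bin/env python3
import csv
import sys

writer = csv.writer(sys.stdout)
# The column number is given 1-based, but Python indexing is zero-based
col = int(sys.argv[1]) - 1
for filename in sys.argv[2:]:
    # newline='' lets the csv module handle newlines inside quoted fields
    with open(filename, newline='') as handle:
        for row in csv.reader(handle):
            # writerow expects a sequence of fields, so wrap the single value in a list
            writer.writerow([row[col]])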
To do: error handling and extraction of multiple columns. (Neither is hard per se; row[2:5] extracts columns 3, 4, and 5. But I'm too lazy to write a proper command-line argument parser.)
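For what it's worth, here is a rough sketch of those to-do items using the standard argparse module. The command-line syntax is an assumption of this sketch, not something the script above defines: the first argument becomes a comma-separated list of 1-based column numbers (e.g. 3,4,5), and the helper names parse_columns, cut and main are hypothetical.

```python
"""Sketch: csvcut with several columns and basic error handling.

Hypothetical syntax assumed here: csvcut COLUMNS FILE...
where COLUMNS is a comma-separated list of 1-based column numbers, e.g. 3,4,5.
"""
import argparse
import csv
import sys


def parse_columns(spec):
    """Turn a spec like '3,4,5' into zero-based indices [2, 3, 4]."""
    try:
        cols = [int(part) - 1 for part in spec.split(",")]
    except ValueError:
        raise argparse.ArgumentTypeError(f"not a column list: {spec!r}") from None
    if any(c < 0 for c in cols):
        raise argparse.ArgumentTypeError("column numbers start at 1")
    return cols


def cut(rows, cols):
    """Yield the selected fields of each row; raise ValueError on short rows."""
    for recno, row in enumerate(rows, start=1):
        try:
            fields = [row[c] for c in cols]
        except IndexError:
            raise ValueError(f"record {recno} has only {len(row)} fields") from None
        yield fields


def main(argv=None):
    parser = argparse.ArgumentParser(description="Extract columns from CSV files.")
    parser.add_argument("columns", type=parse_columns,
                        help="comma-separated 1-based column numbers, e.g. 3,4,5")
    parser.add_argument("files", nargs="+", help="CSV files to read")
    args = parser.parse_args(argv)

    writer = csv.writer(sys.stdout)
    for filename in args.files:
        try:
            with open(filename, newline="") as handle:
                writer.writerows(cut(csv.reader(handle), args.columns))
        except (OSError, csv.Error, ValueError) as exc:
            # sys.exit with a string prints it to stderr and exits with status 1
            sys.exit(f"csvcut: {filename}: {exc}")
```

To use it as a script, add a shebang line and end the file with the usual if __name__ == "__main__": main() guard. Then csvcut 3,4,5 sample.csv should print columns 3 to 5 as CSV. Since the output also goes through csv.writer, fields containing commas or newlines stay quoted.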