How do I extract column from CSV with quoted commas, using the shell?

Submitted by 懵懂的女人 on 2019-12-02 15:05:11

Question


I have a CSV file, but unlike in related questions, it has some columns containing double-quoted strings with commas, e.g.

foo,bar,baz,quux
11,"first line, second column",13.0,6
210,"second column of second line",23.1,5

(Of course, the real file is longer, the number of quoted commas in a field is not necessarily 0 or 1, and the text is not predictable.) The text may also contain escaped double quotes inside double-quoted fields, and a field that is usually quoted may appear without any double quotes at all. The only assumption we can make is that no field contains a quoted newline, so lines can be split trivially on \n.
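To see how Python's csv module treats the variants described above, the sketch below parses a few hypothetical rows (made up for illustration, not taken from the real file): a quoted field containing a comma, a quoted field with RFC 4180-style doubled quotes (""), and a text field with no quotes. It assumes "escaped" means doubled quotes, which the csv module handles by default; if the file uses backslash escapes instead, passing escapechar="\\" is needed, as the last row shows.

```python
import csv
import io

# Hypothetical rows illustrating the cases described in the question.
sample = (
    '11,"first line, second column",13.0,6\n'   # comma inside quotes
    '12,"she said ""hi"", then left",14.5,7\n'  # doubled quotes (RFC 4180 style)
    '13,plain text,15.0,8\n'                     # text field without quotes
)
rows = list(csv.reader(io.StringIO(sample)))
for row in rows:
    print(row[1])

# Backslash-escaped quotes (\") are only understood with an explicit escapechar.
backslash_line = '14,"a \\"quoted\\" word, with comma",16.0,9\n'
backslash_row = next(csv.reader(io.StringIO(backslash_line), escapechar="\\"))
print(backslash_row[1])
```

In every case the commas inside quotes stay part of the field, so the third column comes out as 13.0, 14.5, 15.0 and 16.0.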

Now, I'd like to extract a specific column (say, the third one) and print it on standard output, one value per line. I can't simply treat every comma as a field delimiter (and therefore can't use, e.g., cut); I need something more sophisticated. What could that be?
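The failure mode is easy to reproduce. This sketch takes the sample rows from the question and compares a plain split on every comma (which is effectively what cut -d, -f3 does) with Python's csv.reader, which treats commas inside double quotes as data rather than separators.

```python
import csv
import io

sample = (
    "foo,bar,baz,quux\n"
    '11,"first line, second column",13.0,6\n'
    '210,"second column of second line",23.1,5\n'
)

# Naive approach: every comma is a separator, even inside quotes.
naive = [line.split(",")[2] for line in sample.splitlines()]

# CSV-aware approach: quoted commas stay inside their field.
parsed = [row[2] for row in csv.reader(io.StringIO(sample))]

print(naive)   # ['baz', ' second column"', '23.1']
print(parsed)  # ['baz', '13.0', '23.1']
```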

Note: I'm using bash on a Linux system.


Answer 1:


Here is a quick-and-dirty csvcut written in Python. Python's csv module already handles quoting, escaping, and the common CSV dialects, so all you need is a thin wrapper around it.
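As one illustration of that dialect support, csv.Sniffer can guess the delimiter and quote character from a sample of the data. A minimal sketch using the rows from the question; note that sniffing is a heuristic and can guess wrong on unusual input, so explicit reader parameters are safer when you already know the format.

```python
import csv
import io

sample = (
    "foo,bar,baz,quux\n"
    '11,"first line, second column",13.0,6\n'
    '210,"second column of second line",23.1,5\n'
)

# Guess the dialect (delimiter, quote character, ...) from the text itself.
dialect = csv.Sniffer().sniff(sample)
print(repr(dialect.delimiter), repr(dialect.quotechar))

# Parse with the guessed dialect and pull out the third column.
third = [row[2] for row in csv.reader(io.StringIO(sample), dialect)]
print(third)
```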

The first argument is the 1-based number of the column to extract, and the remaining arguments are the CSV files to read. For example,

csvcut 3 sample.csv

extracts the third column from sample.csv, even if its fields are quoted.

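#!/usr/bin/env python3

import csv
import sys

# Use plain "\n" line endings; csv.writer defaults to "\r\n".
writer = csv.writer(sys.stdout, lineterminator="\n")
# The command line uses 1-based column numbers; Python indexing is zero-based.
col = int(sys.argv[1]) - 1
for path in sys.argv[2:]:
    # newline="" lets the csv module handle line endings itself.
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            # writerow() expects a sequence of fields, so wrap the single value in a list.
            writer.writerow([row[col]])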
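One detail of csv.writer is easy to trip over when writing a single value: writerow() expects a sequence of fields, and a Python string is itself a sequence of characters. Passing a bare string therefore writes one field per character, while a one-element list writes it as a single field. The sketch below uses an in-memory io.StringIO buffer in place of sys.stdout and also shows that a value containing a comma is re-quoted on output, which keeps the result valid CSV.

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")

writer.writerow("13.0")                         # bare string: one field per character
writer.writerow(["13.0"])                       # one-element list: a single field
writer.writerow(["first line, second column"])  # quoted because it contains a comma

print(buf.getvalue(), end="")
# 1,3,.,0
# 13.0
# "first line, second column"
```

If you want the bare text instead (for example, to feed a tool that does not understand CSV quoting), print(row[col]) writes each value as-is; given the no-embedded-newlines assumption, that still yields one value per line.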

To do: error handling and extraction of multiple columns. (Neither is hard per se; for example, row[2:5] extracts columns 3, 4, and 5. I was just too lazy to write a proper command-line argument parser.)
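For the to-do items above, here is a hedged sketch of what a fuller version might look like, using argparse for the command line. It accepts a comma-separated list of 1-based column numbers (e.g. 3 or 2,4), reads standard input when no files are given, skips blank lines, and reports short rows and unreadable files as error messages instead of tracebacks. The function names parse_columns and cut_columns and the column-list syntax are illustrative choices, not part of the original answer.

```python
#!/usr/bin/env python3
"""csvcut sketch: print selected (1-based) columns of CSV input."""
import argparse
import csv
import sys


def parse_columns(spec):
    """Turn a spec like '3' or '2,4' into a list of 1-based column numbers."""
    columns = [int(part) for part in spec.split(",")]  # int() raises ValueError on junk
    if any(c < 1 for c in columns):
        raise argparse.ArgumentTypeError("column numbers start at 1")
    return columns


def cut_columns(rows, columns):
    """Yield the requested columns of each parsed CSV row."""
    indices = [c - 1 for c in columns]  # convert to zero-based indices
    needed = max(indices) + 1
    for number, row in enumerate(rows, start=1):
        if not row:  # csv.reader yields [] for blank lines; skip them
            continue
        if len(row) < needed:
            raise ValueError(f"record {number} has {len(row)} fields, need {needed}")
        yield [row[i] for i in indices]


def main(argv=None):
    parser = argparse.ArgumentParser(description="Print selected CSV columns.")
    parser.add_argument("columns", type=parse_columns,
                        help="1-based column numbers, e.g. 3 or 2,4")
    parser.add_argument("files", nargs="*",
                        help="CSV files to read (default: standard input)")
    args = parser.parse_args(argv)

    # Plain "\n" line endings suit shell pipelines (csv defaults to "\r\n").
    writer = csv.writer(sys.stdout, lineterminator="\n")
    try:
        if not args.files:
            writer.writerows(cut_columns(csv.reader(sys.stdin), args.columns))
        for path in args.files:
            # newline="" lets the csv module handle line endings itself.
            with open(path, newline="") as handle:
                writer.writerows(cut_columns(csv.reader(handle), args.columns))
    except (OSError, ValueError, csv.Error) as err:
        parser.exit(1, f"csvcut: {err}\n")
```

To use it as a command, add the usual if __name__ == "__main__": main() guard at the end, make the file executable, and run, for example, csvcut 2,3 sample.csv, or pipe data in with some_command | csvcut 3.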



Source: https://stackoverflow.com/questions/52466382/how-do-i-extract-column-from-csv-with-quoted-commas-using-the-shell
