How to delete a column/columns of a CSV file which has cell values with a string enclosed in double quotes

时光毁灭记忆、已成空白 submitted on 2019-12-10 07:22:34

Question


How can I delete columns from a CSV file in which some values are strings enclosed in double quotes that contain commas? I have a file, 44.csv, with 4 lines including the header, in the format below:

column1, column2, column3, column 4, column5, column6
12,455,"string with quotes, and with a comma in between",4432,6787,890,88
4432,6787,"another, string with quotes, and with two comma in between",890,88,12,455
11,22,"simple string",77,777,333,22

I need to keep only columns 1, 2 and 3 of the file, so I used the cut command as below:

cut -d"," -f1,2,3 44.csv > 444.csv

I get this output:

column1, column2, column3
12,455,"string with quotes
4432,6787,"another
11,22,"simple string"

But I need this output:

column1, column2, column3
12,455,"string with quotes, and with a comma in between"
4432,6787,"another, string with quotes, and with two comma in between"
11,22,"simple string"

Any help is greatly appreciated.

Thanks, Dhruuv.


Answer 1:


With GNU awk 4.0 or later, you can use FPAT to define what a field looks like, instead of what separates fields (which is what FS does):

gawk '{print $1, $2, $3}' FPAT="([^,]+)|(\"[^\"]+\")" OFS="," 44.csv

Test:

$ gawk '{print $1, $2, $3}' FPAT="([^,]+)|(\"[^\"]+\")" OFS="," mycsv.csv
column1, column2, column3
12,455,"string with quotes, and with a comma in between"
4432,6787,"another, string with quotes, and with two comma in between"
11,22,"simple string"
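FPAT describes the contents of a field rather than the separator between fields. As a rough illustration of the same idea outside awk, here is a minimal Python sketch using `re.findall`; it is an analogue, not how gawk implements FPAT, and the function name `first_fields` is hypothetical. One difference matters: awk regexes pick the longest match among alternatives, while Python's `re` takes the first alternative that matches, so the quoted-string alternative is listed first here.

```python
import re

# Python analogue of the gawk pattern ([^,]+)|("[^"]+").
# Python's re tries alternatives left to right and stops at the first match
# (awk prefers the longest), so the quoted alternative must come first.
FIELD = re.compile(r'"[^"]+"|[^,]+')


def first_fields(line, n=3, ofs=","):
    """Return the first n fields of line, joined with ofs (like OFS in awk)."""
    fields = FIELD.findall(line)
    return ofs.join(fields[:n])


if __name__ == "__main__":
    sample = [
        "column1, column2, column3, column 4, column5, column6",
        '12,455,"string with quotes, and with a comma in between",4432,6787,890,88',
        '4432,6787,"another, string with quotes, and with two comma in between",890,88,12,455',
        '11,22,"simple string",77,777,333,22',
    ]
    for line in sample:
        print(first_fields(line))
```

Like the FPAT pattern in the answer, this pattern cannot match an empty field, so a line such as `a,,b` yields only two fields and later columns shift left.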



Answer 2:


I had the same issue as you, Dhruuv. The solution proposed by jaypal singh is correct, but it didn't work for all my cases. I recommend https://github.com/dbro/csvquote, which "enables common unix utilities like cut, head, tail to work correctly with csv data containing delimiters and newlines". It worked for me.
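The description of csvquote suggests the general trick: temporarily hide the delimiters and newlines that sit inside quoted fields, run the ordinary line-oriented tool, then put them back. Below is a minimal Python sketch of that idea, not csvquote's actual code. It assumes quotes inside fields are escaped RFC 4180 style (as a doubled `""`); backslash-escaped quotes would confuse it. The function names and the placeholder characters (ASCII unit separator and record separator) are hypothetical choices for illustration and may differ from what csvquote uses.

```python
# Hypothetical placeholders (assumption): ASCII unit separator stands in for
# commas and ASCII record separator for newlines inside quoted fields.
COMMA_SUB = "\x1f"
NEWLINE_SUB = "\x1e"


def sanitize(text):
    """Hide commas and newlines that sit inside double-quoted fields.

    Assumes RFC 4180 quoting, where a literal quote inside a field is written
    as "" (the two quotes toggle the state twice, so the state is preserved).
    """
    out = []
    in_quotes = False
    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
            out.append(ch)
        elif in_quotes and ch == ",":
            out.append(COMMA_SUB)
        elif in_quotes and ch == "\n":
            out.append(NEWLINE_SUB)
        else:
            out.append(ch)
    return "".join(out)


def restore(text):
    """Undo sanitize(): turn the placeholders back into commas and newlines."""
    return text.replace(COMMA_SUB, ",").replace(NEWLINE_SUB, "\n")


def naive_cut(text, fields):
    """Roughly mimic `cut -d, -f...`: split on newlines, then on every comma.

    fields are 1-based; fields past the end of a short line are skipped.
    """
    result = []
    for line in text.split("\n"):
        parts = line.split(",")
        result.append(",".join(parts[i - 1] for i in fields if i <= len(parts)))
    return "\n".join(result)


if __name__ == "__main__":
    data = (
        "column1, column2, column3, column 4, column5, column6\n"
        '4432,6787,"another, string with quotes, and with two comma in between",890,88,12,455\n'
    )
    # Without sanitize(), naive_cut splits inside the quoted string.
    print(naive_cut(data, [1, 2, 3]), end="")
    # With sanitize()/restore() around it, the quoted string survives intact.
    print(restore(naive_cut(sanitize(data), [1, 2, 3])), end="")
```

The cutting step itself stays completely naive; only the wrapping functions know anything about quotes, which is what lets an unmodified tool like cut sit in the middle of the pipeline.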




Answer 3:


You can probably do it with cut in this special case by using " as your delimiter, but I'd strongly advise against it. Even if you could make it work here, you might later get a string containing an escaped double quote (e.g. \", or "" in RFC 4180-style CSV), which would fool that approach too. Or more of your columns might be quoted, which is perfectly valid CSV.

A smarter tool is required! The simplest to obtain might well be Perl and the Text::CSV module. You've almost certainly got Perl installed, and, depending on your environment, installing Text::CSV as a system package, with CPAN.pm, or with cpanminus should be straightforward.
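As a swapped-in alternative to Perl's Text::CSV (this is not that module), Python's standard-library `csv` module takes the same parser-based approach: it reads each record into a list of fields, honoring quotes, doubled quotes and embedded newlines, so picking columns becomes simple list indexing. A minimal sketch, assuming the columns to keep are given as 0-based indexes; `keep_columns` and `keep_columns_text` are hypothetical names.

```python
import csv
import io


def keep_columns(src, dst, indexes):
    """Copy only the given 0-based columns from CSV stream src to CSV stream dst."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        # Skip indexes past the end of short rows instead of raising IndexError.
        writer.writerow([row[i] for i in indexes if i < len(row)])


def keep_columns_text(text, indexes):
    """Convenience wrapper around keep_columns() for in-memory strings."""
    out = io.StringIO()
    keep_columns(io.StringIO(text, newline=""), out, indexes)
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        "column1, column2, column3, column 4, column5, column6\n"
        '12,455,"string with quotes, and with a comma in between",4432,6787,890,88\n'
        '4432,6787,"another, string with quotes, and with two comma in between",890,88,12,455\n'
        '11,22,"simple string",77,777,333,22\n'
    )
    print(keep_columns_text(sample, [0, 1, 2]), end="")
    # For real files, something like:
    #   with open("44.csv", newline="") as src, open("444.csv", "w", newline="") as dst:
    #       keep_columns(src, dst, [0, 1, 2])
```

Because `csv.writer` quotes only the fields that need it by default (`csv.QUOTE_MINIMAL`), `"simple string"` is written back as `simple string`; both spellings are the same CSV value, but the output is not byte-for-byte identical to the input quoting.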



Source: https://stackoverflow.com/questions/17199311/how-to-delete-a-column-columns-of-a-csv-file-which-has-cell-values-with-a-string
