How to get awk, cut, etc. to not peek for FS inside quoted strings

Posted by 天涯浪子 on 2019-12-10 19:50:12

Question


I have an input CSV file containing something like:

SD-32MM-1001,"100.00",4/11/2012
SD-32MM-1001,"1,000.00",4/12/2012

I need to strip the formatting from the numeric values before feeding them to another processing pipeline (PostgreSQL COPY).

Is there a text filter that will split the columns on FS without peeking inside quoted strings? Currently I get:

$ tail -n +2 /tmp/foo.csv | awk -F, '{print NF}'
3
4

Similarly, cut returns partial values.

I have to stay on Linux.

Thanks.


Answer 1:


GNU awk can handle this; you just need to set FPAT (a gawk extension, available since gawk 4.0) to describe what you consider a field:

$ awk '{print NF}' FPAT="([^,]+)|(\"[^\"]+\")" file
3
3

$ awk '{print $2}' FPAT="([^,]+)|(\"[^\"]+\")" file
"100.00"
"1,000.00"

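Since the goal in the question is to strip the formatting from the numeric column before a PostgreSQL COPY, the same FPAT setting can be combined with gsub. The following is only a minimal sketch: it assumes gawk is installed (FPAT is not available in mawk or BusyBox awk) and that the amount is always the second column. The function name strip_amount is made up for illustration.

```shell
# strip_amount: hypothetical helper; reads CSV on stdin, writes cleaned CSV to stdout.
# Assumes gawk and that column 2 holds the quoted amount.
strip_amount() {
    gawk -v FPAT='([^,]+)|("[^"]+")' -v OFS=, '{
        gsub(/[",]/, "", $2)   # drop the quotes and thousands separators in field 2
        print                  # changing $2 rebuilds the record, joined with OFS
    }'
}

# Demo with the two sample rows from the question.
printf '%s\n' 'SD-32MM-1001,"100.00",4/11/2012' \
              'SD-32MM-1001,"1,000.00",4/12/2012' | strip_amount
```

With the sample rows this should print SD-32MM-1001,100.00,4/11/2012 and SD-32MM-1001,1000.00,4/12/2012. Note that the [^,]+ alternative skips empty fields, so this sketch is only suitable when every column is populated.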


Answer 2:


Using a Perl script and a proper CSV parser (the robust solution, since awk and cut are not suited to this particular need):

use strict; use warnings;

use Text::CSV;

my @rows;
my $csv = Text::CSV->new ()
                or die "Cannot use CSV: ".Text::CSV->error_diag ();

open my $fh, "<:encoding(utf8)", "/tmp/file.csv" or die "$!";
while ( my $row = $csv->getline( $fh ) ) {

    # print the last field of line 2, followed by a newline
    $. == 2 and print $row->[-1], "\n";
}
$csv->eof or $csv->error_diag();
close $fh;

Output

4/12/2012



Answer 3:


The suggestions from sudo_O should work, unless your fields have double quotes inside them, which can happen in standard CSV data, e.g.:

field1,field2,"field,3","field4 ""has some quotes"" in it",field5
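
To make the limitation concrete, here is a small sketch, assuming gawk is available. It counts fields on a line whose quoted field contains both an escaped quote and a comma, first with the FPAT pattern shown earlier and then with an extended pattern that also accepts a doubled quote inside the quoted alternative. The extended pattern and the helper name count_fields are illustrative assumptions, not part of any of the answers.

```shell
# count_fields: hypothetical helper; prints NF for each CSV line on stdin,
# using the FPAT pattern passed as the first argument. Requires gawk.
count_fields() {
    gawk -v FPAT="$1" '{ print NF }'
}

# Three logical fields; the middle one contains an escaped quote and a comma.
line='a,"b ""x"", c",d'

# Original pattern: the quoted alternative cannot span the inner quotes.
printf '%s\n' "$line" | count_fields '([^,]+)|("[^"]+")'

# Assumed extension: inside the quotes, accept non-quote characters or a doubled quote.
printf '%s\n' "$line" | count_fields '([^,]+)|("([^"]|"")+")'
```

With gawk the first call should report 4 fields and the second 3. Neither pattern copes with newlines inside quoted fields.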

To handle such fields, you can wrap standard UNIX commands like cut, awk, etc. with a program I wrote called csvquote, like this:

csvquote /tmp/foo.csv | tail -n +2 | awk -F, '{print NF}'

This works by finding the commas inside quoted fields and replacing them temporarily with nonprinting characters that awk can safely handle. Then when you want to create output from the fields, the pipeline will need to restore those commas:

csvquote /tmp/foo.csv | cut -d, -f2 | csvquote -u

You can find the code here: https://github.com/dbro/csvquote
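
The substitution idea described above can be illustrated with plain awk and tr. This is only a rough sketch of the technique, not csvquote's actual implementation: the function names mask_commas and unmask_commas are invented, the choice of \037 (ASCII unit separator) as the placeholder is an assumption, and unlike csvquote it works line by line, so newlines inside quoted fields are not handled.

```shell
# mask_commas / unmask_commas: hypothetical stand-ins for "csvquote" and "csvquote -u".
# Assumption: the byte \037 never occurs in the data.
mask_commas() {
    awk '{
        out = ""; inq = 0
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            if (c == "\"") {
                inq = !inq          # a doubled quote toggles twice, so the state survives
            } else if (c == "," && inq) {
                c = "\037"          # hide commas that sit inside a quoted field
            }
            out = out c
        }
        print out
    }'
}

unmask_commas() {
    tr '\037' ','                   # put the hidden commas back
}

# Demo: cut now sees exactly three columns and returns the whole quoted value.
printf '%s\n' 'SD-32MM-1001,"1,000.00",4/12/2012' |
    mask_commas | cut -d, -f2 | unmask_commas
```

The demo should print "1,000.00" with its quotes and comma intact, because cut only ever sees the two real separators.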



Source: https://stackoverflow.com/questions/16110601/how-to-get-awk-cut-etc-to-not-peek-for-fs-inside-quoted-strings
