Ignoring embedded spaces with AWK

北海茫月 2020-12-20 09:00

I'm looking for a simple way to print a specific field with awk while allowing for embedded spaces in the field.

Sample: Field1 Field2 "Field Three" Field

4 Answers
  • 2020-12-20 09:11

    Parsing CSV can be a tricky business. I like to use a language with a proper CSV parsing module. For example, with Ruby you can parse the given line using a space as the column separator and the default double-quote quoting character:

    ruby -rcsv -ne 'row = CSV.parse_line($_, col_sep: " "); puts row[2]' <<END
    Field1 Field2 "Field Three" Field4
    END
    
    Field Three
    
  • 2020-12-20 09:18

    Based on this, in gawk you can use FPAT, which describes what a field looks like instead of what separates fields. Something like:

    awk 'BEGIN{FPAT = "([^ ]+)|(\"[^\"]+\")"}{print $3}' input.txt
    

    Output:

    "Field Three"
    

    Note that the surrounding quotes are kept in the output, so it may need more work to suit your needs completely.

    I think it needs gawk 4+ (see https://lists.gnu.org/archive/html/info-gnu/2011-06/msg00013.html).
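    FPAT is a gawk extension, so the command above will not run in mawk, BusyBox awk or BWK awk. A portable sketch of the same idea follows, assuming fields are separated by one or more spaces, a quoted field runs from an opening double quote to the next double quote, and there are no escaped quotes. qfield is a hypothetical helper name. Unlike the FPAT version, it also drops the surrounding quotes:

    ```shell
    # Hypothetical helper (an assumption, not a standard tool): qfield N prints
    # field N of each input line. Fields are separated by one or more spaces; a
    # field starting with a double quote runs to the next double quote, and the
    # quotes are dropped. Escaped quotes are not supported.
    qfield() {
      awk -v n="$1" '{
        rest = $0; k = 0; out = ""
        while (1) {
          sub(/^ +/, "", rest)                  # skip the separating spaces
          if (rest == "") break                 # no fields left
          if (substr(rest, 1, 1) == "\"" && match(rest, /^"[^"]*"/)) {
            f = substr(rest, 2, RLENGTH - 2)    # quoted field, quotes removed
          } else {
            match(rest, /^[^ ]+/)               # plain field: up to the next space
            f = substr(rest, 1, RLENGTH)
          }
          rest = substr(rest, RLENGTH + 1)      # consume the field just read
          if (++k == n) { out = f; break }
        }
        print out                               # empty if the line has fewer than n fields
      }'
    }

    printf '%s\n' 'Field1 Field2 "Field Three" Field4' | qfield 3
    ```

    Checking the first character explicitly, rather than using an alternation like the FPAT regex, avoids depending on how a particular awk chooses between two alternatives that both match at the same position.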

  • 2020-12-20 09:26

    Mark Setchell's answer is good, although it will not work if you don't know in advance how many embedded quotes you have (and it doesn't split on spaces anymore).

    I hacked this together (obviously it can be improved):

    gawk -v FIELD=2 '{
        a = $FIELD
        # If the field opens with a quote, drop it and keep appending the
        # following fields until one ends with the closing quote.
        if (sub(/^"/, "", a))
            for (i = FIELD + 1; !sub(/"$/, "", a) && i <= NF; i++)
                a = a " " $i
        print a
    }' <<<'allo "hello world" bar'
    

    I would recommend using something other than gawk for this (maybe look into parsing the fields with your shell's IFS variable?).

    Addendum: as I said above, awk is not really the right tool for the job. For example, you pick the starting field with -v FIELD, but that number is still counted using awk's own separator, so the words inside an earlier quoted field are each counted as separate fields.
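    On the shell route: IFS word splitting by itself treats double quotes in the data as ordinary characters, so read would still break "Field Three" in two. One quote-aware splitter that POSIX does provide is xargs, which by default honors double quotes, single quotes and backslashes when it splits its input into arguments. A minimal sketch, assuming one line at a time and no stray apostrophes or backslashes in the data; nth_field is a hypothetical helper name:

    ```shell
    # Hypothetical helper (an assumption, not a standard tool): nth_field N prints
    # field N of the line on stdin. xargs splits the input on blanks while keeping
    # "..." and '...' together, printf puts each argument on its own line, and sed
    # picks line N. Feed it one line at a time: xargs treats newlines as blanks too.
    nth_field() {
      xargs printf '%s\n' | sed -n "${1}p"
    }

    printf '%s\n' 'Field1 Field2 "Field Three" Field4' | nth_field 3
    ```

    An unmatched apostrophe in the data (as in don't) makes xargs stop with an error, so this only suits input whose quoting is already shell-like.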

  • 2020-12-20 09:30

    You can do this if the double quotes are always there:

    awk -F\" '{print $2}'
    

    Specifically, I am telling awk that the fields are separated by double quotes, at which point the part you want is readily available as field 2.
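    To see what that split produces, here is a small sketch (show_quote_fields is a hypothetical helper name) that lists every piece awk gets from -F\" on a sample line, in brackets so the spaces at the edges are visible:

    ```shell
    # Hypothetical helper: print each field awk sees when the separator is a
    # double quote, wrapped in brackets so leading/trailing spaces show up.
    show_quote_fields() {
      awk -F\" '{ for (i = 1; i <= NF; i++) printf "$%d = [%s]\n", i, $i }'
    }

    printf '%s\n' 'Field1 Field2 "Field Three" Field4' | show_quote_fields
    # $1 = [Field1 Field2 ]
    # $2 = [Field Three]
    # $3 = [ Field4]
    ```

    Only the even-numbered pieces come from inside quotes; the odd-numbered ones still hold ordinary space-separated words.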

    If you need to get at subsequent fields, you can split the remainder of the line ($3) on spaces into a new array of fields, say F[]. Assuming your file looks like this:

    Field1 Field2 "Field Three" Field4 Field5 Field6

    this command:

    awk -F\" '{split($3,F," ");print $2,F[1],F[2]}' file

    prints:

    Field Three Field4 Field5
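    The same trick can be pushed a little further so you do not need to know in advance how many quoted fields come before the one you want: odd pieces are split on spaces, even pieces are taken whole, and fields are counted across both. A sketch, assuming balanced quotes, no escaped quotes, and quotes only around whole fields; qsplit_field is a hypothetical helper name:

    ```shell
    # Hypothetical helper (an assumption, not a standard tool): qsplit_field N
    # prints field N of each line, where a double-quoted run counts as one field.
    # With -F\" the odd-numbered pieces are unquoted text and the even-numbered
    # pieces are the insides of quoted fields.
    qsplit_field() {
      awk -F\" -v n="$1" '{
        k = 0; out = ""
        for (i = 1; i <= NF; i++) {
          if (i % 2 == 0) {                 # inside quotes: one whole field
            if (++k == n) { out = $i; break }
          } else {                          # outside quotes: space-separated words
            m = split($i, w, " ")
            if (k + m >= n) { out = w[n - k]; break }
            k += m
          }
        }
        print out                           # empty if the line has fewer than n fields
      }'
    }

    printf '%s\n' 'Field1 Field2 "Field Three" Field4 Field5 Field6' | qsplit_field 5
    ```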
    