I'm looking for a simple way to print a specific field with awk while allowing for embedded spaces in the field.
Sample: Field1 Field2 "Field Three" Field
You can do this if the double quotes are always there:
awk -F\" '{print $2}'
Specifically, this tells awk that the fields are separated by double quotes, at which point the part you want is readily available as field 2.
If you need to get at subsequent fields, you can split the remainder of the line on spaces into a new array of fields, say F[], like this:
awk -F\" '{split($3,F," ");print $2,F[1],F[2]}' file
Field Three Field4 Field5
assuming your file looks like this:
Field1 Field2 "Field Three" Field4 Field5 Field6
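To see why the quoted part ends up in field 2, here is a small sketch that prints every piece awk produces when the double quote is the separator. The function name show_quote_fields is made up for illustration and is not part of the answer above. Assuming the quotes are balanced, odd-numbered fields are the text outside quotes and even-numbered fields are the text inside them.

```shell
# Print each field awk sees when '"' is the field separator.
# show_quote_fields is a hypothetical helper name used only for this demo.
show_quote_fields() {
    awk -F'"' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

printf '%s\n' 'Field1 Field2 "Field Three" Field4 Field5 Field6' | show_quote_fields
# 1=[Field1 Field2 ]
# 2=[Field Three]
# 3=[ Field4 Field5 Field6]
```

Note the leading blank in field 3. The split($3,F," ") call in the answer relies on the " " separator, which discards leading whitespace, so F[1] is Field4.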
Based on this, in gawk maybe you can use something like
awk 'BEGIN{FPAT = "([^ ]+)|(\"[^\"]+\")"}{print $3}' input.txt
Output:
"Field Three"
It may need more work to get suited to your needs completely.
I think it needs gawk 4+; see https://lists.gnu.org/archive/html/info-gnu/2011-06/msg00013.html
Parsing CSV can be a tricky business. I like to use a language with a proper CSV parsing module. For example, with ruby, parsing the given line using a space as the column separator and the default quote character (the double quote):
ruby -rcsv -ne 'row = CSV.parse_line($_, col_sep: " "); puts row[2]' <<END
Field1 Field2 "Field Three" Field4
END
Field Three
Mark Setchell's answer is good, although it will not work if you don't know in advance how many embedded quotes you have (and it doesn't split on spaces anymore).
I hacked this together (obviously it can be improved):
gawk -v FIELD=2 '{ a=$ FIELD; if (substr(a, 1, 1) == "\"") { gsub(/^\"/, "", a); s=a; for (i = FIELD + 1; i <= NF; i++) { a=$ i; nbSub=gsub(/\"$/, "", a); s = s " " a; if (nbSub > 0) { break } } print(s) } }' <<<'allo "hello world" bar'
I would recommend using something other than gawk for this (maybe look into parsing the fields with your shell's IFS variable?).
Addendum: As I said above, this is not really the right tool for the job. For example, you can specify the starting field with -v FIELD=, but it is counted using awk's own separator (the embedded spaces still count as field boundaries).
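If the field number should count a quoted run as a single field, one option is to tokenize the line by hand instead of relying on awk's separator. The following is only a sketch in POSIX awk, not a CSV parser. The helper name quoted_field is invented here, and it assumes fields are separated by blanks or tabs, quotes are balanced, and there are no escaped quotes inside a quoted field.

```shell
# quoted_field N: print the Nth field of each input line, treating a
# double-quoted run as one field and stripping its quotes.
# quoted_field is a hypothetical helper name used only for this sketch.
quoted_field() {
    awk -v want="$1" '
    {
        line = $0
        n = 0
        # Peel one token at a time off the front of the line: either a
        # "quoted run" or a run of characters with no blank, tab, or quote.
        while (match(line, /^[ \t]*("[^"]*"|[^ \t"]+)/)) {
            tok = substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)
            sub(/^[ \t]+/, "", tok)                    # drop leading blanks
            if (tok ~ /^"/)                            # strip surrounding quotes
                tok = substr(tok, 2, length(tok) - 2)
            if (++n == want) { print tok; break }
        }
    }'
}

printf '%s\n' 'Field1 Field2 "Field Three" Field4' | quoted_field 3
# Field Three
```

With this numbering, field 4 of that line is Field4, and nothing is printed for a line that has fewer than N fields.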