The intent of this question is to provide a canonical answer.
Given a CSV as might be generated by Excel or other tools with embedded newlines, embedded double quotes
An improvement upon @EdMorton's FPAT
solution, which should be able to handle double-quotes("
) escaped by doubling (""
-- as allowed by the CSV standard).
gawk -v FPAT='[^,]*|("[^"]*")+' ...
This STILL
isn't able to handle newlines inside quoted fields, which are perfectly legit in standard CSV files.
assumes GNU awk (gawk
), a standard awk won't do.
Example:
$ echo 'a,,"","y""ck","""x,y,z"," ",12' |
gawk -v OFS='|' -v FPAT='[^,]*|("[^"]*")+' '{$1=$1}1'
a||""|"y""ck"|"""x,y,z"|" "|12
$ echo 'a,,"","y""ck","""x,y,z"," ",12' |
gawk -v FPAT='[^,]*|("[^"]*")+' '{
for(i=1; i<=NF;i++){
if($i~/"/){ $i = substr($i, 2, length($i)-2); gsub(/""/,"\"", $i) }
print "<"$i">"
}
}'
<>
<>
<"x,y,z>
< >
<12>