What's the most robust way to efficiently parse CSV using awk?

花落未央 2020-11-21 06:16

The intent of this question is to provide a canonical answer.

Given a CSV as might be generated by Excel or other tools, with embedded newlines and embedded double quotes inside quoted fields.

2 Answers
  •  天涯浪人
    2020-11-21 06:52

    An improvement upon @EdMorton's FPAT solution, which should be able to handle double quotes (") escaped by doubling ("") -- as allowed by the CSV standard.

    gawk -v FPAT='[^,]*|("[^"]*")+' ...
    

    This STILL

    1. isn't able to handle newlines inside quoted fields, which are perfectly legit in standard CSV files (a workaround is sketched at the end of this answer).

    2. assumes GNU awk (gawk); a standard awk won't do.

    Example:

    $ echo 'a,,"","y""ck","""x,y,z"," ",12' |
    gawk -v OFS='|' -v FPAT='[^,]*|("[^"]*")+' '{$1=$1}1'
    a||""|"y""ck"|"""x,y,z"|" "|12
    
    $ echo 'a,,"","y""ck","""x,y,z"," ",12' |
    gawk -v FPAT='[^,]*|("[^"]*")+' '{
      for(i=1; i<=NF;i++){
        if($i~/"/){ $i = substr($i, 2, length($i)-2); gsub(/""/,"\"", $i) }
        print "<"$i">"
      }
    }'
    
    <a>
    <>
    <>
    <y"ck>
    <"x,y,z>
    < >
    <12>
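
    One possible workaround for limitation 1, as a minimal and lightly tested sketch (it assumes quotes inside fields are always doubled, per the CSV standard): because assigning to $0 makes gawk re-split the record with FPAT, you can buffer physical lines until the number of double quotes in the buffer is even -- i.e. until every quoted field has been closed -- and only then treat the buffer as one logical record.

    $ printf 'a,"hello\nworld",b\n' |
    gawk -v FPAT='[^,]*|("[^"]*")+' '{
      # Append the current physical line to the pending logical record.
      rec = (rec == "" ? $0 : rec "\n" $0)
      # An odd quote count means a quoted field is still open, so keep
      # reading; gsub(/"/, "&", rec) leaves rec unchanged and returns
      # how many quotes it found.
      if (gsub(/"/, "&", rec) % 2) next
      # The record is complete: assigning it to $0 re-splits it with FPAT,
      # so the embedded newline stays inside its quoted field.
      $0 = rec; rec = ""
      for (i = 1; i <= NF; i++) print "<" $i ">"
    }'
    <a>
    <"hello
    world">
    <b>

    Stripping the surrounding quotes and un-doubling "" can then be done per field exactly as in the second example above; note that unbalanced quotes at end of input would leave the last record buffered and never printed.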
    
