awk FPAT to ignore commas in csv

Posted by 别说谁变了你拦得住时间么 on 2021-02-10 17:01:32

Question


Sample.csv Data

"2-Keyw-Bllist, TerrorViolencetest",vodka,ZETA+GLOBAL 4(ID: ZETA+GLOBAL),,105629,523,flag
"2-Keyw-Bllist, TerrorViolencetest",vodka,Captify (ID: Captify),,94676,884,flag
"2-Keyw-Bllist, TerrorViolencetest",vodka,QuantCast (ID: QuantCast),,46485,786,flag
TerrorViolencetest,germany,QuantCast (ID: QuantCast),,31054,491,flag
EY-Keyword-Blacklist,BBQ,MIQ+RON (ID: MIQ+RON),,26073,149,flag
TerrorViolencetest,chips,Captify (ID: Captify),,23737,553,flag
"2-Keyw-Bllist, TerrorViolencetest",bacon,QuantCast (ID: QuantCast),,17461,241,flag
VurityAdult-1test,cracks,Captify (ID: Captify),,17325,358,flag
VurityAdult-1test,pizza+grills,Captify (ID: Captify),,15173,41,flag

Desired Output

"2-Keyw-Bllist, TerrorViolencetest",vodka,ZETA+GLOBAL (ID: ZETA+GLOBAL),105629,523,flag
"2-Keyw-Bllist, TerrorViolencetest",vodka,Captify (ID: Captify),94676,884,flag
"2-Keyw-Bllist, TerrorViolencetest",vodka,QuantCast (ID: QuantCast),46485,786,flag
TerrorViolencetest,germany,QuantCast (ID: QuantCast),31054,491,flag
EY-Keyword-Blacklist,BBQ,MIQ+RON (ID: MIQ+RON),26073,149,flag
TerrorViolencetest,chips,Captify (ID: Captify),23737,553,flag
"2-Keyw-Bllist, TerrorViolencetest",bacon,QuantCast (ID: QuantCast),17461,241,flag
VurityAdult-1test,cracks,Captify (ID: Captify),17325,358,flag
VurityAdult-1test,pizza+grills,Captify (ID: Captify),15173,41,flag

Issue

I have 7 columns of data and can print only the columns I need (i.e. remove column 4) to get the desired output, at least in principle. However, column 1 sometimes contains a comma inside a quoted value, so awk sees more than 7 columns on those rows. When I try to remove column 4, it removes the wrong value, because the fields in those rows get shifted into columns they don't belong in.
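To see the shift, here is a small sketch, assuming a POSIX sh and any awk (count_fields is a hypothetical helper name). It splits two of the sample rows on every comma, as -F"," alone does, and prints the field count and the value awk treats as $4:

```shell
#!/bin/sh
# count_fields is a hypothetical helper: split on every comma (as -F","
# alone does) and print the field count plus the value awk sees as $4.
count_fields() {
  awk -F',' '{ print NF, "[" $4 "]" }' "$@"
}

# Two rows from Sample.csv; only the first has a quoted comma.
printf '%s\n' \
  '"2-Keyw-Bllist, TerrorViolencetest",vodka,Captify (ID: Captify),,94676,884,flag' \
  'TerrorViolencetest,chips,Captify (ID: Captify),,23737,553,flag' |
  count_fields
# Output:
# 8 [Captify (ID: Captify)]
# 7 []
```

The quoted row comes out with 8 fields and the third column's value lands in $4, so deleting $4 deletes real data. The unquoted row has the expected 7 fields and an empty $4.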

What I've Tried

  • I've tried piping the output of csvtool (which does identify the columns correctly) into sed/awk to substitute the embedded commas with something else. This fails, I guess because sed and awk don't know which commas csvtool treats as column separators.
  • I've tried awk's FPAT, based on what I could find online. It gives me the desired output, but my script only prints the final row of my data.

awk -F"," -v OFS=',' 'BEGIN {FPAT = "([^,]*)|(\"[^\"]+\")"} END {print $1,$2,$3,$5,$6,$7}' sample.csv

Does anyone know an easier way to get awk to recognize the columns correctly when removing one, or is FPAT the way to go and I'm just missing something in what I've done?


Answer 1:


Could you please try the following.

awk -F"," -v OFS=',' 'BEGIN{FPAT="([^,]*)|(\"[^\"]+\")"} {print $1,$2,$3,$5,$6,$7}' Input_file

Or, making better use of BEGIN (setting FS and OFS there too) :)

awk 'BEGIN{FS=OFS=",";FPAT="([^,]*)|(\"[^\"]+\")"} {print $1,$2,$3,$5,$6,$7}' Input_file
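A minimal sketch for checking the command on two sample rows, assuming GNU awk is installed as gawk. The FPAT pattern says a field is either a (possibly empty) run of non-comma characters or a double-quoted string. FPAT is a gawk extension: other awks such as mawk treat it as an ordinary variable and keep splitting on FS, which would silently reproduce the wrong columns. drop_col4 is a hypothetical wrapper name.

```shell
#!/bin/sh
# drop_col4 is a hypothetical wrapper around the answer's second command.
# FPAT is a GNU awk extension, so gawk is called explicitly.
drop_col4() {
  gawk 'BEGIN { FS = OFS = ","; FPAT = "([^,]*)|(\"[^\"]+\")" }
        { print $1, $2, $3, $5, $6, $7 }' "$@"
}

printf '%s\n' \
  '"2-Keyw-Bllist, TerrorViolencetest",vodka,Captify (ID: Captify),,94676,884,flag' \
  'TerrorViolencetest,chips,Captify (ID: Captify),,23737,553,flag' |
  drop_col4
```

Both printed lines should match the corresponding lines of the desired output. In gawk, whichever of FS or FPAT was assigned most recently controls field splitting, which is why FPAT takes effect here even though FS is also set.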

Why the OP's code only partially works: the print statement sits in the END block, and END runs only once, after all input has been read. By then $0 still holds the last record, so only the last row gets printed (though AFAIK this behavior is not defined in a few awk implementations). Here is how the END block fits in:

There are 3 main kinds of blocks in awk:

  1. BEGIN block: runs before any Input_file is read, so it is the place to initialize variables (for example FS, OFS or FPAT) before the program starts reading the actual Input_file.
  2. {...} main block: runs once for every record (line) read from Input_file.
  3. END block: runs once, after the program has finished reading the whole Input_file, so it is where post-processing goes, e.g. calculations over arrays or printing final values once the complete Input_file has been processed.
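The three kinds of blocks can be sketched together as follows. summarize is a hypothetical helper, and the choice of $2 and $5 is only for illustration; it uses only the sample rows without quoted commas, so plain comma splitting is safe and any POSIX awk should work.

```shell
#!/bin/sh
# Sketch of awk's three kinds of blocks. "summarize" is a hypothetical
# helper; $2 (keyword) and $5 (count) are picked only for illustration.
summarize() {
  awk '
    # BEGIN: runs before any input is read; set separators, print a header
    BEGIN { FS = OFS = ","; print "keyword", "count" }
    # main block: runs once per record (line)
    { print $2, $5; total += $5 }
    # END: runs once, after the last record has been read
    END { print "rows", NR; print "total", total }
  ' "$@"
}

# Rows from Sample.csv that contain no quoted commas.
printf '%s\n' \
  'TerrorViolencetest,germany,QuantCast (ID: QuantCast),,31054,491,flag' \
  'EY-Keyword-Blacklist,BBQ,MIQ+RON (ID: MIQ+RON),,26073,149,flag' \
  'TerrorViolencetest,chips,Captify (ID: Captify),,23737,553,flag' |
  summarize
# Output:
# keyword,count
# germany,31054
# BBQ,26073
# chips,23737
# rows,3
# total,80864
```

The header is printed before any input is read, each data row is printed as it arrives, and the row count and column 5 total appear only after the last row.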

What man awk says:

Finally, after all the input is exhausted, gawk executes the code in the END rule(s) (if any).
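That is what happened in the OP's script. A minimal sketch contrasting a print in the main block with one in END (show_blocks is a hypothetical helper name):

```shell
#!/bin/sh
# show_blocks is a hypothetical helper: the main block prints $1 for
# every record, the END block prints $1 once more after input ends.
show_blocks() {
  awk '{ print "main:", $1 } END { print "END:", $1 }' "$@"
}

printf 'row1\nrow2\nrow3\n' | show_blocks
# Output with gawk, mawk or BWK awk:
# main: row1
# main: row2
# main: row3
# END: row3
```

Per the gawk manual, POSIX only guarantees that NF keeps the last record's value in END; gawk, mawk and BWK awk also keep $0 (and therefore its fields), but some older awks do not, which matches the caveat in the answer above.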



Source: https://stackoverflow.com/questions/59127981/awk-fpat-to-ignore-commas-in-csv
