Question
Sample.csv Data
"2-Keyw-Bllist, TerrorViolencetest",vodka,ZETA+GLOBAL 4(ID: ZETA+GLOBAL),,105629,523,flag
"2-Keyw-Bllist, TerrorViolencetest",vodka,Captify (ID: Captify),,94676,884,flag
"2-Keyw-Bllist, TerrorViolencetest",vodka,QuantCast (ID: QuantCast),,46485,786,flag
TerrorViolencetest,germany,QuantCast (ID: QuantCast),,31054,491,flag
EY-Keyword-Blacklist,BBQ,MIQ+RON (ID: MIQ+RON),,26073,149,flag
TerrorViolencetest,chips,Captify (ID: Captify),,23737,553,flag
"2-Keyw-Bllist, TerrorViolencetest",bacon,QuantCast (ID: QuantCast),,17461,241,flag
VurityAdult-1test,cracks,Captify (ID: Captify),,17325,358,flag
VurityAdult-1test,pizza+grills,Captify (ID: Captify),,15173,41,flag
Desired Output
"2-Keyw-Bllist, TerrorViolencetest",vodka,ZETA+GLOBAL (ID: ZETA+GLOBAL),105629,523,flag
"2-Keyw-Bllist, TerrorViolencetest",vodka,Captify (ID: Captify),94676,884,flag
"2-Keyw-Bllist, TerrorViolencetest",vodka,QuantCast (ID: QuantCast),46485,786,flag
TerrorViolencetest,germany,QuantCast (ID: QuantCast),31054,491,flag
EY-Keyword-Blacklist,BBQ,MIQ+RON (ID: MIQ+RON),26073,149,flag
TerrorViolencetest,chips,Captify (ID: Captify),23737,553,flag
"2-Keyw-Bllist, TerrorViolencetest",bacon,QuantCast (ID: QuantCast),17461,241,flag
VurityAdult-1test,cracks,Captify (ID: Captify),17325,358,flag
VurityAdult-1test,pizza+grills,Captify (ID: Captify),15173,41,flag
Issue
I have 7 columns of data and can print or remove the columns needed for the desired output (or could, if it worked). However, because some values in column 1 contain literal commas, awk believes there are more than 7 columns. When I try to remove column 4, it removes the wrong values, because in those rows the fields are shifted over into columns they don't belong in.
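A minimal sketch of the problem, assuming any POSIX awk and two rows copied from the sample above (the variable names rows and field_counts are illustrative only). With a plain comma separator, the quoted row is split inside its quotes and reports one field too many:

```shell
# Two rows from the sample: the first has a quoted comma in column 1.
rows='"2-Keyw-Bllist, TerrorViolencetest",vodka,Captify (ID: Captify),,94676,884,flag
TerrorViolencetest,germany,QuantCast (ID: QuantCast),,31054,491,flag'

# With -F, awk ignores the quotes and splits on every comma,
# so the first row yields 8 fields and the second yields 7.
field_counts=$(printf '%s\n' "$rows" | awk -F, '{ print NF }')
printf '%s\n' "$field_counts"
```

In the 8-field row, $4 is the third real column rather than the empty one, which is why dropping "column 4" removes the wrong value.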
What I've Tried
- I've tried piping the results from csvtool (which identifies the columns correctly) into sed/awk to substitute the commas with something else. This fails, I guess because the other commands don't know what csvtool knows about the columns.
- I've tried using awk's FPAT. From what I was able to work out online, this should give me my desired output, but my script only seems to print the final row of my data.
awk -F"," -v OFS=',' 'BEGIN {FPAT = "([^,]*)|(\"[^\"]+\")"} END {print $1,$2,$3,$5,$6,$7}' sample.csv
Does anyone know of an easier way to get awk to recognize my columns correctly when I remove one, or is FPAT the only way to go and I'm missing something in what I've done?
Answer 1:
Could you please try the following.
awk -F"," -v OFS=',' 'BEGIN{FPAT="([^,]*)|(\"[^\"]+\")"} {print $1,$2,$3,$5,$6,$7}' Input_file
Or make better use of the BEGIN block :)
awk 'BEGIN{FS=OFS=",";FPAT="([^,]*)|(\"[^\"]+\")"} {print $1,$2,$3,$5,$6,$7}' Input_file
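As a quick check, the sketch below runs the same FPAT on two rows from the question's sample. It assumes GNU awk is installed as gawk (FPAT is a gawk extension that other awk implementations do not support); the variable names rows and result are illustrative only.

```shell
# Two rows from the question's sample: one with a quoted comma, one without.
rows='"2-Keyw-Bllist, TerrorViolencetest",vodka,Captify (ID: Captify),,94676,884,flag
TerrorViolencetest,germany,QuantCast (ID: QuantCast),,31054,491,flag'

if command -v gawk >/dev/null 2>&1; then
  # FPAT describes what a field looks like: either a run of non-comma
  # characters (possibly empty) or a double-quoted string.
  result=$(printf '%s\n' "$rows" | gawk '
    BEGIN { FPAT = "([^,]*)|(\"[^\"]+\")"; OFS = "," }
    { print $1, $2, $3, $5, $6, $7 }   # main block: runs for every row
  ')
  printf '%s\n' "$result"
else
  echo "gawk not found; FPAT needs GNU awk" >&2
fi
```

With gawk available, both rows should come out with the empty fourth column dropped and the quoted first column intact, matching the corresponding lines of the desired output.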
Why the OP's code only partially works: the print statement is inside the END block, which runs once after all input has been read, so only the last row is printed (and even that behavior, $0 still holding the last record inside END, is not guaranteed in some awk implementations, AFAIK). Here is how the END block works:
There are 3 main blocks in awk:
- BEGIN block: runs before any of Input_file is read. It is the place to initialize variables before the program starts reading the actual Input_file.
- {...} main block: runs for each record (line) of Input_file as it is read.
- END block: runs once the program has finished reading the whole Input_file, so calculations that need the complete input (e.g. with arrays) and printing of final values are done here.
What man awk says:
Finally, after all the input is exhausted, gawk executes the code in the END rule(s) (if any).
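The order in which the three blocks run can be seen in a small sketch. It assumes any POSIX awk and uses made-up three-line input; the variable names total and summary are illustrative only.

```shell
summary=$(printf 'a,1\nb,2\nc,3\n' | awk '
BEGIN { FS = ","; total = 0; print "start" }    # runs once, before any input is read
      { total += $2; print "row " NR ": " $1 }  # main block: runs for every record
END   { print "rows=" NR ", total=" total }     # runs once, after all input is read
')
printf '%s\n' "$summary"
```

This prints "start", then one line per input row, then a single summary line "rows=3, total=6". A print placed only in END therefore fires once, which is why the OP saw a single row.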
Source: https://stackoverflow.com/questions/59127981/awk-fpat-to-ignore-commas-in-csv