Is there a way to use bash to remove the last four columns for some input CSV file? The last four columns can have fields that vary in length from line to line so it is not
None of the mentioned methods will work properly on CSV files that have quoted fields containing a <comma> character, so you cannot simply use the <comma> character as a field separator.
The following two posts are now very handy:
Since you work with GNU awk, you can use either of the following two:
$ awk -v FPAT='[^,]*|"[^"]+"' -v OFS="," 'NF{NF-=4}1'
Or with any awk, you could do:
$ awk 'BEGIN{ere="([^,]*|\042[^\042]+\042)"
ere=","ere","ere","ere","ere"$"
}
{sub(ere,"")}1'
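To see how the portable variant behaves on quoted commas, here is a minimal sketch that wraps it in a shell function. The name drop_last4 and the sample rows are made up for illustration, and it assumes quoted fields contain no embedded double quotes or newlines.

```shell
# Hypothetical helper: strip the last four fields from CSV read on stdin.
# Assumes quoted fields contain no embedded double quotes or newlines.
drop_last4() {
    awk 'BEGIN {
        # One field: a run of non-commas, or a quoted string (\042 is a double quote).
        ere = "([^,]*|\042[^\042]+\042)"
        # Four ",field" groups anchored at the end of the line.
        ere = "," ere "," ere "," ere "," ere "$"
    }
    { sub(ere, "") } 1'
}

printf '%s\n' 'id,"Doe, Jane",x,"y,z",3,4' 'a,b,c,d,e' | drop_last4
# prints:
# id,"Doe, Jane"
# a
```

The match has to end at the end of the line and sub() takes the leftmost match, so the commas inside "Doe, Jane" and "y,z" are not counted as separators here.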
awk one-liner:
awk -F, '{for(i=0;++i<=NF-5;)printf "%s,", $i;print $(NF-4)}' file.csv
The advantage of using awk over cut is that you don't have to count how many columns you have or how many you want to keep, since all you want is to remove the last 4 columns.
See the test:
kent$ seq 40|xargs -n10|sed 's/ /, /g'
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
11, 12, 13, 14, 15, 16, 17, 18, 19, 20
21, 22, 23, 24, 25, 26, 27, 28, 29, 30
31, 32, 33, 34, 35, 36, 37, 38, 39, 40
kent$ seq 40|xargs -n10|sed 's/ /, /g' |awk -F, '{for(i=0;++i<=NF-5;)printf "%s,", $i;print $(NF-4)}'
1, 2, 3, 4, 5, 6
11, 12, 13, 14, 15, 16
21, 22, 23, 24, 25, 26
31, 32, 33, 34, 35, 36
You can use cut for this if you know the number of columns. For example, if your file has 9 columns, and comma is your delimiter:
cut -d',' -f -5
However, this assumes the data in your csv file does not contain any commas. cut will interpret commas inside of quotes as delimiters also.
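If you don't want to hard-code the column count, one option is to read it off the first line and hand the result to cut. This is only a sketch: drop_last4_cut is a made-up name, and it assumes every line has the same number of fields and that no field contains a quoted comma.

```shell
# Hypothetical helper: drop the last four columns of the file given as $1.
# Assumes a uniform field count per line and no quoted commas.
drop_last4_cut() {
    file=$1
    # Count the comma-separated fields on the first line only.
    n=$(head -n 1 "$file" | awk -F, '{ print NF }')
    # Refuse to run when there would be nothing left to keep.
    [ "$n" -gt 4 ] || return 1
    # Keep fields 1 through n-4.
    cut -d, -f "1-$((n - 4))" "$file"
}
```

For a 9-column file this ends up running the same thing as cut -d, -f 1-5.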
This might work for you (GNU sed):
sed -r 's/(,[^,]*){4}$//' file
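A quick way to check what that expression does, as a sketch: drop_last4_sed is a made-up wrapper, and -E is swapped in for -r on the assumption that your sed accepts it (recent GNU and BSD seds do). Like the original, it does not handle quoted commas.

```shell
# Hypothetical wrapper around the same substitution, reading from stdin.
drop_last4_sed() {
    # (,[^,]*) is one comma plus the unquoted field after it;
    # {4}$ requires four such groups ending at the end of the line.
    sed -E 's/(,[^,]*){4}$//'
}

printf 'a,b,c,d,e,f\nx,y,z\n' | drop_last4_sed
# prints:
# a,b
# x,y,z
```

Lines with fewer than five fields don't match the pattern, so they pass through unchanged.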
cut can do this if all lines have the same number of fields; use awk if they don't.
cut -d, -f1-6 # assuming 10 fields
This prints the first 6 fields. If you want to control the output separator, use --output-delimiter=string (a GNU cut option).
awk -F, -v OFS=, '{ for (i=1;i<=NF-4;i++) printf "%s%s", $i, (i<NF-4 ? OFS : ""); printf "\n" }'
This loops over the fields up to the number of fields minus 4 and prints them out.
awk 'BEGIN{FS=OFS=","} {NF-=4; print}' file.csv
or alternatively
awk -F, -vOFS=, '{NF-=4;print}' file.csv
will drop the last 4 columns from each line.
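One thing to watch with NF-=4 is lines that have four or fewer fields, where NF would end up zero or negative (as far as I know, GNU awk aborts on a negative NF). A guarded sketch, assuming an awk that rebuilds the record when NF is decreased, as gawk does; drop_last4_nf is a made-up name:

```shell
# Hypothetical helper: drop the last four fields, leaving short lines alone.
drop_last4_nf() {
    awk 'BEGIN { FS = OFS = "," }
         NF > 4 { NF -= 4 }   # shorten only lines that have a fifth field
         { print }'
}

printf '1,2,3,4,5,6\nshort,line\n' | drop_last4_nf
# prints:
# 1,2
# short,line
```

Whether short lines should be passed through, emptied, or reported is a judgment call; this version passes them through untouched.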