bash method to remove last 4 columns from csv file

会有一股神秘感。 submitted on 2019-11-30 18:39:05

cut can do this if all lines have the same number of fields; use awk if they don't.

cut -d, -f1-6 # assuming 10 fields

This prints the first 6 fields. If you want to control the output separator, use --output-delimiter=string (a GNU cut option).
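As a quick illustration of the output delimiter option, here is a minimal sketch. It assumes GNU cut; the function name `first_six` and the sample row are made up for the example.

```shell
# Keep fields 1-6 of a comma-separated line and join them with ';'.
# --output-delimiter is a GNU coreutils extension; BSD/macOS cut lacks it.
first_six() {
    cut -d, -f1-6 --output-delimiter=';' "$@"
}

printf '1,2,3,4,5,6,7,8,9,10\n' | first_six
# prints: 1;2;3;4;5;6
```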

awk -F, -v OFS=, '{ for (i=1; i<=NF-4; i++) printf "%s%s", $i, (i<NF-4 ? OFS : ""); printf "\n" }'

This loops over the fields up to the number of fields minus 4 and prints each one, separated by OFS.

cat data.csv | rev | cut -d, -f5- | rev

rev reverses each line, so the last 4 columns become the first 4 and cut -f5- skips them. It doesn't matter whether all the rows have the same number of columns; it will always remove the last 4. This only works if the last 4 columns don't contain any commas themselves.
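A small sketch of the reverse, cut, reverse idea on rows of different lengths. The function name `drop_last4_rev` and the sample data are illustrative only, and `rev` is assumed to be installed (it ships with util-linux).

```shell
# Reverse each line, skip the first 4 fields (originally the last 4),
# then reverse back to restore the original order.
drop_last4_rev() {
    rev "$@" | cut -d, -f5- | rev
}

printf 'a,b,c,d,e,f\n1,2,3,4,5,6,7,8\n' | drop_last4_rev
# prints:
# a,b
# 1,2,3,4
```

Note that cut passes a line containing no delimiter at all through unchanged unless -s is given, so single-column rows would survive as-is.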

You can use cut for this if you know the number of columns. For example, if your file has 9 columns, and comma is your delimiter:

cut -d',' -f -5

However, this assumes the data in your csv file does not contain any commas. cut will interpret commas inside of quotes as delimiters also.
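To make the quoting pitfall concrete, here is a hypothetical 6-column row whose first column contains a comma inside quotes. The function name `first_two` and the data are invented for this sketch.

```shell
# Intended result: drop the last 4 of 6 columns, keeping the name and 42.
# cut splits on every comma, including the one inside the quotes.
first_two() {
    cut -d, -f-2 "$@"
}

printf '%s\n' '"Smith, John",42,x,y,z,w' | first_two
# prints: "Smith, John"
```

The second real column (42) is lost because cut counted the quoted name as two fields.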

YH Wu
awk -F, '{NF-=4; OFS=","; print}' file.csv

or alternatively

awk -F, -vOFS=, '{NF-=4;print}' file.csv

will drop the last 4 columns from each line.
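One thing to watch with NF-=4 is a row that has fewer than 4 fields: NF would go negative, which gawk treats as a fatal error. A guarded sketch, assuming short rows should simply become empty lines (that choice, the function name `drop_last4_nf`, and the sample data are assumptions, not part of the answer above):

```shell
# Shrink NF by 4, but never below 0. Assigning NF rebuilds the record
# with OFS in gawk and mawk; other awks may behave differently.
drop_last4_nf() {
    awk -F, -v OFS=, '{ NF = (NF > 4 ? NF - 4 : 0); print }' "$@"
}

printf 'a,b,c,d,e,f\nx,y\n' | drop_last4_nf
# prints "a,b", then an empty line for the short row
```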

awk one-liner:

awk -F, '{for(i=0;++i<=NF-5;)printf "%s, ",$i;print $(NF-4)}'  file.csv

The advantage of using awk over cut is that you don't have to count how many columns you have or how many you want to keep, since what you want is to remove the last 4 columns.

see the test:

kent$  seq 40|xargs -n10|sed 's/ /, /g'
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
11, 12, 13, 14, 15, 16, 17, 18, 19, 20
21, 22, 23, 24, 25, 26, 27, 28, 29, 30
31, 32, 33, 34, 35, 36, 37, 38, 39, 40

kent$  seq 40|xargs -n10|sed 's/ /, /g' |awk -F, '{for(i=0;++i<=NF-5;)printf "%s, ",$i;print $(NF-4)}'
1,  2,  3,  4,  5,  6
11,  12,  13,  14,  15,  16
21,  22,  23,  24,  25,  26
31,  32,  33,  34,  35,  36

This might work for you (GNU sed):

sed -r 's/(,[^,]*){4}$//' file
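A short sketch of how the substitution behaves on long and short rows. It uses -E, which both GNU and BSD sed accept in place of -r; the function name `drop_last4_sed` and the sample rows are made up.

```shell
# Delete four ",field" groups anchored at the end of the line.
# Rows with fewer than 4 commas do not match and pass through unchanged.
drop_last4_sed() {
    sed -E 's/(,[^,]*){4}$//' "$@"
}

printf '1,2,3,4,5,6,7,8,9,10\na,b,c\n' | drop_last4_sed
# prints:
# 1,2,3,4,5,6
# a,b,c
```

Because each removed group starts with a comma, a row with exactly 4 fields is also left untouched.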

This awk solution works in a hacky way: it blanks the last 4 fields, then strips the 4 trailing commas left behind.

awk -F, -v OFS=, '{for(i=NF; i>NF-4; --i) $i=""; sub(/,,,,$/,""); print}' temp.txt

None of the methods mentioned above work properly on CSV files whose quoted fields contain a <comma> character, so the <comma> character alone cannot be used as the field separator.

The following two posts are now very handy:

Since you work with GNU awk, you can define the fields by their content with FPAT:

$ awk -v FPAT='[^,]*|"[^"]+"' -v OFS="," 'NF{NF-=4}1'

Or with any awk, you could do:

$ awk 'BEGIN{ere="([^,]*|\042[^\042]+\042)"
             ere=","ere","ere","ere","ere"$"
       }
       {sub(ere,"")}1'
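To see the portable variant handle quoted commas, here is a sketch that wraps the same regular expression in a function and runs it on one invented row. The names `drop_last4_quoted` and `field` are illustrative only.

```shell
# Build an ERE matching four trailing fields, where a field is either
# comma-free text or a double-quoted string that may contain commas.
drop_last4_quoted() {
    awk 'BEGIN {
             field = "([^,]*|\042[^\042]+\042)"   # \042 is a double quote
             ere = "," field "," field "," field "," field "$"
         }
         { sub(ere, "") } 1' "$@"
}

printf '%s\n' 'id,"Doe, Jane",a,"b,c",d,e' | drop_last4_quoted
# prints: id,"Doe, Jane"
```

Here the quoted "b,c" counts as a single column, so the four removed columns are a, "b,c", d and e. Escaped quotes inside fields are not handled by this pattern.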