Delete row which has more than X columns in a csv

空扰寡人 提交于 2019-12-02 04:41:07

This can be done straight forward with awk:

awk -F, 'NF<=3' file

This uses the awk variable NF that holds the number of fields in the current line. Since we have set the field separator to the comma (with -F, or, equivalent, -v FS=","), then it is just a matter of checking when the number of fields is not higher than 3. This is done with NF<=3: when this is true, the line will be printed automatically.

Test

$ awk -F, 'NF<=3' a
timestamp,header2,header3
1,1val2,1val3
2,2val2,2val3
5val1,5val2,5val3
6,6val2,6val3

Try the following (do not omit to replace your file path and your max column):

#! /bin/bash

filepath=test.csv
max_columns=3

for line in $(cat $filepath);
do
    count=$(echo "$line" | grep -o "," | wc -l)
    if [ $(($count + 1)) -le $max_columns ]
            then
            echo $line
    fi
done

Copy this in a .sh file (cropper.sh for example), make it executable chmod +x cropper.sh and run ./cropper.sh).

This will output only the valid lines. You can then catch the result in a file this way:

./cropper.sh > result.txt

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!