I need to delete all the rows in a CSV file that have more than a certain number of columns.
This happens because the code that generates the CSV file sometimes skips some values and prints the following ones on the same line.
Example: Consider the following file to parse. I want to remove all the rows that have more than 3 columns (i.e. the number of columns in the header):
timestamp,header2,header3
1,1val2,1val3
2,2val2,2val3
3,4,4val2,4val3
5val1,5val2,5val3
6,6val2,6val3
The output file I would like to have is:
timestamp,header2,header3
1,1val2,1val3
2,2val2,2val3
5val1,5val2,5val3
6,6val2,6val3
I don't care if the row with timestamp 4 is missing.
I would prefer a solution in bash or perhaps using awk, rather than a python one, so that I can learn how to use it.
This can be done straightforwardly with awk:
awk -F, 'NF<=3' file
This uses the awk built-in variable NF, which holds the number of fields in the current line. Since we have set the field separator to the comma (with -F, or, equivalently, -v FS=","), it is just a matter of checking that the number of fields is not higher than 3. This is done with the condition NF<=3: when it is true, the line is printed automatically.
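To see what NF evaluates to line by line, you can print it next to each line (a quick sketch using two rows from the example):

```shell
# Print each line's field count followed by the line itself
printf '1,1val2,1val3\n3,4,4val2,4val3\n' | awk -F, '{print NF, $0}'
# Output:
# 3 1,1val2,1val3
# 4 3,4,4val2,4val3
```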
Test
$ awk -F, 'NF<=3' a
timestamp,header2,header3
1,1val2,1val3
2,2val2,2val3
5val1,5val2,5val3
6,6val2,6val3
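If you want to keep the original file and write only the valid rows to a new one, redirect awk's output; a sketch (the names test.csv and filtered.csv are just example file names):

```shell
# Recreate the example input from the question
printf '%s\n' 'timestamp,header2,header3' '1,1val2,1val3' \
    '2,2val2,2val3' '3,4,4val2,4val3' '5val1,5val2,5val3' \
    '6,6val2,6val3' > test.csv

# Keep only the rows with at most 3 comma-separated fields,
# leaving test.csv itself untouched
awk -F, 'NF<=3' test.csv > filtered.csv
```

With GNU awk 4.1 or later you can also edit the file in place with gawk -i inplace -F, 'NF<=3' test.csv.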
Try the following (don't forget to replace the file path and the maximum number of columns with your own values):
#!/bin/bash
filepath=test.csv
max_columns=3

# Read line by line; a for loop over $(cat ...) would word-split on spaces
while IFS= read -r line; do
    count=$(echo "$line" | grep -o "," | wc -l)
    if [ $((count + 1)) -le "$max_columns" ]; then
        echo "$line"
    fi
done < "$filepath"
Copy this into a .sh file (cropper.sh, for example), make it executable with chmod +x cropper.sh, and run it with ./cropper.sh.
This will output only the valid lines. You can then capture the result in a file this way:
./cropper.sh > result.txt
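A bash-only variant of the same idea counts the fields by splitting each line into an array, instead of spawning grep and wc for every row. This is a sketch that reads standard input (note that read -a drops trailing empty fields, which does not matter for this data):

```shell
#!/bin/bash
# Keep only the lines whose comma-separated field count is at most
# max_columns; reads stdin so it can be used in a pipeline, e.g.
#   ./cropper2.sh < test.csv > result.txt
max_columns=3

while IFS= read -r line; do
    # Split the line on commas into an array; its length is the column count
    IFS=',' read -r -a fields <<< "$line"
    if [ "${#fields[@]}" -le "$max_columns" ]; then
        printf '%s\n' "$line"
    fi
done
```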
Source: https://stackoverflow.com/questions/29411952/delete-row-which-has-more-than-x-columns-in-a-csv