Bash: How to wrap values of the first line of a CSV file in quotation marks if they are not already quoted

Question

The other day I asked how to wrap the values of the first line of a CSV file in quotation marks. I was given this reply, which worked great.

$ cat file.csv  
word1,word2,word3,word4,word5  
12345,12346,12347,12348,12349

To put quotes around the items in the first line only:

$ sed '1 { s/^/"/; s/,/","/g; s/$/"/ }' file.csv  
"word1","word2","word3","word4","word5"  
12345,12346,12347,12348,12349

I now need to test whether quotes already exist around the values, so that no value ends up double-quoted.

Answer 1:

This problem suits awk more than sed due to row/column processing:

awk 'BEGIN{FS=OFS=","} NR==1 {
   for (i=1; i<=NF; i++) {gsub(/^"|"$/, "", $i); $i = "\"" $i "\""}
} 1' file

"word1","word2","word3","word4","word5"
12345,12346,12347,12348,12349

Using the gsub function, we remove a leading or trailing double quote from each field, if one exists.
Then we can safely wrap each field in double quotes.
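
As a quick check of the "strip, then wrap" idea, here is a minimal sketch that runs the same awk program twice over a header with mixed quoting. The function name quote_header and the sample data are assumptions made for illustration only.

```shell
# Hypothetical helper name; it wraps the awk program from this answer.
quote_header() {
  awk 'BEGIN { FS = OFS = "," } NR == 1 {
    for (i = 1; i <= NF; i++) { gsub(/^"|"$/, "", $i); $i = "\"" $i "\"" }
  } 1' "$@"
}

# Header with a mix of quoted and unquoted fields (made-up sample data).
mixed=$(printf '%s\n' '"word1",word2,"word3"' '12345,12346,12347' | quote_header)

# Feeding the result back in should leave it unchanged.
twice=$(printf '%s\n' "$mixed" | quote_header)

printf '%s\n' "$mixed"
```

Both passes should give the same header, "word1","word2","word3", which is the property the question asks for. Because FS is a plain comma, a quoted field that itself contains a comma would still be split in two.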

Answer 2:

Change each of the substitutions to include optional quotes:

sed -E '1 { s/^"?/"/; s/"?,"?/","/g; s/"?$/"/ }' file.csv

I have added -E to enable extended mode, so that ? is understood to mean "0 or 1 match".

You could also keep on using basic mode (no -E) and replace each ? with either \{0,1\} (again, 0 or 1 match) or * (which matches 0 or more).
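
For reference, the basic-mode variant might look like the sketch below, with \{0,1\} standing in for each ?. The function name quote_header_bre and the sample input are assumptions, not part of the original answer.

```shell
# Basic regular expressions (no -E): \{0,1\} means "0 or 1 match", like ? in extended mode.
quote_header_bre() {
  sed '1 { s/^"\{0,1\}/"/; s/"\{0,1\},"\{0,1\}/","/g; s/"\{0,1\}$/"/; }' "$@"
}

# Made-up sample: only the first header field is already quoted.
out=$(printf '%s\n' '"word1",word2,word3' '1,2,3' | quote_header_bre)
printf '%s\n' "$out"
```

The \{0,1\} form keeps the behaviour identical to the -E version. Using * instead would also swallow runs of several quotes, which may or may not be what you want.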

Answer 3:

Regular expressions with sed and awk are subject to a seemingly never-ending series of edge cases that fail. Leveraging a csv library instead provides a great deal more robustness.

I found Python's csv library was the best choice because it:

is widely available without onerous dependencies, with the exception of Python itself;
is not particularly sensitive to the version of Python you use;
lends itself to being embedded within a shell script; and
is quite compact (a one-liner will do!).

Thus, my solution is along the lines of:

QUOTE_CSV_PY='import sys; import csv; csv.writer(sys.stdout, quoting=csv.QUOTE_ALL).writerows(csv.reader(sys.stdin))'
head -1 file.csv | python -c "$QUOTE_CSV_PY"; tail -n +2 file.csv

To break it down:

QUOTE_CSV_PY is a shell variable containing the Python one-liner.
The Python code imports the standard sys and csv modules, then creates a csv writer that writes to stdout with QUOTE_ALL set so that all fields get quoted. The writer is fed by a csv reader that reads from stdin.
head -1 sends the first line to the Python interpreter for processing.
; tail -n +2 waits until the processing is done and then dumps out every line from number two onwards.
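
One detail to watch: Python's csv.writer ends each row with "\r\n" by default, so the rewritten header can end with a carriage return while the remaining lines do not. The sketch below passes lineterminator="\n" to avoid that. It assumes the interpreter is installed as python3, and the function name quote_header_py is made up for illustration.

```shell
# Assumption: the interpreter is called python3; change it if yours is named python.
# lineterminator="\n" overrides the csv module default of "\r\n".
QUOTE_CSV_PY='import sys, csv; csv.writer(sys.stdout, quoting=csv.QUOTE_ALL, lineterminator="\n").writerows(csv.reader(sys.stdin))'

# Hypothetical helper: $1 is the CSV file to process.
quote_header_py() {
  # Only the header line goes through Python.
  head -n 1 "$1" | python3 -c "$QUOTE_CSV_PY"
  # Every line from the second onwards is copied unchanged.
  tail -n +2 "$1"
}
```

Called as quote_header_py file.csv, this should also cope with a header field such as "a,b", because the csv reader parses the quoting instead of splitting on every comma.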

Answer 4:

Keep your existing, working sed command, but remove all double quotes from the first line first:

sed '1 { s/"//g; s/^/"/; s/,/","/g; s/$/"/ }' file.csv
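
Since s/"//g deletes every double quote on the line, quotes inside a field are removed as well. A small sketch of that side effect, using a made-up header with an escaped quote (the variable names are illustrative):

```shell
# Hypothetical header: the first field contains escaped quotes ("" inside a quoted field).
header='"say ""hi""",word2'

# Same command as above, applied to that single line.
stripped=$(printf '%s\n' "$header" | sed '1 { s/"//g; s/^/"/; s/,/","/g; s/$/"/; }')
printf '%s\n' "$stripped"
```

This prints "say hi","word2": the quoting is now consistent, but the inner quotes are gone. For headers made of plain words, as in the question, that makes no difference.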

Answer 5:

To test each answer I created three files:

file.csv

word1,word2,word3,word4,word5  
12345,12346,12347,12348,12349

file2.csv

"word1","word2","word3","word4","word5"  
12345,12346,12347,12348,12349

file3.csv

"word1",word2,word3,"word4",word5  
12345,12346,12347,12348,12349

Then I created a bash script:

#!/bin/bash  

sed -E '1 { s/^"?/"/; s/"?,"?/","/g; s/"?$/"/ }' file.csv > final.csv  
sed -E '1 { s/^"?/"/; s/"?,"?/","/g; s/"?$/"/ }' file2.csv > final2.csv  
sed -E '1 { s/^"?/"/; s/"?,"?/","/g; s/"?$/"/ }' file3.csv > final3.csv

Then I looked at the final files and the first lines were perfect.

# cat final*.csv  

"word1","word2","word3","word4","word5"  
12345,12346,12347,12348,12349  
"word1","word2","word3","word4","word5"  
12345,12346,12347,12348,12349  
"word1","word2","word3","word4","word5"  
12345,12346,12347,12348,12349
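
The same check can be written as a loop that compares each header against the expected line, so nobody has to read the output files by eye. This is only a sketch: the temporary directory and the variable names (workdir, expected, status) are assumptions, and the test files are recreated without trailing spaces.

```shell
# Recreate the three test files in a scratch directory (illustrative setup).
workdir=$(mktemp -d)
printf '%s\n' 'word1,word2,word3,word4,word5' '12345,12346,12347,12348,12349' > "$workdir/file.csv"
printf '%s\n' '"word1","word2","word3","word4","word5"' '12345,12346,12347,12348,12349' > "$workdir/file2.csv"
printf '%s\n' '"word1",word2,word3,"word4",word5' '12345,12346,12347,12348,12349' > "$workdir/file3.csv"

expected='"word1","word2","word3","word4","word5"'
status=0
for f in "$workdir"/file*.csv; do
  # Quote the header, then keep only the first line for comparison.
  header=$(sed -E '1 { s/^"?/"/; s/"?,"?/","/g; s/"?$/"/; }' "$f" | head -n 1)
  if [ "$header" = "$expected" ]; then
    echo "ok: ${f##*/}"
  else
    echo "FAIL: ${f##*/}: $header"
    status=1
  fi
done
rm -rf "$workdir"
```

A status of 0 at the end means all three variants (unquoted, fully quoted, partly quoted) produced the same header.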

Source: https://stackoverflow.com/questions/46287556/bash-how-to-wrap-values-of-the-first-line-of-a-csv-file-with-quotations-if-they

Tags

bash

csv

if-statement

quotations