Question
The other day I asked how to wrap the values in the first line of a csv file in quotation marks. I was given this reply, which worked great.
$ cat file.csv
word1,word2,word3,word4,word5
12345,12346,12347,12348,12349
To put quotes around the items in the first line only:
$ sed '1 { s/^/"/; s/,/","/g; s/$/"/ }' file.csv
"word1","word2","word3","word4","word5"
12345,12346,12347,12348,12349
I now need to check whether the values are already quoted, so that they do not end up quoted twice.
Answer 1:
This problem suits awk more than sed due to row/column processing:
awk 'BEGIN{FS=OFS=","} NR==1 {
for (i=1; i<=NF; i++) {gsub(/^"|"$/, "", $i); $i = "\"" $i "\""}
} 1' file
"word1","word2","word3","word4","word5"
12345,12346,12347,12348,12349
- Using the gsub function, we remove a leading or trailing double quote, if one exists.
- Then we can safely wrap each cell in double quotes.
Answer 2:
Change each of the substitutions to include optional quotes:
sed -E '1 { s/^"?/"/; s/"?,"?/","/g; s/"?$/"/ }' file.csv
I have added -E to enable extended mode, so that ? is understood to mean "0 or 1 match".
You could also keep on using basic mode (no -E) and replace each ? with either \{0,1\} (again, 0 or 1 match) or * (which matches 0 or more).
Answer 3:
Regular expressions with sed and awk are subject to a seemingly never-ending series of edge cases that fail. Leveraging a csv library instead provides a great deal more robustness.
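One such edge case, sketched below, is a header field that itself contains a comma. The header line used here is a made-up example, not taken from the question's files. Splitting on every comma (which is effectively what the sed and awk approaches do) cuts the quoted field in two, while Python's csv module keeps it whole:

```python
import csv
import io

# Hypothetical header line: the first field contains a comma inside quotes.
header = '"last, first",age\n'

# Naive approach: treating every comma as a separator splits the quoted field.
naive_fields = header.rstrip("\n").split(",")

# csv.reader treats a comma inside quotes as part of the field.
parsed_fields = next(csv.reader(io.StringIO(header)))

# Re-emit the parsed row with every field quoted.
out = io.StringIO()
csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n").writerow(parsed_fields)
requoted = out.getvalue()

print(naive_fields)      # three pieces, the first field is broken
print(parsed_fields)     # two fields, as intended
print(requoted, end="")
```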
I found Python's csv library to be the best choice because it:
- is widely available without onerous dependencies, with the exception of Python itself;
- is not particularly sensitive to the version of Python you use;
- lends itself to being embedded within a shell script; and
- is quite compact (a one-liner will do!).
Thus, my solution is along the lines of:
QUOTE_CSV_PY='import sys; import csv; csv.writer(sys.stdout, quoting=csv.QUOTE_ALL).writerows(csv.reader(sys.stdin))'
head -1 file.csv | python -c "$QUOTE_CSV_PY"; tail -n +2 file.csv
To break it down:
- QUOTE_CSV_PY is a shell variable containing the Python one-liner commands.
- The Python commands simply import the standard sys and csv modules. They then create a csv writer that writes to stdout with QUOTE_ALL set so all fields get quoted. It is fed a csv reader that reads from stdin.
- head -1 sends the first line to the python interpreter for processing.
- ; tail -n +2 waits until the processing is done and then just dumps out every line from number two onwards.
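For readers who prefer a script to a one-liner, the same idea can be sketched as a small function that quotes only the first row and copies every other line through untouched. The name quote_header is hypothetical. The sketch assumes the header fits on one physical line (no newline inside a quoted field), and it sets lineterminator to "\n" because csv.writer otherwise ends rows with "\r\n":

```python
import csv
import io


def quote_header(src, dst):
    """Quote every field of the first CSV row of src; copy the rest verbatim.

    Assumes the header occupies exactly one physical line.
    """
    first = src.readline()
    if first:
        # Parse the header properly so already-quoted fields are not re-quoted.
        row = next(csv.reader([first]))
        writer = csv.writer(dst, quoting=csv.QUOTE_ALL, lineterminator="\n")
        writer.writerow(row)
    # Remaining lines are passed through byte-for-byte as text.
    for line in src:
        dst.write(line)


# Demo using in-memory files with the mixed-quoting header from this thread.
sample = io.StringIO(
    '"word1",word2,word3,"word4",word5\n'
    "12345,12346,12347,12348,12349\n"
)
result = io.StringIO()
quote_header(sample, result)
print(result.getvalue(), end="")
```

In a real script the two in-memory files would be replaced by sys.stdin and sys.stdout, which removes the need for the separate head and tail calls.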
Answer 4:
Keep your existing working sed command, by removing all possible double quotes first:
sed '1 { s/"//g; s/^/"/; s/,/","/g; s/$/"/ }' file.csv
Answer 5:
To test each answer I created three files:
file.csv
word1,word2,word3,word4,word5
12345,12346,12347,12348,12349
file2.csv
"word1","word2","word3","word4","word5"
12345,12346,12347,12348,12349
file3.csv
"word1",word2,word3,"word4",word5
12345,12346,12347,12348,12349
Then I created a bash script:
#!/bin/bash
sed -E '1 { s/^"?/"/; s/"?,"?/","/g; s/"?$/"/ }' file.csv > final.csv
sed -E '1 { s/^"?/"/; s/"?,"?/","/g; s/"?$/"/ }' file2.csv > final2.csv
sed -E '1 { s/^"?/"/; s/"?,"?/","/g; s/"?$/"/ }' file3.csv > final3.csv
Then I looked at the final files and the first lines were perfect.
# cat final*.csv
"word1","word2","word3","word4","word5"
12345,12346,12347,12348,12349
"word1","word2","word3","word4","word5"
12345,12346,12347,12348,12349
"word1","word2","word3","word4","word5"
12345,12346,12347,12348,12349
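The same check can be automated rather than inspected by eye. The sketch below is a Python port of the three sed substitutions, not the sed command itself, and the function name requote_header_like_sed is hypothetical. It runs the port against the three header variants above and confirms they all normalise to the same line:

```python
import re


def requote_header_like_sed(line):
    """Apply Python equivalents of the three sed substitutions to one line."""
    # count=1 mirrors a sed substitution without the g flag.
    line = re.sub(r'^"?', '"', line, count=1)   # s/^"?/"/
    line = re.sub(r'"?,"?', '","', line)        # s/"?,"?/","/g
    line = re.sub(r'"?$', '"', line, count=1)   # s/"?$/"/
    return line


# First lines of file.csv, file2.csv and file3.csv.
headers = [
    'word1,word2,word3,word4,word5',
    '"word1","word2","word3","word4","word5"',
    '"word1",word2,word3,"word4",word5',
]
expected = '"word1","word2","word3","word4","word5"'

results = [requote_header_like_sed(h) for h in headers]
for r in results:
    print(r)
```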
Source: https://stackoverflow.com/questions/46287556/bash-how-to-wrap-values-of-the-first-line-of-a-csv-file-with-quotations-if-they