I have some TSV files that I need to convert to CSV files. Is there any solution in BASH, e.g. using awk, to convert these? I could use sed, like this:
sed 's/\t/,/g' file.tsv > file.csv
Update: The following solutions are not generally robust, although they do work in the OP's specific use case; see the bottom section for a robust, awk-based solution.
To summarize the options (interestingly, they all perform about the same):
tr:
devnull's solution (provided in a comment on the question) is the simplest:
tr '\t' ',' < file.tsv > file.csv
sed:
The OP's own sed solution is perfectly fine, given that the input contains no quoted strings (with potentially embedded \t chars.):
sed 's/\t/,/g' file.tsv > file.csv
The only caveat is that on some platforms (e.g., macOS) the escape sequence \t is not supported, so a literal tab char. must be spliced into the command string using ANSI quoting ($'\t'):
sed 's/'$'\t''/,/g' file.tsv > file.csv
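A slightly more compact variant of the same ANSI-quoting idea (a sketch, assuming Bash or another shell that supports $'...') is to wrap the entire sed script in ANSI quoting, so the shell expands \t to a literal tab before sed sees it:
sed $'s/\t/,/g' file.tsv > file.csv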
awk:
The caveat with awk is that FS - the input field separator - must be set to \t explicitly; the default behavior would otherwise strip leading and trailing tabs and replace interior spans of multiple tabs with only a single ,:
awk 'BEGIN { FS="\t"; OFS="," } {$1=$1; print}' file.tsv > file.csv
Note that simply assigning $1 to itself causes awk to rebuild the input line using OFS - the output field separator; this effectively replaces all \t chars. with , chars. print then simply prints the rebuilt line.
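To see why setting FS matters, consider a made-up input line with an empty middle field (two adjacent tabs): with the default field splitting the empty field is lost, whereas the explicit \t separator preserves it:
printf 'a\t\tb\n' | awk 'BEGIN { OFS="," } { $1=$1; print }'            # default FS -> a,b (empty field lost)
printf 'a\t\tb\n' | awk 'BEGIN { FS="\t"; OFS="," } { $1=$1; print }'   # explicit FS -> a,,b (empty field kept)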
Robust awk solution:
As A. Rabus points out, the above solutions do not handle unquoted input fields that themselves contain , characters correctly - you'll end up with extra CSV fields.
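To illustrate with made-up data, here is how the simple tr approach mangles a field that contains a comma - the single field on the right turns into what a CSV reader sees as two fields:
printf '1\tsays hi, then leaves\n' | tr '\t' ','
# -> 1,says hi, then leaves   (a CSV reader now sees 3 fields instead of 2)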
The following awk solution fixes this by enclosing such fields in "..." on demand (see the non-robust awk solution above for a partial explanation of the approach). If such fields also have embedded " chars., these are escaped as "", in line with RFC 4180. Thanks, Wyatt Israel.
awk 'BEGIN { FS="\t"; OFS="," } {
  rebuilt=0
  for (i=1; i<=NF; ++i) {
    # quote any field that contains , or " and is not already quoted
    if ($i ~ /[,"]/ && $i !~ /^".*"$/) {
      gsub("\"", "\"\"", $i)   # escape embedded " chars. by doubling them
      $i = "\"" $i "\""        # enclose the field in "..."
      rebuilt=1
    }
  }
  if (!rebuilt) { $1=$1 }      # force a rebuild with OFS even if no field was quoted
  print
}' file.tsv > file.csv
$i ~ /[,"]/ && $i !~ /^".*"$/
detects any field that contains ,
and/or "
and isn't already enclosed in double quotes
gsub("\"", "\"\"", $i)
escapes embedded "
chars. by doubling them
$i = "\"" $i "\""
updates the result by enclosing it in double quotes
As stated before, updating any field causes awk to rebuild the line from the fields with the OFS value, i.e., , in this case, which amounts to the effective TSV -> CSV conversion; flag rebuilt is used to ensure that each input record is rebuilt at least once.
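As a quick sanity check with made-up sample data (assuming the awk program above - the text between the single quotes - has been saved to a hypothetical file named tsv2csv.awk), a field containing both a comma and embedded double quotes comes out quoted and escaped per RFC 4180:
printf '1\tsays "hi", then leaves\n' | awk -f tsv2csv.awk
# -> 1,"says ""hi"", then leaves"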