I can't seem to find an SO question that matches this exact problem.
I have a text file that has one text token per line, without any commas, tabs, or quotes. I want
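One way with Awk is to set RS to the empty string, so that records are separated by blank lines and a file with no blank lines is read as a single record. With FS set to a newline, each line becomes one field, so tokens containing spaces are kept intact and the fields are joined with commas as expected: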
awk '{$1=$1}1' FS='\n' OFS=',' RS= file
The {$1=$1} is a way to reconstruct the fields in each record ($0) of the file based on modifications to the field separators (FS/OFS) and/or record separators (RS/ORS). The trailing 1 prints every record with the modifications done inside {..}.
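The behaviour described above can be checked with a small sketch. The sample file and its contents are made up for illustration; the Awk command is the one from this answer, unchanged.

```shell
# Create a throwaway sample file (hypothetical contents, one token per line).
sample=$(mktemp)
printf '%s\n' 'one' 'two words' 'three' > "$sample"

# RS= enables paragraph mode: with no blank lines, the whole file is one record.
# FS='\n' makes each line a field; $1=$1 rebuilds $0 using OFS=','.
awk_csv=$(awk '{$1=$1}1' FS='\n' OFS=',' RS= "$sample")
printf '%s\n' "$awk_csv"

rm -f "$sample"
```

Note that if the input does contain blank lines, each blank-line-separated block is a separate record and is printed on its own output line.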
With a Perl one-liner:
$ cat csv_2_text
one
two
three
$ perl -ne '{ chomp; push(@lines,$_) } END { $x=join(",",@lines); print "$x" }' csv_2_text
one,two,three
$ perl -ne ' { chomp; $_="$_," if not eof ;printf("%s",$_) } ' csv_2_text
one,two,three
$
From @codeforester
$ perl -ne 'BEGIN { $delim = "" } { chomp; printf("%s%s", $delim, $_); $delim="," } END { printf("\n") }' csv_2_text
one,two,three
$
The usual command for this is paste:
csv_string=$(paste -sd, file.txt)
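As a small sketch of the same command reading from a pipe instead of a file (the input tokens here are made up for illustration):

```shell
# -s joins all input lines into a single line; -d, uses a comma as the delimiter.
# The "-" operand tells paste to read standard input.
paste_csv=$(printf '%s\n' one two three | paste -sd, -)
printf '%s\n' "$paste_csv"
```

Because paste only inserts the delimiter between lines, there is no trailing comma to strip afterwards.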
You can do it entirely with Bash parameter expansion operators instead of using tr and sed:
csv_string=$(<file) # read file into variable
csv_string=${csv_string//$'\n'/,} # replace \n with ,
csv_string=${csv_string%,} # remove trailing comma
Tested the four approaches on a Linux box (Bash only, paste, Awk, and Perl), as well as the tr | sed approach shown in the question:
#!/bin/bash
# generate test data
seq 1 10000 > test.file
times=${1:-50}
printf '%s\n' "Testing paste solution"
time {
for ((i=0; i < times; i++)); do
csv_string=$(paste -sd, test.file)
done
}
printf -- '----\n%s\n' "Testing pure Bash solution"
time {
for ((i=0; i < times; i++)); do
csv_string=$(<test.file) # read file into variable
csv_string=${csv_string//$'\n'/,} # replace \n with ,
csv_string=${csv_string%,} # remove trailing comma
done
}
printf -- '----\n%s\n' "Testing Awk solution"
time {
for ((i=0; i < times; i++)); do
csv_string=$(awk '{$1=$1}1' FS='\n' OFS=',' RS= test.file)
done
}
printf -- '----\n%s\n' "Testing Perl solution"
time {
for ((i=0; i < times; i++)); do
csv_string=$(perl -ne '{ chomp; $_="$_," if not eof; printf("%s",$_) }' test.file)
done
}
printf -- '----\n%s\n' "Testing tr | sed solution"
time {
for ((i=0; i < times; i++)); do
csv_string=$(tr '\n' ',' < test.file | sed 's/,$//')
done
}
Surprisingly, the Bash-only solution does quite poorly. paste comes out on top, followed by tr | sed, Awk, and Perl:
Testing paste solution
real 0m0.109s
user 0m0.052s
sys 0m0.075s
----
Testing pure Bash solution
real 1m57.777s
user 1m57.113s
sys 0m0.341s
----
Testing Awk solution
real 0m0.221s
user 0m0.152s
sys 0m0.077s
----
Testing Perl solution
real 0m0.424s
user 0m0.388s
sys 0m0.080s
----
Testing tr | sed solution
real 0m0.162s
user 0m0.092s
sys 0m0.141s
For some reason, csv_string=${csv_string//$'\n'/,} hangs on macOS Mojave running Bash 4.4.23.
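If the pattern substitution is the problem, one possible Bash-only alternative is to read the lines into an array and join them via IFS. This is a sketch of a different technique, not one of the approaches timed above, and it assumes Bash 4 or later for mapfile; the sample file is made up for illustration.

```bash
# Alternative sketch (not from the answers above); needs Bash 4+ for mapfile.
sample=$(mktemp)
printf '%s\n' one 'two words' three > "$sample"

# One array element per input line; -t strips the trailing newline from each.
mapfile -t lines < "$sample"

# "${lines[*]}" joins the elements with the first character of IFS.
# Setting IFS inside the command substitution keeps the change local to it.
csv_string=$(IFS=,; printf '%s' "${lines[*]}")
printf '%s\n' "$csv_string"

rm -f "$sample"
```

This variant has not been run through the benchmark script, so how it compares with the other solutions would need to be measured.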