I have a tab-delimited file that has over 200 million lines. What's the fastest way in Linux to convert this to a CSV file? This file does have multiple lines of header information.
@ignacio-vazquez-abrams's Python solution is great! For people who are looking to parse delimiters other than tab, the csv library actually lets you set an arbitrary delimiter. Here is my modified version to handle pipe-delimited files:
import sys
import csv

# read pipe-delimited rows from stdin and write standard CSV to stdout
pipein = csv.reader(sys.stdin, delimiter='|')
commaout = csv.writer(sys.stdout, dialect=csv.excel)
for row in pipein:
    commaout.writerow(row)
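A minimal way to run it, assuming the script above is saved as pipe2csv.py (a name I made up):

$ python pipe2csv.py < data.psv > data.csv

For the original tab-delimited question, passing delimiter='\t' to csv.reader is the only change needed.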
Assuming you don't want to change the header, and assuming you don't have embedded tabs:
$ cat file
header header header
one two three
$ awk 'NR>1{$1=$1}1' OFS="," file
header header header
one,two,three
NR>1 skips the first line, i.e. the header. You mentioned you know how many header lines there are, so use the correct number for your own case. With this you also do not need to call any other external commands; one awk command does the job.
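For example, if your file had three header lines instead of one, and your fields may contain spaces (so you want to split on tabs only), a sketch of the same idea; the header count and the explicit -F'\t' are the parts to adapt:

$ awk -F'\t' 'NR>3{$1=$1}1' OFS="," file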
Another way, if you have blank columns and you care about preserving them:
awk 'NR>1{gsub("\t",",")}1' file
Using sed:
$ sed '2,$y/\t/,/' file    # skip the 1-line header and translate tabs (same as tr)
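The same idea extends to multiple header lines; a sketch assuming three header lines, so translation starts at line 4:

$ sed '4,$y/\t/,/' file

(Note that \t inside the y command is a GNU sed extension.)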
I think it is better not to cat the file, because it may create problems in the case of a large file. The better way may be

$ tr '\t' ',' < tabdelimitedFile.txt > csvfile.csv

The command will get its input from tabdelimitedFile.txt and store the result, comma-separated, in csvfile.csv.
If you want to convert the whole tsv file into a csv file:
$ cat data.tsv | tr "\\t" "," > data.csv
If you want to omit some fields:
$ cat data.tsv | cut -f1,2,3 | tr "\\t" "," > data.csv
The above command converts the data.tsv file to a data.csv file containing only the first three fields.
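If you would rather leave the header untouched and convert only the data lines, one way is to split the file with head and tail (a sketch assuming a single header line; adjust both line numbers if you have more):

$ { head -n 1 data.tsv; tail -n +2 data.tsv | tr '\t' ','; } > data.csv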
The following awk one-liner supports quoting and quote-escaping (embedded quotes are doubled, as CSV expects):

$ printf "flop\tflap\"" | awk -F '\t' '{ for(i = 1; i <= NF; i++) { gsub(/"/,"\"\"",$i); printf "\"%s\"",$i; if(i < NF) printf "," }; printf "\n" }'
gives
"flop","flap""""