Fastest way to convert a tab-delimited file to CSV in Linux


I have a tab-delimited file that has over 200 million lines. What's the fastest way in Linux to convert this to a CSV file? This file does have multiple lines of header i

11 answers

醉梦人生

2020-12-04 08:21
If you're worried about embedded commas then you'll need to use a slightly more intelligent method. Here's a Python script that takes TSV lines from stdin and writes CSV lines to stdout:
```
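import csv
import sys

# Parse stdin as tab-delimited and write comma-delimited rows to stdout.
# The csv module quotes any field that contains a comma or a double quote.
tabin = csv.reader(sys.stdin, dialect=csv.excel_tab)
commaout = csv.writer(sys.stdout, dialect=csv.excel)
for row in tabin:
    commaout.writerow(row)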
```
Run it from a shell as follows:
```
python script.py < input.tsv > output.csv
```
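To see what the csv module does with an embedded comma, here is a small sketch of the same approach wrapped in a function. The name `tsv_to_csv`, the sample row, and the `lineterminator="\n"` setting are assumptions made for illustration, not part of the script above.

```python
import csv
import io

def tsv_to_csv(src, dst):
    """Copy rows from a tab-delimited stream to a comma-delimited one."""
    reader = csv.reader(src, dialect=csv.excel_tab)
    # lineterminator="\n" is an assumption here; the excel dialect defaults to "\r\n".
    writer = csv.writer(dst, dialect=csv.excel, lineterminator="\n")
    writer.writerows(reader)

# Hypothetical sample: the second field of the data row contains a comma.
sample = io.StringIO("id\tname\n1\tSmith, John\n")
out = io.StringIO()
tsv_to_csv(sample, out)
print(out.getvalue(), end="")
# id,name
# 1,"Smith, John"
```

Only the field that needs it is quoted, so the output stays close in size to the input.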
清酒与你

2020-12-04 08:25
```
sed -e 's/"/""/g' -e 's/<tab>/","/g' -e 's/^/"/' -e 's/$/"/' infile > outfile
```
Damn the critics, quote everything, CSV doesn't care.

<tab> stands for a literal tab character; \t didn't work for me. In bash, press Ctrl+V and then Tab to enter it.
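For comparison, the quote-everything idea can be sketched with Python's csv module using `csv.QUOTE_ALL`. The function name `quote_all` and the sample row are hypothetical. Note that this sketch doubles embedded double quotes, which is the convention most CSV parsers expect.

```python
import csv
import io

def quote_all(src, dst):
    """Wrap every field in double quotes, doubling any embedded quotes."""
    # QUOTE_NONE on the reader treats double quotes in the input as plain data,
    # so fields are split on tabs only.
    reader = csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerows(reader)

# Hypothetical row with an embedded quote and an embedded comma.
sample = io.StringIO('a\tsay "hi"\tb,c\n')
out = io.StringIO()
quote_all(sample, out)
print(out.getvalue(), end="")
# "a","say ""hi""","b,c"
```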
没有蜡笔的小新

2020-12-04 08:26
If all you need to do is translate all tab characters to comma characters, tr is probably the way to go.

The example uses printf to emit a literal tab:
```
$ printf 'hello\tworld\n' | tr '\t' ','
hello,world
```
Of course, if you have embedded tabs inside string literals in the file, this will incorrectly translate those as well; but embedded literal tabs would be fairly uncommon.
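A plain character swap also does nothing about commas that are already in the data. The sketch below, using a made-up three-field row, shows how such a row gains an extra column once the tabs are replaced; `str.replace` stands in for tr here.

```python
import csv
import io

# Hypothetical row with three tab-separated fields; the second contains a comma.
line = "1\tSmith, John\t42\n"

# Same effect as piping the line through tr '\t' ','.
naive = line.replace("\t", ",")

# Parse the result the way a CSV consumer would.
fields = next(csv.reader(io.StringIO(naive)))
print(fields)       # ['1', 'Smith', ' John', '42']
print(len(fields))  # 4, although the original row had 3 fields
```

If the data is known to contain no commas, quotes, or embedded tabs, the simple translation remains safe.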
礼貌的吻别

2020-12-04 08:28
```
perl -lpe 's/"/""/g; s/^|$/"/g; s/\t/","/g' < input.tab > output.csv
```
Perl is generally faster at this sort of thing than sed, awk, or Python.
一生所求

2020-12-04 08:32
You can also use xsv for this:
```
xsv input -d '\t' input.tsv > output.csv
```
In my test on a 300 MB TSV file, it was roughly 5x faster than the Python solution (2.5 s vs. 14 s).
灰色年华

2020-12-04 08:32

Right-click the file, click Rename, delete the 't' and put in a 'c'. I'm actually not joking: most CSV parsers can handle tab delimiters. I just had this issue, and for my purposes renaming worked fine.
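As one illustration of a parser that accepts tab delimiters, Python's csv module can read tab-separated data directly when told the delimiter. The in-memory sample below is an assumption standing in for a renamed file.

```python
import csv
import io

# Hypothetical contents of a tab-delimited file, whatever its extension.
data = io.StringIO("id\tname\n1\tSmith, John\n")

# Passing delimiter="\t" makes the reader split on tabs instead of commas.
rows = list(csv.reader(data, delimiter="\t"))
print(rows)
# [['id', 'name'], ['1', 'Smith, John']]
```

Whether a given tool needs the delimiter spelled out or detects it on its own depends on the tool.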
