Fastest way to convert a tab-delimited file to CSV in Linux

感情败类 2020-12-04 07:56

I have a tab-delimited file that has over 200 million lines. What's the fastest way in Linux to convert this to a CSV file? This file does have multiple lines of header i

11 Answers
  • 2020-12-04 08:35

    @ignacio-vazquez-abrams's Python solution is great! For people who need to parse a delimiter other than tab, the csv library lets you set an arbitrary delimiter. Here is my modified version, which handles pipe-delimited files:

    import sys
    import csv
    
    # Read pipe-delimited rows from stdin and write them to stdout as CSV.
    pipein = csv.reader(sys.stdin, delimiter='|')
    commaout = csv.writer(sys.stdout, dialect=csv.excel)
    for row in pipein:
        commaout.writerow(row)
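
The same idea can be wrapped in a function that takes the input delimiter as a parameter. This is only a minimal sketch: the name `convert_delimited` and the sample data are made up for illustration, and `lineterminator="\n"` is an assumption that Unix line endings are wanted (the `csv.excel` dialect writes `\r\n` by default).

```python
import csv
import io


def convert_delimited(src, dst, delimiter="\t"):
    """Read rows split on `delimiter` from src and write them to dst as CSV."""
    reader = csv.reader(src, delimiter=delimiter)
    # Assumption: Unix line endings are wanted instead of the default "\r\n".
    writer = csv.writer(dst, lineterminator="\n")
    for row in reader:
        writer.writerow(row)


# Small in-memory demonstration with a pipe-delimited sample.
sample = io.StringIO("a|b,c|d\n1|2|3\n")
result = io.StringIO()
convert_delimited(sample, result, delimiter="|")
print(result.getvalue(), end="")
```

The field `b,c` already contains a comma, so the writer wraps it in double quotes; the other fields are written unquoted.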
    
  • 2020-12-04 08:35

    Assuming you don't want to change the header and you don't have embedded tabs:

    $ cat file
    header  header  header
    one     two     three
    
    $ awk 'NR>1{$1=$1}1' OFS="," file
    header  header  header
    one,two,three
    

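    NR>1 skips the first header line. You mentioned that you know how many header lines there are, so use the correct number for your own case. This needs no other external commands; a single awk command does the job. Note that without -F, awk splits fields on any run of whitespace, so this form also assumes that fields contain no spaces and that no column is empty.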
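
For comparison, here is a rough Python sketch of the same "leave the header alone, convert the rest" idea. The function name `tsv_to_csv_keep_header` and its `header_lines` parameter are hypothetical. It assumes that tabs never occur inside a field and, via `csv.QUOTE_NONE` on the reader, that double quotes in the input are plain data.

```python
import csv
import io
from itertools import islice


def tsv_to_csv_keep_header(src, dst, header_lines=1):
    """Copy the first header_lines lines unchanged, then convert TSV to CSV."""
    # Pass the header lines through untouched.
    for line in islice(src, header_lines):
        dst.write(line)
    # Assumption: quotes in the input are literal characters, not field quoting.
    reader = csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
    writer = csv.writer(dst, lineterminator="\n")
    writer.writerows(reader)


src = io.StringIO("header\theader\theader\none\ttwo\tthree\n")
dst = io.StringIO()
tsv_to_csv_keep_header(src, dst, header_lines=1)
print(dst.getvalue(), end="")
```

Unlike the whitespace-splitting awk form, this keeps empty columns and fields that contain spaces, because it splits on tabs only.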

    Another way, if you have blank columns and want to preserve them:

    awk 'NR>1{gsub("\t",",")}1' file
    

    Using sed:

    sed '2,$y/\t/,/' file  # skip the one-line header and translate tabs to commas (same as tr)
    
  • 2020-12-04 08:36

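    I think it is better not to cat the file: for a large file, cat adds an extra process and a pipe that every byte has to pass through. Redirecting the file straight into tr avoids that:

    $ tr ',' '\t' < csvfile.csv > tabdelimitedFile.txt

    This command reads csvfile.csv and writes a tab-separated copy to tabdelimitedFile.txt, which is the reverse of the conversion asked about. For tab-delimited to CSV, swap the two tr arguments and the file names.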

  • 2020-12-04 08:37
    • If you want to convert the whole tsv file into a csv file:

      $ cat data.tsv | tr "\\t" "," > data.csv
      

    • If you want to omit some fields:

      $ cat data.tsv | cut -f1,2,3 | tr "\\t" "," > data.csv
      

      The above command converts data.tsv to data.csv, keeping only the first three fields.
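
A character-for-character translation like this is only safe when no field already contains a comma. The short Python sketch below, with a made-up sample line, contrasts a plain replacement (the same effect as tr) with the quoting that `csv.writer` applies:

```python
import csv
import io

# Hypothetical input line: the first field already contains a comma.
line = "Smith, John\t42\n"

# Same effect as tr "\t" ",": every tab becomes a comma.
naive = line.replace("\t", ",")

# csv.writer quotes the field that contains the delimiter.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow(line.rstrip("\n").split("\t"))
quoted = buf.getvalue()

print(naive, end="")   # a CSV parser reads this as three fields
print(quoted, end="")  # a CSV parser reads this as two fields
```

If the data is known to be free of commas and double quotes, the tr pipeline is fine and much simpler.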

  • 2020-12-04 08:37

    The following awk one-liner supports quoting and quote escaping. Each embedded double quote is doubled, and every field is wrapped in double quotes:

    printf "flop\tflap\"" | awk -F '\t' '{ for(i = 1; i <= NF; i++) { gsub(/"/,"\"\"",$i); printf "\"%s\"",$i; if( i < NF ) printf "," }; printf "\n" }'
    

    gives:

    "flop","flap"""
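
For reference, the usual CSV convention (as described in RFC 4180) escapes a double quote inside a quoted field by doubling it. Below is a small Python sketch of that rule; the helper names `quote_field` and `tsv_line_to_csv` are invented for this example, and it assumes tabs never appear inside a field.

```python
import csv
import io


def quote_field(field):
    """Wrap a field in double quotes, doubling any quotes inside it."""
    return '"' + field.replace('"', '""') + '"'


def tsv_line_to_csv(line):
    """Convert one tab-delimited line to a fully quoted CSV line."""
    fields = line.rstrip("\n").split("\t")
    return ",".join(quote_field(f) for f in fields)


converted = tsv_line_to_csv('flop\tflap"')
print(converted)

# Cross-check against the standard library with every field quoted.
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n").writerow(["flop", 'flap"'])
print(buf.getvalue(), end="")
```

Both lines print the same text: the second field ends in two quotes for the escaped character plus one closing quote.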
    