How to cut columns from CSV files

不知归路 2021-02-07 03:22

I have a set of CSV files (around 250), each with 300 to 500 records. I need to cut 2 or 3 columns from each file and store them in another file. I'm using Ubuntu.

  • 2021-02-07 03:23

    If you used ssconvert to get the CSV you might try:

    ssconvert -O 'separator="|"' "file.xls" "file.txt"

    Note the TXT extension instead of CSV: this makes ssconvert use the Gnumeric_stf:stf_assistant exporter instead of Gnumeric_stf:stf_csv, which lets you pass options (the -O parameter). Otherwise you'll get a "The file saver does not take options" error. A pipe character is much less likely than a comma to appear inside a field, but you might want to check your data first.

    Then you can rename it and do things like:

    cat file.csv | cut -d "|" -f3 | sort | uniq -c | sort -rn | head
    • Example of other options: -O 'eol=unix separator=; format=preserve charset=UTF-8 locale=en_US transliterate-mode=transliterate quoting-mode=never'.
    • A solution with AWK v4+.
    • ssconvert man page.
  • 2021-02-07 03:29

    If you know that the column delimiter does not occur inside the fields, you can use cut.

    $ cat in.csv
    $ cut -d, -f2,3 < in.csv 

    You can use the shell built-in 'for' to loop over all input files.
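
    As a rough sketch of that loop, the function below runs cut over every CSV file in one directory and writes the result under the same file name in another directory. The function name cut_columns, its three arguments, and the ./out directory in the example call are made up for illustration; it also assumes, as above, that no field contains a comma or a newline.

    ```shell
    #!/bin/sh
    # Hypothetical helper: cut_columns SRC_DIR DEST_DIR FIELDS
    # Writes the selected comma-separated fields of every SRC_DIR/*.csv
    # to a file of the same name under DEST_DIR.
    # Assumption: no field contains a comma or newline.
    cut_columns() {
        src=$1
        dest=$2
        fields=$3
        mkdir -p "$dest" || return 1
        for f in "$src"/*.csv; do
            # If nothing matches, the pattern stays unexpanded; skip it.
            [ -e "$f" ] || continue
            cut -d, -f"$fields" < "$f" > "$dest/$(basename "$f")"
        done
    }

    # Example call: keep columns 2 and 3 of every CSV in the current directory.
    # cut_columns . ./out 2,3
    ```

    Writing to a separate directory leaves the original files untouched, so the field list can be adjusted and the loop rerun if the wrong columns were picked.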

  • 2021-02-07 03:36

    If your fields contain commas or newlines, you can use a helper program I wrote to allow cut (and other UNIX text processing tools) to properly work with the data.

    This program finds special characters inside quoted fields, and temporarily replaces them with nonprinting characters which won't confuse the cut program. Then they get restored after cut is done.

    With it, lutz's cut solution above would become:

    csvquote in.csv | cut -d, -f2,3 | csvquote -u 
  • 2021-02-07 03:41

    If the fields might contain the delimiter, you ought to find a library that can parse CSV files. Typically, general purpose scripting languages will include a CSV module in their standard library.

    Ruby:   require 'csv'
    Python: import csv
    Perl:   use Text::ParseWords;