Bash: Parse CSV with quotes, commas and newlines

隐瞒了意图╮ 2020-12-11 16:10

Say I have the following csv file:

 id,message,time
 123,"Sorry, This message
 has commas and newlines",2016-03-28T20:26:39
 456,"It makes the problem non


        
7 answers
  • 2020-12-11 16:24

    As chepner said, you are encouraged to use a programming language that can parse CSV.

    Here is an example in Python:

    import csv
    
    # On Python 3 the csv module needs a text-mode file opened with newline=''
    with open('a.csv', newline='') as csvfile:
        reader = csv.reader(csvfile, quotechar='"')
        for row in reader:
            print(row[-1]) # row[-1] gives the last column
    
  • 2020-12-11 16:25
    awk -F, '!/This/{print $NF}' file
    
    time
    2016-03-28T20:26:39
    2016-03-28T20:26:41
    
  • 2020-12-11 16:30

    Another awk alternative, using the double quote as the field separator (FS):

    $ awk -F'"' '!(NF%2){getline remainder;$0=$0 OFS remainder}
                    NR>1{sub(/,/,"",$NF); print $NF}' file
    
    2016-03-28T20:26:39
    2016-03-28T20:26:41
    
  • 2020-12-11 16:39

    As said here

    gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' file.csv \
     | awk -F, '{print $NF}'
    

    To remove only the newlines inside double-quoted strings and leave the ones outside them alone, use GNU awk (required for RT):

    gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' file
    

    This works by splitting the file along " characters and removing newlines in every other block.
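For readers without GNU awk, the same quote-splitting idea can be sketched in Python. This is only an illustration of the technique, not the answerer's code: the function name `strip_quoted_newlines` and the sample text are assumptions, and the second message is invented because the question's sample is cut off. It assumes the quotes in the input are balanced.

```python
def strip_quoted_newlines(text: str) -> str:
    """Remove newlines that fall inside double-quoted fields."""
    blocks = text.split('"')
    # Blocks at odd indexes sit between an opening and a closing quote.
    cleaned = [
        block.replace("\n", "") if index % 2 == 1 else block
        for index, block in enumerate(blocks)
    ]
    return '"'.join(cleaned)


# Hypothetical sample; the second message is invented for illustration.
sample = (
    "id,message,time\n"
    '123,"Sorry, This message\n'
    'has commas and newlines",2016-03-28T20:26:39\n'
    '456,"Another message",2016-03-28T20:26:41\n'
)

flattened = strip_quoted_newlines(sample)
# Splitting on commas is safe here only because the last column has none.
last_fields = [line.split(",")[-1] for line in flattened.splitlines()]
print(last_fields)
```

As with the gawk command, the newline is deleted rather than replaced, so the two halves of the message are joined without a space.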

    Output

    time
    2016-03-28T20:26:39
    2016-03-28T20:26:41
    

    Then use a second awk to split each line on commas and print the last column, as in the full pipeline at the top of this answer.

  • 2020-12-11 16:39

    CSV is a format that needs a proper parser (i.e. it can't be parsed with regular expressions alone). If you have Python installed, use the csv module instead of plain Bash.

    If not, consider csvkit, which has a lot of powerful tools to process CSV files from the command line.

    See also:

    • https://unix.stackexchange.com/questions/7425/is-there-a-robust-command-line-tool-for-processing-csv-files
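A minimal sketch of the csv-module route, selecting the column by header name instead of by position. The helper name `column_values` and the sample data are assumptions made for illustration (the second message is invented, since the question's sample is cut off); only `csv.DictReader` and `io.StringIO` come from the standard library.

```python
import csv
import io


def column_values(stream, column):
    """Yield one column from CSV text, honoring quoted commas and newlines."""
    for row in csv.DictReader(stream):
        yield row[column]


# Hypothetical sample; the second message is invented for illustration.
sample = io.StringIO(
    "id,message,time\n"
    '123,"Sorry, This message\n'
    'has commas and newlines",2016-03-28T20:26:39\n'
    '456,"Another, message",2016-03-28T20:26:41\n'
)

times = list(column_values(sample, "time"))
print(times)
```

For a real file, pass `open(path, newline="")` instead of the `StringIO` object; passing `sys.stdin` should let the script sit at the end of a shell pipeline.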
  • 2020-12-11 16:43
    sed -e 's/,/\n/g' file.csv | grep -E '^201[0-9]-'
    