How do I split apart a CSV string in Ruby?

醉话见心 2021-01-06 13:43

I have this line as an example from a CSV file:

2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"


        
6 Answers
  • 2021-01-06 14:15

    This is not a suitable task for regular expressions. You need a CSV parser, and Ruby has one built in:

    http://ruby-doc.org/stdlib/libdoc/csv/rdoc/classes/CSV.html

    And an arguably superior third-party library (FasterCSV, which became Ruby's standard CSV library in Ruby 1.9):

    http://fastercsv.rubyforge.org/
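A minimal sketch with the built-in library, assuming a current Ruby where require 'csv' is available: CSV.parse_line parses a single record, so it handles the example line from the question directly. The variable names line and fields are only for illustration.

```ruby
require 'csv'

line = '2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"'

# CSV.parse_line parses one record and returns an Array of fields:
# surrounding quotes are removed, the comma inside the quoted field is kept,
# and the two empty unquoted fields come back as nil.
fields = CSV.parse_line(line)
fields.each_with_index { |field, i| puts "#{i}: #{field.inspect}" }
```

Note that empty unquoted fields are returned as nil, not as empty strings; a later answer shows one way to change that.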

  • 2021-01-06 14:19

    EDIT: I failed to read the Ruby tag. The good news is that the guide explains the theory behind building a parser, even if the language specifics (it is written in C) don't carry over. Sorry.

    Here is a fantastic guide to doing this:

    http://knab.ws/blog/index.php?/archives/10-CSV-file-parser-and-writer-in-C-Part-2.html

    and the CSV writer is here:

    http://knab.ws/blog/index.php?/archives/3-CSV-file-parser-and-writer-in-C-Part-1.html

    These examples cover the case of a quoted literal in a CSV field (which may or may not contain a comma).
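The guides are written in C, but the core idea, walking the line one character at a time and tracking whether you are inside quotes, carries over to Ruby. The sketch below is my own illustration of that idea, not code from the linked guides; split_csv_line is a hypothetical name, and the sketch deliberately does not handle line breaks inside quoted fields.

```ruby
# A minimal character-by-character CSV field splitter (a sketch, not a full
# RFC 4180 parser). Handles quoted fields containing commas and "" escapes,
# but not newlines inside quoted fields.
def split_csv_line(line)
  fields = []
  field = +""          # current field being built (mutable string)
  in_quotes = false
  i = 0
  while i < line.length
    ch = line[i]
    if in_quotes
      if ch == '"'
        if line[i + 1] == '"'   # doubled quote inside a quoted field -> literal quote
          field << '"'
          i += 1
        else
          in_quotes = false     # closing quote
        end
      else
        field << ch             # commas are ordinary characters inside quotes
      end
    elsif ch == '"'
      in_quotes = true          # opening quote
    elsif ch == ','
      fields << field           # a comma outside quotes ends the field
      field = +""
    else
      field << ch
    end
    i += 1
  end
  fields << field               # the last field has no trailing comma
  fields
end

line = '2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"'
p split_csv_line(line)
```

Unlike Ruby's CSV library, this sketch returns empty strings (not nil) for empty fields, simply because every field starts as an empty string.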

  • 2021-01-06 14:20

    My preference is @steenstag's solution, but an alternative is to use String#scan with the following regular expression.

    r = /(?<![^,])(?:(?!")[^,\n]*(?<!")|"[^"\n]*")(?![^,])/
    

    If the variable str holds the string given in the example, we obtain:

    puts str.scan r
    

    displays

    2412
    21
    "Which of the following is not found in all cells?"
    "Curriculum"
    "Life and Living Processes, Life Processes"
    
    
    1
    0
    "endofline"
    


    See also regex101, which provides a detailed explanation of each token of the regex (move your cursor across the regex).

    Ruby's regex engine performs the following operations.

    (?<![^,]) : negative lookbehind assert current location is not preceded
                by a character other than a comma
    (?:       : begin non-capture group
      (?!")   : negative lookahead asserts next char is not a double-quote
      [^,\n]* : match 0+ chars other than a comma and newline
      (?<!")  : negative lookbehind asserts preceding character is not a
                double-quote
      |       : or
      "       : match double-quote
      [^"\n]* : match 0+ chars other than double-quote and newline
      "       : match double-quote
    )         : end of non-capture group
    (?![^,])  : negative lookahead asserts current location is not followed
                by a character other than a comma
    

    Note that, for a single-line string, (?<![^,]) is the same as (?<=,|^) and (?![^,]) is the same as (?=,|$).
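str.scan r returns the quoted fields with their double quotes still attached. If you want bare values, a small wrapper can strip one pair of surrounding quotes from each match. This is a sketch: split_fields and FIELD_RE are hypothetical names, delete_prefix/delete_suffix need Ruby 2.5 or later, and, like the regex itself, it does not handle escaped quotes ("") inside a quoted field.

```ruby
# The regular expression from the answer above: each match is one field.
FIELD_RE = /(?<![^,])(?:(?!")[^,\n]*(?<!")|"[^"\n]*")(?![^,])/

# Hypothetical helper: scan out the fields, then strip one pair of
# surrounding double quotes from each quoted field.
def split_fields(line)
  line.scan(FIELD_RE).map { |field| field.delete_prefix('"').delete_suffix('"') }
end

str = '2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"'
p split_fields(str)
```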

  • 2021-01-06 14:27

    This morning I stumbled across a CSV Table Importer project for Ruby on Rails. You may find the code helpful:

    GitHub TableImporter

  • 2021-01-06 14:32
    require 'csv' # built in
    
    str = <<EOF
    2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"
    EOF
    
    p CSV.parse(str)
    # That's it! However, empty fields appear as nil.
    # Makes sense to me, but if you insist on empty strings then do something like:
    parser = CSV.new(str)
    parser.convert { |field| field.nil? ? "" : field } # turn nil (empty) fields into ""
    p parser.readlines
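In newer versions of the csv library, the same result can be had without a custom converter by passing the nil_value: option, which substitutes the given object for every empty unquoted field. A sketch, assuming your csv version supports nil_value: (older releases do not):

```ruby
require 'csv'

str = '2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"'

# nil_value: "" replaces the nil that each empty unquoted field would otherwise become.
rows = CSV.parse(str, nil_value: "")
p rows
```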
    
  • 2021-01-06 14:32
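    text = <<EOF
    2412,21,"Which of the following is not found in all cells?","Curriculum","Life and Living Processes, Life Processes",,,1,0,"endofline"
    EOF
    x = []
    segments = text.chomp.split('"', -1) # contents of quoted fields land at odd indices
    segments.each_with_index do |segment, i|
      if i.odd?
        x << segment                         # quoted field: keep it whole, commas included
      else
        fields = segment.split(",", -1)      # -1 keeps empty fields
        fields.shift if i > 0                # drop the piece after the comma that follows a closing quote
        fields.pop if i < segments.size - 1  # drop the piece before the comma that precedes an opening quote
        x.concat(fields)
      end
    end
    print x
    

    output

    $ ruby test.rb
    ["2412", "21", "Which of the following is not found in all cells?", "Curriculum", "Life and Living Processes, Life Processes", "", "", "1", "0", "endofline"]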
    