Import from CSV into a Ruby array, with the 1st field as hash key, then look up a field's value given the header row


Maybe somebody can help me.

Starting with a CSV file like so:

Ticker,\"Price\",\"Market Cap\"
ZUMZ,30.00,933.90
XTEX,16.02,811.57
AAC,9.83,80.02


        
6 Answers
  • 2021-01-31 04:26

    Like this (it works with other CSVs too, not just the one you specified):

    require 'csv'
    
    tickers = {}
    
    CSV.foreach("stocks.csv", :headers => true, :header_converters => :symbol, :converters => :all) do |row|
      tickers[row.fields[0]] = Hash[row.headers[1..-1].zip(row.fields[1..-1])]
    end
    

    Result:

    {"ZUMZ"=>{:price=>30.0, :market_cap=>933.9}, "XTEX"=>{:price=>16.02, :market_cap=>811.57}, "AAC"=>{:price=>9.83, :market_cap=>80.02}}
    

    You can access elements in this data structure like this:

    puts tickers["XTEX"][:price] #=> 16.02
    

    Edit (in response to a comment): for selecting elements, you can do something like

     tickers.select { |ticker, vals| vals[:price] > 10.0 }
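
    With the sample data above, that select would return the entries whose price exceeds 10.0 (AAC at 9.83 is dropped):

     #=> {"ZUMZ"=>{:price=>30.0, :market_cap=>933.9}, "XTEX"=>{:price=>16.02, :market_cap=>811.57}}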
    
  • 2021-01-31 04:30

    While this isn't a 100% native Ruby solution to the original question, should others stumble here and wonder what awk call I wound up using for now, here it is:

    $dividend_yield = IO.readlines("|awk -F, '$1==\"#{$stock}\" {print $9}' datafile.csv")[0].to_f
    

    where $stock is the variable I had previously assigned to a company's ticker symbol (the wannabe key field). It conveniently survives problems by returning 0.0 if the ticker, file, or field #9 is not found or is empty, or if the value cannot be typecast to a float, so any trailing '%' in my case gets nicely truncated.
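
    For comparison, here is a minimal pure-Ruby sketch of the same lookup using the csv library; it assumes the same datafile.csv layout (ticker in field 1, dividend yield in field 9) and mimics the 0.0 fallback, and the helper name is made up for illustration:

    require 'csv'

    # Hedged sketch: scan datafile.csv for the first row whose ticker matches,
    # and return field 9 as a float; fall back to 0.0 on any miss, like the awk call.
    def dividend_yield_for(stock, path = "datafile.csv")
      CSV.foreach(path) do |row|
        # to_f yields 0.0 for nil/empty values and truncates a trailing '%'
        return row[8].to_f if row[0] == stock
      end
      0.0
    rescue Errno::ENOENT, CSV::MalformedCSVError
      0.0   # a missing or unreadable file also degrades to 0.0
    end

    $dividend_yield = dividend_yield_for($stock)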

    Note that, staying with awk, one could easily add more filters to have IO.readlines return a one-dimensional array of output lines from the smaller resulting CSV, e.g.

     awk -F, '$9 >= 2.01  &&  $2 > 99.99  {print $0}' datafile.csv 
    

    outputs, in bash, the lines whose DivYld (col 9) is over 2.01 and whose price (col 2) is over 99.99. (Unfortunately I'm not using the header row to determine field numbers, which is where I was ultimately hoping for some searchable associative Ruby array.)
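
    A header-aware Ruby sketch of that same filter is below; the column symbols :price and :dividend_yield are assumptions and would need to match whatever the real header row converts to:

    require 'csv'

    # Hedged sketch: select rows by named columns instead of awk field numbers.
    table = CSV.read("datafile.csv", headers: true,
                     header_converters: :symbol, converters: :all)
    matches = table.select do |row|
      row[:dividend_yield].to_f >= 2.01 && row[:price].to_f > 99.99
    end
    matches.each { |row| puts row.to_csv }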

  • 2021-01-31 04:40

    Not as one-liner-ish, but this was clearer to me.

    require 'csv'

    # Read the header row first, then stream the remaining rows from STDIN,
    # building one hash per row keyed by the header names.
    csv_headers = CSV.parse(STDIN.gets)
    csv = CSV.new(STDIN)

    kick_list = []
    csv.each do |row|
      row_hash = {}
      row.each_with_index do |field, j|
        row_hash[csv_headers[0][j]] = field
      end
      kick_list << row_hash
    end
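
    To get the keyed-by-ticker lookup the question asks for, one possible follow-up (assuming the first header is "Ticker"; values stay strings here since no converters were given):

    tickers = kick_list.each_with_object({}) { |row, h| h[row["Ticker"]] = row }
    tickers["XTEX"]["Price"]   #=> "16.02" (a String; CSV.new adds no converters)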
    
  • 2021-01-31 04:41
    require 'csv'

    # Build an array of hashes, one per data row, keyed by the symbolized headers.
    CSV.read(file_path, headers: true, header_converters: :symbol, converters: :all).collect do |row|
      Hash[row.collect { |c, r| [c, r] }]
    end
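
    Each element is a plain hash keyed by the symbolized headers (row.to_hash, or to_h on newer CSV versions, does the same as the Hash[...] idiom). A sketch for indexing the result by ticker, assuming the collect result above is assigned to rows:

    by_ticker = rows.each_with_object({}) { |h, acc| acc[h[:ticker]] = h }
    by_ticker["XTEX"][:price]   #=> 16.02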
    
  • 2021-01-31 04:43

    To add on to Michael Kohl's answer, if you want to access the elements in the following manner

    puts tickers[:price]["XTEX"] #=> 16.02
    

    You can try the following code snippet:

    require 'csv'
    tickers = {}

    CSV.foreach("Workbook1.csv", :headers => true, :header_converters => :symbol, :converters => :all) do |row|
        hash_row = row.headers[1..-1].zip( Array.new(row.fields.length - 1, row.fields[0]).zip(row.fields[1..-1]) ).to_h
        hash_row.each { |key, value| tickers[key] ? tickers[key].merge!([value].to_h) : tickers[key] = [value].to_h }
    end
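
    With the question's sample CSV this builds the inverted structure, so the field comes first and the ticker second:

    {:price=>{"ZUMZ"=>30.0, "XTEX"=>16.02, "AAC"=>9.83}, :market_cap=>{"ZUMZ"=>933.9, "XTEX"=>811.57, "AAC"=>80.02}}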
    
  • 2021-01-31 04:43

    To get the best of both worlds (very fast reading from a huge file AND the benefits of a native Ruby CSV object), my code has since evolved into this method:

    $stock="XTEX"
    csv_data = CSV.parse IO.read(%`|sed -n "1p; /^#{$stock},/p" stocks.csv`), {:headers => true, :return_headers => false, :header_converters => :symbol, :converters => :all}
    
    # Now the 1-row CSV object is ready for use, eg:
    $company = csv_data[:company][0]
    $volatility_month = csv_data[:volatility_month][0].to_f
    $sector = csv_data[:sector][0]
    $industry = csv_data[:industry][0]
    $rsi14d = csv_data[:relative_strength_index_14][0].to_f
    

    This is closer to my original method, but it only reads in one record plus line 1 of the input CSV file containing the headers. The inline sed instructions take care of that, and the whole thing is noticeably instant. It is better than the last approach because now I can access all the fields from Ruby, and associatively, no longer caring about column numbers as I did with awk.
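
    For reference, a pure-Ruby sketch of the same idea without shelling out to sed, assuming the ticker column symbolizes to :ticker; it stops scanning as soon as the requested record is found and yields a CSV::Row, so no trailing [0] is needed:

    require 'csv'

    record = nil
    CSV.foreach("stocks.csv", :headers => true, :header_converters => :symbol, :converters => :all) do |row|
      if row[:ticker] == $stock
        record = row
        break   # keep only the one matching record in memory
      end
    end

    $company = record[:company] if record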
