How do I parse an Excel file that will give me data exactly as it appears visually?

狂风中的少年 提交于 2019-12-08 17:35:55

问题


I'm on Rails 5 (Ruby 2.4). I want to read an .xls doc and I would like to get the data into CSV format, just as it appears in the Excel file. Someone recommended I use Roo, and so I have

book = Roo::Spreadsheet.open(file_location)
sheet = book.sheet(0)
text = sheet.to_csv
arr_of_arrs = CSV.parse(text)

However what is getting returned is not the same as what I see in the spreadsheet. For isntance, a cell in the spreadsheet has

16:45.81

and when I get the CSV data from above, what is returned is

"0.011641319444444444"

How do I parse the Excel doc and get exactly what I see? I don't care if I use Roo to parse or not, just as long as I can get CSV data that is a representation of what I see rather than some weird internal representation. For reference the file type I was parsing givies this when I run "file name_of_file.xls" ...

Composite Document File V2 Document, Little Endian, Os: Windows, Version 5.1, Code page: 1252, Author: Dwight Schroot, Last Saved By: Dwight Schroot, Name of Creating Application: Microsoft Excel, Create Time/Date: Tue Sep 21 17:05:21 2010, Last Saved Time/Date: Wed Oct 13 16:52:14 2010, Security: 0

回答1:


You need to save the custom formula in a text format on the .xls side. If your opening the .xls file from the internet this won't work but this will fix your problem if you can manipulate the file. You can do this using the function =TEXT(A2, "mm:ss.0") A2 is just the cell I'm using as an example.

book = ::Roo::Spreadsheet.open(file_location)
puts book.cell('B', 2) 
=> '16.45.8' 

If manipulating the file is not an option you could just pass a custom converter to CSV.new() and convert the decimal time back to the correct format you need.

require 'roo-xls'
require 'csv'

CSV::Converters[:time_parser] = lambda do |field, info| 
  case info[:header].strip
  when "time" then  begin 
                      # 0.011641319444444444 * 24 hours * 3600 seconds = 1005.81 
                      parse_time =  field.to_f * 24 * 3600
                      # 1005.81.divmod(60) = [16, 45.809999999999999945]
                      mm, ss = parse_time.divmod(60)
                      # returns "16:45.81"
                      time = "#{mm}:#{ss.round(2)}"  
                      time 
                    rescue
                      field 
                    end
  else 
    field  
  end
end

book = ::Roo::Spreadsheet.open(file_location)
sheet = book.sheet(0)
csv = CSV.new(sheet.to_csv, headers: true, converters: [:time_parser]).map {|row| row.to_hash}
puts csv 
=> {"time "=>"16:45.81"}
   {"time "=>"12:46.0"}



回答2:


Under the hood roo-xls gem uses the spreadsheet gem to parse the xls file. There was a similar issue to yours logged here, but it doesn't appear that there was any real resolution. Internally xls stores 16:45.81 as a Number and associates some formatting with it. I believe the issue has something to do with the spreadsheet gem not correctly handling the cell format.

I did try messing around with adding a format mm:ss.0 by following this guide but I couldn't get it to work, maybe you'll have more luck.




回答3:


You can use converters option. It seems looking like this:

arr_of_arrs = CSV.parse(text, {converters: :date_time})

http://ruby-doc.org/stdlib-2.0.0/libdoc/csv/rdoc/CSV.html




回答4:


Your problem seems to be with the way you're parsing (reading) the input file.

roo parses only Excel 2007-2013 (.xlsx) files. From you question, you want to parse .xls, which is a different format.

Like the documentation says, use the roo-xls gem instead.



来源:https://stackoverflow.com/questions/43077897/how-do-i-parse-an-excel-file-that-will-give-me-data-exactly-as-it-appears-visual

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!