Question
I'm trying to parse an XLSX file using the roo gem in a Ruby script.
In Excel, dates are stored as floats or integers in the format DDDDD.ttttt, counting from 1900-01-00 (00, not 01). So to convert a date such as 40396, you would take 1900-01-00 + 40396
and you should get 2010-10-15, but I'm getting 2010-08-08.
I'm using active_support/time to do the calculation like so:
Time.new("1900-01-01") + 40396.days
Am I doing my calculation wrong or is there a bug in active support?
I'm running ruby 1.9.3-mri on Windows 7 + latest active_support gem (3.2.1)
EDIT
I was looking at an older file in Excel with the wrong data; my script and console were pulling the right data, hence my confusion. I was doing everything right, except for using the right file!!!! Damn the all-nighters!
Thanks to everyone who replied. I will keep the question here in case somebody needs info on how to convert dates from Excel using Ruby.
Also, for anyone else running into this: the spreadsheet gem DOES NOT properly support reading XLSX files at this point (v0.7.1), so I'm using roo for reading and axlsx for writing.
Answer 1:
You have an off-by-one error in your day numbering - due to a bug in Lotus 1-2-3 that Excel and other spreadsheet programs have carefully maintained compatibility with for 30+ years.
Originally, day 1 was intended to be January 1, 1900 (which would, as you stated, make day 0 equal to December 31, 1899). But Lotus incorrectly considered 1900 to be a leap year, so if you use the Lotus numbers for the present and count backwards, correctly making 1900 a common year, the day numbers for everything before March 1st, 1900, are one too high. Day 1 becomes December 31st, 1899, and day 0 shifts back to the 30th. So the epoch for date arithmetic in Lotus-based spreadsheets is really Saturday, December 30th, 1899. (Modern Excel and some other spreadsheets extend the Lotus bug-compatibility far enough to show February 1900 actually having a 29th day, so they will label day 0 "December 31st" while agreeing that it was a Saturday! But other Lotus-based spreadsheets don't do that, and Ruby certainly doesn't either.)
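To make the bug concrete, here is a minimal sketch using only Ruby's standard `Date` class that reproduces the calendar labels Excel shows for whole-day 1900-system serial numbers, including the phantom 1900-02-29 at serial 60. The method name `lotus_serial_to_date` is hypothetical, and rejecting serial 60 (and anything below 1) is a choice made by this sketch, not something Excel does.

```ruby
require 'date'

# Hypothetical helper: the calendar date Excel displays for a 1900-system serial.
def lotus_serial_to_date(serial)
  raise ArgumentError, 'serial must be >= 1' if serial < 1
  raise ArgumentError, 'serial 60 is the non-existent 1900-02-29' if serial == 60

  if serial < 60
    # Before the phantom leap day the count is right: day 1 = 1900-01-01.
    Date.new(1899, 12, 31) + serial
  else
    # From 1900-03-01 (serial 61) on, every number is one too high.
    Date.new(1899, 12, 30) + serial
  end
end

puts lotus_serial_to_date(1)      # 1900-01-01
puts lotus_serial_to_date(59)     # 1900-02-28
puts lotus_serial_to_date(61)     # 1900-03-01
puts lotus_serial_to_date(40396)  # 2010-08-06
```

For any date after February 1900, which is almost every date you will meet in practice, only the second branch matters, and it is the same 1899-12-30 epoch used below.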
Even allowing for this error, however, your stated example is incorrect: Lotus day number 40,396 is August 6th, 2010, not October 15th. I have confirmed this correspondence in Excel, LibreOffice, and Google Sheets, all of which agree. You must have crossed examples somewhere.
Here's one way to do the conversion:
Time.utc(1899,12,30) + 40396.days #=> 2010-08-06 00:00:00 UTC
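If ActiveSupport isn't loaded, the same whole-day arithmetic can be done with Ruby's standard `Date` class. This sketch assumes the serial has no time fraction; subtracting the epoch goes the other way, which can be handy when writing dates back out.

```ruby
require 'date'

serial = 40396

# Date#+ counts in whole days, so no ActiveSupport is needed.
date = Date.new(1899, 12, 30) + serial
puts date            # 2010-08-06

# Round trip: days between the epoch and the date give the serial back.
back = (date - Date.new(1899, 12, 30)).to_i
puts back            # 40396
```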
Alternatively, you could take advantage of another known correspondence. Time zero for Ruby (and POSIX systems in general) is the moment January 1, 1970, at midnight GMT. January 1, 1970 is Lotus day 25,569. As long as you remember to do your calculations in UTC, you can also do this:
Time.at( (40396 - 25569).days ).utc # => 2010-08-06 00:00:00 UTC
In either case, you probably want to declare a symbolic constant for the epoch date (either the Time object representing 1899-12-30 or the POSIX "day 0" value 25,569).
You can replace those calls to .days with multiplication by 86400 (seconds per day) if you don't need active_support/core_ext/integer/time for anything else, and don't want to load it just for this.
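A sketch combining both suggestions: named constants for the epoch, and plain integer seconds instead of ActiveSupport's `.days`. The constant and method names are made up for illustration; both variants should agree for any whole-day serial from March 1900 onward.

```ruby
# Hypothetical names; the magic numbers live in one place.
EXCEL_EPOCH_TIME        = Time.utc(1899, 12, 30) # Lotus/Excel day 0 (1900 system)
EXCEL_SERIAL_UNIX_EPOCH = 25_569                 # serial number of 1970-01-01
SECONDS_PER_DAY         = 86_400

# Variant 1: offset from the 1899-12-30 epoch.
def excel_serial_to_time_v1(serial)
  EXCEL_EPOCH_TIME + serial * SECONDS_PER_DAY
end

# Variant 2: offset from the POSIX epoch, converted to UTC.
def excel_serial_to_time_v2(serial)
  Time.at((serial - EXCEL_SERIAL_UNIX_EPOCH) * SECONDS_PER_DAY).utc
end

puts excel_serial_to_time_v1(40396) # 2010-08-06 00:00:00 UTC
puts excel_serial_to_time_v2(40396) # 2010-08-06 00:00:00 UTC
```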
Answer 2:
"Excel stores dates and times as a number representing the number of days since 1900-Jan-0, plus a fractional portion of a 24 hour day: ddddd.tttttt . This is called a serial date, or serial date-time." (http://www.cpearson.com/excel/datetime.htm)
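To see how the ddddd.tttttt layout splits apart, this sketch separates a serial value into its whole-day count and a time of day, assuming the fraction is simply a share of 86,400 seconds. The value 40396.75 is illustrative, not taken from the question.

```ruby
serial = 40396.75

days, fraction = serial.divmod(1)         # whole days and fraction of a day
seconds = (fraction * 86_400).round       # seconds since midnight
hh, rem = seconds.divmod(3600)
mm, ss  = rem.divmod(60)

puts days                                 # 40396
puts format('%02d:%02d:%02d', hh, mm, ss) # 18:00:00
```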
If your column contains a date-time rather than just a date, the following code is useful:
dt = DateTime.new(1899, 12, 30) + excel_value.to_f
Also keep in mind that an Excel workbook can use one of two date systems, 1900-based or 1904-based; the 1904 system was typically enabled by default for workbooks created on the Mac. If you consistently find your dates off by about 4 years (1,462 days), you should use a different base date:
dt = DateTime.new(1904, 1, 1) + excel_value.to_f
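Both base dates can be wrapped in one helper. This is a sketch: the method name and the `date1904` flag are hypothetical, and the caller is assumed to already know which date system the workbook uses (the snippet does not detect it). 38934.5 is simply 40396.5 minus the 1,462-day offset between the two systems, so both calls name the same moment.

```ruby
require 'date'

# Base dates from the answer; the constant names are illustrative.
EXCEL_BASE_1900 = DateTime.new(1899, 12, 30)
EXCEL_BASE_1904 = DateTime.new(1904, 1, 1)

# Hypothetical helper: pass date1904 = true for 1904-system workbooks.
def excel_value_to_datetime(excel_value, date1904 = false)
  base = date1904 ? EXCEL_BASE_1904 : EXCEL_BASE_1900
  base + excel_value.to_f
end

puts excel_value_to_datetime(40396.5)       # 2010-08-06T12:00:00+00:00
puts excel_value_to_datetime(38934.5, true) # 2010-08-06T12:00:00+00:00
```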
You can enable or disable 1904 date mode for any workbook, but if you change the setting after adding data, the existing dates will appear off by 4 years. In general you should use 1900 date mode, since most Excel users in the wild are on Windows.
Note: a gotcha with this method is that the result may be off by +/- 1 second because of floating-point rounding. For me the dates I import are "close enough", but it's something to keep in mind. A better solution might round to the nearest whole second to avoid this issue.
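One way to do that rounding (an assumption, not the answerer's code) is to convert the value to a whole number of seconds first, then add the time of day as an exact `Rational` fraction of a day, so floating-point noise cannot leave a value a hair short of midnight. The method name is hypothetical and only the 1900 base date is handled.

```ruby
require 'date'

SECONDS_PER_DAY = 86_400

# Hypothetical helper (1900 date system only).
def excel_value_to_datetime_rounded(excel_value)
  # Round to the nearest whole second before building the date.
  total_seconds = (excel_value.to_f * SECONDS_PER_DAY).round
  days, seconds = total_seconds.divmod(SECONDS_PER_DAY)
  # Rational keeps the time-of-day fraction exact.
  DateTime.new(1899, 12, 30) + days + Rational(seconds, SECONDS_PER_DAY)
end

puts excel_value_to_datetime_rounded(40396.99999999)  # 2010-08-07T00:00:00+00:00
puts excel_value_to_datetime_rounded(40396 + 1.0 / 3) # 2010-08-06T08:00:00+00:00
```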
Answer 3:
You're doing your calculation wrong. How do you arrive at the expected result of 2010-10-15?
In Excel, 40396 is 2010-08-06 (not using the 1904 calendar, of course). To demonstrate that, type 40396 into an Excel cell and set the format to yyyy-mm-dd.
Alternatively:
40396 / 365.2422 = 110.6 (years -- 1900 + 110 = 2010)
0.6 * 12 = 7.2 (months -- January = 1; 1 + 7 = 8; 8 = August)
0.2 * 30 = 6 (days)
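The same back-of-the-envelope estimate in Ruby, next to the exact conversion. The 365.2422-day year and 30-day month are deliberate approximations from the answer; the estimate happens to land on the right day for 40396, but it is only a sanity check, not a converter.

```ruby
require 'date'

serial = 40396

# Approximate by design, following the answer's steps.
years  = serial / 365.2422                    # ~110.6
year   = 1900 + years.floor                   # 2010
months = (years - years.floor) * 12           # ~7.2
month  = 1 + months.floor                     # 8 = August
day    = ((months - months.floor) * 30).round # ~6

puts [year, month, day].inspect               # [2010, 8, 6]
puts Date.new(1899, 12, 30) + serial          # 2010-08-06 (exact)
```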
Excel's calendar incorrectly includes 1900-02-29; that accounts for one day of the difference from your 2010-08-08 result. I'm not sure about the reason for the second day of difference.
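The two days can be separated explicitly: one comes from counting from 1900-01-01 rather than "1900-01-00" (the question itself notes it should be 00, not 01), and the other from the phantom 1900-02-29. A small sketch with the standard `Date` class:

```ruby
require 'date'

serial = 40396

# What the question computed: counting from 1900-01-01.
from_jan_1  = Date.new(1900, 1, 1) + serial
# Counting from "1900-01-00", i.e. 1899-12-31, removes one day.
from_jan_0  = Date.new(1899, 12, 31) + serial
# Compensating for the phantom 1900-02-29 removes the second day.
from_dec_30 = Date.new(1899, 12, 30) + serial

puts from_jan_1  # 2010-08-08
puts from_jan_0  # 2010-08-07
puts from_dec_30 # 2010-08-06
```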
Source: https://stackoverflow.com/questions/10559767/how-to-convert-ms-excel-date-from-float-to-date-format-in-ruby