问题
I'm importing xls
files using gdata
. I am converting date columns using as.Date
to convert the date
As per the manual for as.Date
, the date origin is platform dependent, and so I am determining which origin to use accordingly
.origin <- ifelse(Sys.info()[['sysname']] == "Windows", "1899-12-30", "1904-01-01")
as.Date(myData$Date, origin=.origin)
However, I'm wondering if I should be considering the platform where the file is being read or the platform where it was written?
For what it's worth, I am currently testing the code on a linux box with no excel, and the correct Dates are produced by using origin="1904-01-01"
Quoting `?as.Date'
## date given as number of days since 1900-01-01 (a date in 1989)
as.Date(32768, origin = "1900-01-01")
## Excel is said to use 1900-01-01 as day 1 (Windows default) or
## 1904-01-01 as day 0 (Mac default), but this is complicated by Excel
## treating 1900 as a leap year.
## So for dates (post-1901) from Windows Excel
as.Date(35981, origin = "1899-12-30") # 1998-07-05
## and Mac Excel
as.Date(34519, origin = "1904-01-01") # 1998-07-05
## (these values come from http://support.microsoft.com/kb/214330)
回答1:
You could try out the (extremely) new exell package: https://github.com/hadley/exell. It loads excel dates into POSIXct, correctly choosing the origin based on whether the file was written by Windows or Mac Excel.
回答2:
Yes, you should be considering where the file was written. Excel-Windows appears capable of distinguishing Mac-written dates from Win-written dates, but you are getting evidence that these are Mac-originated .xls files.
The safest method would be to work within the version of Excel on which the data was entered and to use the format menu to bring up a dialog box from which you choose as-Date and a custom format of yyyy-mm-dd. Then save as a csv file and you will be able to import into R with the colClasses vector "Date" in the proper column position. But that sounds as though it is an option not available.
I suppose it doesn't apply to you on a linux box so this is just a Mac-whine: The gdata-package gives deprecation warnings and then fails to install the XLSX support files on R 3.0.0 with the ordinary Perl 5.8 installation in '/opt/local/bin/perl'. This despite 'gdata::findPerl` being able to find it successfully.
At this point I think the question should be redirected to inquiring whether you could coax gdata functions into inspecting the properties of the files. After looking at the codebase for xls reading, I rather doubt it, since do not see any mention of inspecting for different xls versions.
Near the end of a blank xls file created with a Mac version of Excel, looking with a text editor I see:
Worksheets˛ˇˇˇˇˇ ¿F$Microsoft Excel 97 - 2004 Worksheet˛ˇˇˇ8FIBExcel.Sheet.8˛ˇ
‡ÖüÚ˘Oh´ë+'≥Ÿ0îHPhħ
∞ºƒ'David WinsemiusDavid WinsemiusMicrosoft Macintosh Excel@ê˚á!Ë+Œ@ê'å-Ë+ŒG»˛ˇˇˇPICT¿Kġ
The other difference was that the Windows version inspected the same way was had "Excel 2003 Worksheet" as teh type of worksheet, whereas it was "Excel 97 - 2004" for the Mac version. So maybe you can coerce R into bypassing all the errors that get triggered when reading or grepping during scanning for "Macintosh". Maybe Linux-R is more resistant to that sort of thing?
Error: invalid multibyte string at '<ff>'
I also got a bunch of warnings from grep that suggested I might not be able to "see" into some of the strings:
Warning message:
In grep("Macintosh", lin) : input string 1 is invalid in this locale
You might be able to highjack some more robust code from the Perl code in xls2csv.pl which is part of the gdata package.
来源:https://stackoverflow.com/questions/15686451/dates-from-excel-to-r-platform-dependency