Dates from Excel to R, platform dependency

人走茶凉 提交于 2019-11-28 19:19:11

问题


I'm importing xls files using gdata. I am converting date columns using as.Date to convert the date

As per the manual for as.Date, the date origin is platform dependent, and so I am determining which origin to use accordingly

.origin <- ifelse(Sys.info()[['sysname']] == "Windows", "1899-12-30", "1904-01-01")
as.Date(myData$Date, origin=.origin)

However, I'm wondering if I should be considering the platform where the file is being read or the platform where it was written?

For what it's worth, I am currently testing the code on a linux box with no excel, and the correct Dates are produced by using origin="1904-01-01"


Quoting `?as.Date'

  ## date given as number of days since 1900-01-01 (a date in 1989)
  as.Date(32768, origin = "1900-01-01")
  ## Excel is said to use 1900-01-01 as day 1 (Windows default) or
  ## 1904-01-01 as day 0 (Mac default), but this is complicated by Excel
  ## treating 1900 as a leap year.
  ## So for dates (post-1901) from Windows Excel
  as.Date(35981, origin = "1899-12-30") # 1998-07-05
  ## and Mac Excel
  as.Date(34519, origin = "1904-01-01") # 1998-07-05
  ## (these values come from http://support.microsoft.com/kb/214330)


回答1:


You could try out the (extremely) new exell package: https://github.com/hadley/exell. It loads excel dates into POSIXct, correctly choosing the origin based on whether the file was written by Windows or Mac Excel.




回答2:


Yes, you should be considering where the file was written. Excel-Windows appears capable of distinguishing Mac-written dates from Win-written dates, but you are getting evidence that these are Mac-originated .xls files.

The safest method would be to work within the version of Excel on which the data was entered and to use the format menu to bring up a dialog box from which you choose as-Date and a custom format of yyyy-mm-dd. Then save as a csv file and you will be able to import into R with the colClasses vector "Date" in the proper column position. But that sounds as though it is an option not available.

I suppose it doesn't apply to you on a linux box so this is just a Mac-whine: The gdata-package gives deprecation warnings and then fails to install the XLSX support files on R 3.0.0 with the ordinary Perl 5.8 installation in '/opt/local/bin/perl'. This despite 'gdata::findPerl` being able to find it successfully.

At this point I think the question should be redirected to inquiring whether you could coax gdata functions into inspecting the properties of the files. After looking at the codebase for xls reading, I rather doubt it, since do not see any mention of inspecting for different xls versions.

Near the end of a blank xls file created with a Mac version of Excel, looking with a text editor I see:

Worksheets˛ˇˇˇˇˇ ¿F$Microsoft Excel 97 - 2004 Worksheet˛ˇˇˇ8FIBExcel.Sheet.8˛ˇ
‡ÖüÚ˘Oh´ë+'≥Ÿ0îHPhħ
∞ºƒ'David WinsemiusDavid WinsemiusMicrosoft Macintosh Excel@ê˚á!Ë+Œ@ê'å-Ë+ŒG»˛ˇˇˇPICT¿Kġ

The other difference was that the Windows version inspected the same way was had "Excel 2003 Worksheet" as teh type of worksheet, whereas it was "Excel 97 - 2004" for the Mac version. So maybe you can coerce R into bypassing all the errors that get triggered when reading or grepping during scanning for "Macintosh". Maybe Linux-R is more resistant to that sort of thing?

Error: invalid multibyte string at '<ff>'

I also got a bunch of warnings from grep that suggested I might not be able to "see" into some of the strings:

Warning message:
In grep("Macintosh", lin) : input string 1 is invalid in this locale

You might be able to highjack some more robust code from the Perl code in xls2csv.pl which is part of the gdata package.



来源:https://stackoverflow.com/questions/15686451/dates-from-excel-to-r-platform-dependency

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!