问题
To avoid "duplicate" close request: I know how to read Excel named ranges; examples are given in the code below. This is about "real" tables in Excel.
Excel2007 and later have the useful concept of tables: you can convert ranges to tables, and avoid hassles when sorting and rearranging. When you create a table in an Excel range, it gets a default name (Tabelle1
in German Version, TableName
in the following example), but you can additionally simply name the range of the table (TableAsRangeName
); as indicated by the icons in the Excel range name editor, these two seem to be treated differently.
I have not been able to read these tables (in the strict sense) from R. The only known workaround is using CSV intermediate, or converting the table to a normal named range, which has nasty irreversible side effects when you use column names in cell references; these are converted to A1 notation.
The example below shows the problem. You mileage may vary with different combinations of 32/64 bit ODBC drivers and 32/64 bit Java
# Read Excel Tables (not simply named ranges)
# Test Computer: 64 Bit Windows 7, R 32 bit
# My ODBC drivers are 32 bit
library(RODBC)
# Test file has three ranges
# NonTable Simple named range
# TableName Name of table
# TableAsRangeName Named Range covering the above table
sampleFile = "ExcelTables.xlsx"
if (!file.exists(sampleFile)){
download.file("http://www.menne-biomed.de/uni/ExcelTables.xlsx",sampleFile)
# Or do it manually, if this fails
}
# ODBC
channel = odbcConnectExcel2007(sampleFile)
sqlQuery(channel, "SELECT * from NonTable") # Ok
sqlQuery(channel, "SELECT * from TableName") # Could not find range
sqlQuery(channel, "SELECT * from TableAsRangeName") # Could not find range
close(channel)
# gdata has read.xls, but seems not to support named regions
library(xlsx)
wb = loadWorkbook(sampleFile)
getRanges(wb) # This one fails already with "TableName" does not exist
ws = getSheets(wb)[[1]]
readRange("NonTable",ws) # Invalid range address
readRange("TableName",ws) # Invalid range address
readRange("TableAsRangeName",ws) # Invalid range address
# my machine requires 64 bit for this one; depends on your Java installation
sampleFile = "ExcelTables.xlsx"
library(XLConnect) # requires Java
readNamedRegionFromFile(sampleFile,"NonTable") # OK
readNamedRegionFromFile(sampleFile,"TableName") # "TableName" does not exist
readNamedRegionFromFile(sampleFile,"TableAsRangeName") # NullPointerException
wb <- loadWorkbook(sampleFile)
readNamedRegion(wb,"NonTable") # Ok
readNamedRegion(wb,"TableName") # does not exist
readNamedRegion(wb,"TableAsRangeName") # Null Pointer
回答1:
I've added some initial support for Excel tables in XLConnect. Please find the latest changes on github at https://github.com/miraisolutions/xlconnect
In the following a small sample:
require(XLConnect)
sampleFile = "ExcelTables.xlsx"
wb = loadWorkbook(sampleFile)
readTable(wb, sheet = "ExcelTable", table = "TableName")
Note that Excel tables are associated to a sheet. So as far as I can see it's possible to have multiple tables with the same name associated to different sheets. For this reason there is a sheet
-argument to readTable
.
回答2:
You are correct that the table definitions are stored in XML.
sampleFile = "ExcelTables.xlsx"
unzip(sampleFile, exdir = 'test')
library(XML)
tData <- xmlParse('test/xl/tables/table1.xml')
tables <- xpathApply(tData, "//*[local-name() = 'table']", xmlAttrs)
[[1]]
id name displayName ref totalsRowShown
"1" "TableName" "TableName" "G1:I4" "0"
library(XLConnect)
readWorksheetFromFile(sampleFile, sheet = "ExcelTable", region = tables[[1]]['ref'], header = TRUE)
Name Age AgeGroup
1 Anton 44 4
2 Bertha 33 3
3 Cäsar 21 2
Depending on your situation you could search in the XML files for appropriate quantities.
回答3:
Later addition:
readxl::readxl
can read "real" tables and probably is the least troublesome solution when you want to read data frames/tibbles.
** After @Jamzy comment ** I tried again and could not read named ranges. False positive then or false negative now???
来源:https://stackoverflow.com/questions/17671290/read-excel-tables-not-simple-named-ranges