Read Excel Tables, not simple named ranges

徘徊边缘 提交于 2021-01-26 11:31:42

问题


To avoid "duplicate" close request: I know how to read Excel named ranges; examples are given in the code below. This is about "real" tables in Excel.

Excel2007 and later have the useful concept of tables: you can convert ranges to tables, and avoid hassles when sorting and rearranging. When you create a table in an Excel range, it gets a default name (Tabelle1 in German Version, TableName in the following example), but you can additionally simply name the range of the table (TableAsRangeName); as indicated by the icons in the Excel range name editor, these two seem to be treated differently.

I have not been able to read these tables (in the strict sense) from R. The only known workaround is using CSV intermediate, or converting the table to a normal named range, which has nasty irreversible side effects when you use column names in cell references; these are converted to A1 notation.

The example below shows the problem. You mileage may vary with different combinations of 32/64 bit ODBC drivers and 32/64 bit Java

# Read Excel Tables (not simply named ranges)
# Test Computer: 64 Bit Windows 7, R 32 bit  
# My ODBC drivers are 32 bit
library(RODBC)
# Test file has three ranges
# NonTable Simple named range
# TableName Name of table 
# TableAsRangeName Named Range covering the above table
sampleFile = "ExcelTables.xlsx"
if (!file.exists(sampleFile)){
  download.file("http://www.menne-biomed.de/uni/ExcelTables.xlsx",sampleFile)
  # Or do it manually, if this fails
}
# ODBC
channel = odbcConnectExcel2007(sampleFile)
sqlQuery(channel, "SELECT * from NonTable") # Ok
sqlQuery(channel, "SELECT * from TableName") # Could not find range
sqlQuery(channel, "SELECT * from TableAsRangeName") # Could not find range
close(channel)

# gdata has read.xls, but seems not to support named regions

library(xlsx)
wb = loadWorkbook(sampleFile)
getRanges(wb) # This one fails already with "TableName" does not exist
ws = getSheets(wb)[[1]]
readRange("NonTable",ws) # Invalid range address
readRange("TableName",ws) # Invalid range address
readRange("TableAsRangeName",ws) # Invalid range address

# my machine requires 64 bit for this one; depends on your Java installation
sampleFile = "ExcelTables.xlsx"
library(XLConnect) # requires Java
readNamedRegionFromFile(sampleFile,"NonTable") # OK
readNamedRegionFromFile(sampleFile,"TableName") # "TableName" does not exist
readNamedRegionFromFile(sampleFile,"TableAsRangeName") # NullPointerException

wb <- loadWorkbook(sampleFile)
readNamedRegion(wb,"NonTable") # Ok
readNamedRegion(wb,"TableName") # does not exist
readNamedRegion(wb,"TableAsRangeName") # Null Pointer

回答1:


I've added some initial support for Excel tables in XLConnect. Please find the latest changes on github at https://github.com/miraisolutions/xlconnect

In the following a small sample:

require(XLConnect)
sampleFile = "ExcelTables.xlsx"
wb = loadWorkbook(sampleFile)
readTable(wb, sheet = "ExcelTable", table = "TableName")

Note that Excel tables are associated to a sheet. So as far as I can see it's possible to have multiple tables with the same name associated to different sheets. For this reason there is a sheet-argument to readTable.




回答2:


You are correct that the table definitions are stored in XML.

sampleFile = "ExcelTables.xlsx"
unzip(sampleFile, exdir = 'test')
library(XML)
tData <- xmlParse('test/xl/tables/table1.xml')
tables <- xpathApply(tData, "//*[local-name() = 'table']", xmlAttrs)
[[1]]
            id           name    displayName            ref totalsRowShown 
           "1"    "TableName"    "TableName"        "G1:I4"            "0" 
library(XLConnect)

readWorksheetFromFile(sampleFile, sheet = "ExcelTable", region = tables[[1]]['ref'], header = TRUE)
    Name Age AgeGroup
1  Anton  44        4
2 Bertha  33        3
3  Cäsar  21        2

Depending on your situation you could search in the XML files for appropriate quantities.




回答3:


Later addition:

readxl::readxl can read "real" tables and probably is the least troublesome solution when you want to read data frames/tibbles.

** After @Jamzy comment ** I tried again and could not read named ranges. False positive then or false negative now???



来源:https://stackoverflow.com/questions/17671290/read-excel-tables-not-simple-named-ranges

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!