Read an Excel file directly from an R script

误落风尘 2020-11-22 13:52

How can I read an Excel file directly into R? Or should I first export the data to a text or CSV file and import that file into R?

12 Answers
  • 2020-11-22 14:32

    EDIT, October 2015: As others have commented here, the openxlsx and readxl packages are much faster than the xlsx package and can actually open larger Excel files (>1500 rows and >120 columns). @MichaelChirico demonstrates that readxl is the better choice when speed matters, while openxlsx replaces the functionality provided by the xlsx package. If you are looking for a package to read, write, and modify Excel files in 2015, pick openxlsx instead of xlsx.
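
    For readers who have not used openxlsx, a minimal round-trip sketch may help. It assumes the package is installed from CRAN; the helper name round_trip_xlsx, the sample data frame, and the temporary file are illustrative and not part of the original answer.

```r
# Hypothetical helper: write a data frame to a temporary .xlsx file
# and read it back with openxlsx.
round_trip_xlsx <- function(df) {
  path <- tempfile(fileext = ".xlsx")
  on.exit(unlink(path))                  # remove the temporary file afterwards
  openxlsx::write.xlsx(df, path)         # writes df to the first sheet
  openxlsx::read.xlsx(path, sheet = 1)   # reads sheet 1 back as a data.frame
}

# Run the example only when openxlsx is available
if (requireNamespace("openxlsx", quietly = TRUE)) {
  original <- data.frame(id = 1:3, label = c("a", "b", "c"))
  str(round_trip_xlsx(original))
}
```

    For a real workbook, the same read.xlsx call takes the path to your file instead of the temporary one.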

    Pre-2015: I have used the xlsx package. It changed my workflow with Excel and R: no more annoying pop-ups asking whether I am sure I want to save my Excel sheet in .txt format. The package also writes Excel files.

    However, I find the read.xlsx function slow when opening large Excel files. The read.xlsx2 function is considerably faster, but it does not guess the vector class of the data.frame columns. If you use read.xlsx2, you have to specify the desired column classes with the colClasses argument. Here is a practical example:

    read.xlsx("filename.xlsx", 1) reads your file and makes the data.frame column classes nearly useful, but it is very slow for large data sets. It also works for .xls files.

    read.xlsx2("filename.xlsx", 1) is faster, but you will have to define the column classes manually. A shortcut is to run the command twice (see the example below). The character specification converts your columns to factors. Use the Date and POSIXct options for time.

    library(xlsx)

    # Helper that shows each column name together with its column number
    coln <- function(x) {
      y <- rbind(seq_len(ncol(x)))
      colnames(y) <- colnames(x)
      rownames(y) <- "col.number"
      y
    }

    data <- read.xlsx2("filename.xlsx", 1)  # First pass: open the file

    coln(data)  # Check the column numbers you want to have as factors

    x <- 3  # Say you want columns 1-3 as factors, the rest numeric

    # Second pass: one class per column (x character columns, the rest numeric)
    data <- read.xlsx2("filename.xlsx", 1,
                       colClasses = c(rep("character", x),
                                      rep("numeric", ncol(data) - x)))
    
  • 2020-11-22 14:35

    Another solution is the xlsReadWrite package, which doesn't require additional installs but does require you to download the additional shlib before you use it for the first time:

    require(xlsReadWrite)
    xls.getshlib()
    

    Forgetting this can cause utter frustration. Been there and all that...

    On a side note: you might want to consider converting to a text-based format (e.g. CSV) and reading it in from there, for a number of reasons:

    • Whatever your solution (RODBC, gdata, xlsReadWrite), some strange things can happen when your data gets converted. Dates in particular can be rather cumbersome. The HFWutils package has some tools to deal with Excel dates (per @Ben Bolker's comment).

    • If you have large sheets, reading in text files is faster than reading in from Excel.

    • Different solutions might be necessary for .xls and .xlsx files. For example, the xlsReadWrite package currently does not support .xlsx, as far as I know. gdata requires you to install additional Perl libraries for .xlsx support. The xlsx package can handle files with the extension of the same name.
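
    The date problem in the first bullet often shows up as dates arriving in R as plain numbers. Excel stores a date as a day count, and base R can convert it back. The sketch below assumes the default 1900 date system (a workbook using the 1904 date system would need origin = "1904-01-01" instead), and the serial numbers are made-up illustrative values.

```r
# Illustrative Excel date serials, as they might appear after an import
excel_serials <- c(43831, 44197)

# Convert day counts to Date; "1899-12-30" is the origin that matches
# Excel's default 1900 date system
converted <- as.Date(excel_serials, origin = "1899-12-30")
print(converted)
```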

  • 2020-11-22 14:38
    library(RODBC)
    file.name <- "file.xls"
    sheet.name <- "Sheet Name"
    
    ## Connect to the Excel file, then pull and format the data
    excel.connect <- odbcConnectExcel(file.name)
    dat <- sqlFetch(excel.connect, sheet.name, na.strings=c("","-"))
    odbcClose(excel.connect)
    

    Personally, I like RODBC and can recommend it.

  • 2020-11-22 14:40

    As noted in many of the other answers, there are many good packages that connect to the XLS/X file and get the data in a reasonable way. However, you should be warned that under no circumstances should you use the clipboard (or a .csv file) to retrieve data from Excel. To see why, enter =1/3 into a cell in Excel. Now reduce the number of decimal places visible to you to two. Then copy and paste the data into R, and separately save the sheet as a CSV. You'll notice that in both cases Excel has helpfully kept only the data that was visible to you through the interface, and you've lost all of the precision in your actual source data.
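
    The size of that loss can be sketched in base R without Excel. The snippet below only simulates the effect by rounding to the two displayed decimal places, and the variable names are illustrative.

```r
full_value <- 1 / 3                                     # what the cell really holds

# What a cell formatted to two decimal places displays
displayed <- format(round(full_value, 2), nsmall = 2)

# What R ends up with after reading the displayed text back
recovered <- as.numeric(displayed)

# The difference is the precision lost through the clipboard or CSV route
print(full_value - recovered)
```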

  • 2020-11-22 14:47

    Expanding on the answer provided by @Mikko, you can use a neat trick to speed things up without having to "know" your column classes ahead of time. Simply use read.xlsx to grab a limited number of records to determine the classes, and then follow it up with read.xlsx2.

    Example

    library(xlsx)

    # just the first 50 rows should do...
    df.temp <- read.xlsx("filename.xlsx", 1, startRow = 1, endRow = 50)
    df.real <- read.xlsx2("filename.xlsx", 1,
                          colClasses = as.vector(sapply(df.temp, mode)))
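
    One caveat with this trick, shown here on a made-up data frame rather than an Excel file: mode reports the storage mode, so Date and factor columns both come back as "numeric". Taking the first element of class, as in the last line, is one possible variation and not the original author's code.

```r
# Illustrative data frame standing in for the 50-row preview
sample_df <- data.frame(
  name  = c("a", "b"),
  count = c(1.5, 2.5),
  day   = as.Date(c("2020-01-01", "2020-01-02")),
  group = factor(c("x", "y")),
  stringsAsFactors = FALSE
)

# Storage mode of each column: Date and factor columns are reported as numeric
modes <- sapply(sample_df, mode)
print(modes)

# First class of each column: keeps "Date" and "factor" distinct
first_classes <- sapply(sample_df, function(col) class(col)[1])
print(first_classes)
```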
    
  • 2020-11-22 14:48

    And now there is readxl:

    The readxl package makes it easy to get data out of Excel and into R. Compared to the existing packages (e.g. gdata, xlsx, xlsReadWrite etc) readxl has no external dependencies so it's easy to install and use on all operating systems. It is designed to work with tabular data stored in a single sheet.

    readxl is built on top of the libxls C library, which abstracts away many of the complexities of the underlying binary format.

    It supports both the legacy .xls format and .xlsx.

    readxl is available from CRAN, or you can install it from GitHub with:

    # install.packages("devtools")
    devtools::install_github("hadley/readxl")
    

    Usage

    library(readxl)
    
    # read_excel reads both xls and xlsx files
    read_excel("my-old-spreadsheet.xls")
    read_excel("my-new-spreadsheet.xlsx")
    
    # Specify sheet with a number or name
    read_excel("my-spreadsheet.xls", sheet = "data")
    read_excel("my-spreadsheet.xls", sheet = 2)
    
    # If NAs are represented by something other than blank cells,
    # set the na argument
    read_excel("my-spreadsheet.xls", na = "NA")
    

    Note that while the description says 'no external dependencies', it does require the Rcpp package, which in turn requires Rtools (for Windows) or Xcode (for OSX). These are dependencies external to R, although many people already have them installed for other reasons.
