问题
I have a set of CSV files. I want to package them up and export the data to a single Excel file that contains multiple worksheets. I read in the CSV files as a set of data frames.
My problem is how to construct the command in openxlsx
, I can do it manually, but I am having a list construction issue. Specifically how to add a data frame as a subcomponent of a named list and then pass as a parameter to write.xlsx()
Example
Ok, so I first list the CSV files on disk and generate a set of data frames in memory...
# Generate a list of csv files on disk and shorten names...
filePath <- "../02benchmark/results/results_20170330/"
filePattern <- "*.csv"
fileListwithPath = list.files(path = filePath, pattern = filePattern, full.names = TRUE)
fileList = list.files(path = filePath, pattern = filePattern, full.names = FALSE)
datasets <- gsub("*.csv$", "", fileList)
datasets <- gsub("sample_", "S", datasets)
datasets
# Now generate the dataframes for each csv file...
list2env(
lapply(setNames(fileListwithPath, make.names(datasets)),
read.csv), envir = .GlobalEnv)
Example Output:
dput(datasets)
c("S10000_R3.3.2_201703301839", "S10000_T4.3.0_201703301843",
"S20000_R3.3.2_201703301826", "S20000_T4.3.0_201703301832", "S280000_R3.3.2_201704020847",
"S280000_T4.3.0_201704021100", "S290000_R3.3.2_201704020447",
"S290000_T4.3.0_201704020702", "S30000_R3.3.2_201703301803",
"S30000_T4.3.0_201703301817", "S310000_R3.3.2_201704012331",
"S310000_T4.3.0_201704020242", "S320000_R3.3.2_201704011827",
"S320000_T4.3.0_201704012128", "S330000_R3.3.2_201704011304",
"S330000_T4.3.0_201704011546", "S340000_R3.3.2_201704010652",
"S340000_T4.3.0_201704011010", "S350000_R3.3.2_201704010020",
"S350000_T4.3.0_201704010404", "S360000_R3.3.2_201703311819",
"S360000_T4.3.0_201703312134", "S370000_R3.3.2_201703310914",
"S370000_T4.3.0_201703311301", "S380000_R3.3.2_201703310134",
"S380000_T4.3.0_201703310509", "S390000_R3.3.2_201703301846",
"S390000_T4.3.0_201703302252", "S40000_R3.3.2_201703301738",
"S40000_T4.3.0_201703301752", "S50000_R3.3.2_201703301707", "S50000_T4.3.0_201703301724",
"S60000_R3.3.2_201703301624", "S60000_T4.3.0_201703301647", "S70000_R3.3.2_201703301535",
"S70000_T4.3.0_201703301602", "S80000_R3.3.2_201703301430", "S80000_T4.3.0_201703301508",
"S90000_R3.3.2_201703301324", "S90000_T4.3.0_201703301400")
Now we have a set of data frames and we wish to create a single excel file with multiple worksheets...
wb <- createWorkbook()
saveWorkbook(wb, 'output.xlsx')
lapply(names(myList), function(x) write.xlsx(myList[[x]], 'output.xlsx', sheetName=x, append=TRUE))
Problem:
The problem is I can create the list structure manually and can confirm it works BUT I cannot seem to construct the list automatically.
myList <- sapply(datasets,function(x) NULL)
names(myList)
str(myList)
myList$S10000_R3.3.2_201703301839 <- eval(S10000_R3.3.2_201703301839)
thus:
> str(myList)
List of 40
$ S10000_R3.3.2_201703301839 :'data.frame': 43 obs. of 4 variables:
..$ function.: Factor w/ 42 levels "DF add random number vector",..: 30 25 38 42 36 39 40 29 26 22 ...
..$ user : num [1:43] 2.144 0.263 0.024 0.068 0.008 ...
..$ system : num [1:43] 0.63 0.065 0.001 0.004 0 ...
..$ elapsed : num [1:43] 12.274 1.104 0.047 0.115 0.009 ...
$ S10000_T4.3.0_201703301843 : NULL
$ S20000_R3.3.2_201703301826 : NULL
...
Specific Problem: How to append each data frame to the list...
myList <- lapply( myList, function(x) eval(x) )
what am I doing wrong with lapply here? The above lapply() does not iterate through the list and append the data frame to the name list entry.
i.e. myList$S10000_R3.3.2_201703301839 <- eval(S10000_R3.3.2_201703301839)
> str(myList)
List of 40
$ S10000_R3.3.2_201703301839 :'data.frame': 43 obs. of 4 variables:
..$ function.: Factor w/ 42 levels "DF add random number vector",..: 30 25 38 42 36 39 40 29 26 22 ...
..$ user : num [1:43] 2.144 0.263 0.024 0.068 0.008 ...
..$ system : num [1:43] 0.63 0.065 0.001 0.004 0 ...
..$ elapsed : num [1:43] 12.274 1.104 0.047 0.115 0.009 ...
$ S10000_T4.3.0_201703301843 : NULL
$ S20000_R3.3.2_201703301826 : NULL
...
What am I missing? All help gratefully appreciated. Yes, I am pretty certain I am missing something obvious... but... I am stumped.
回答1:
I don't have your data frames, so I can't test this, but the code below is similar to the approach I use when I need to read and write Excel files. The code below uses the xlsx
package, as that's what I'm familiar with, but hopefully you can adapt it if you need to use openxlsx
.
library(xlsx)
First, read the files into a list. Something like this:
filePath <- "../02benchmark/results/results_20170330/"
filePattern <- "*.csv"
fileListwithPath = list.files(path = filePath,
pattern = filePattern,
full.names = TRUE)
fileList = list.files(path = filePath, pattern = filePattern, full.names = FALSE)
fileListwithPath = setNames( fileListwithPath,
list.files(path = filePath, pattern = filePattern))
df.list = lapply(fileListwithPath, read.csv)
# Now we rename the List Names for use in worksheets...
# Remove .csv and sample_ prefix used in filenames...
# Reult in workbook S<size>_<R version>_<date>
names(df.list) <- gsub("\\.csv$","", names(df.list))
names(df.list) <- gsub("sample_","S", names(df.list))
You now have a list in which each element is a data frame and each element's name is the name of the file. Now, let's write each data frame to a different worksheet in the same Excel workbook and then save the file as an xlsx file:
wb = createWorkbook()
lapply( names(df.list),
function(df) {
sheet = createSheet(wb, df)
addDataFrame(df.list[[df]], sheet = sheet, row.names = FALSE)
} )
saveWorkbook(wb, "My_workbook.xlsx")
I've separated reading and writing the csv files for illustration, but you can combine them into a single function that reads each individual csv file and writes it to a new sheet in a single Excel workbook.
回答2:
Here's the solution with openxlsx:
## create data;
dataframes <- split(iris, iris$Species)
# create workbook
wb <- createWorkbook()
#Iterate the same way as PavoDive, slightly different (creating an anonymous function inside Map())
Map(function(data, nameofsheet){
addWorksheet(wb, nameofsheet)
writeData(wb, nameofsheet, data)
}, dataframes, names(dataframes))
## Save workbook to excel file
saveWorkbook(wb, file = "file.xlsx", overwrite = TRUE)
.. however, openxlsx is also able to use it's function openxlsx::write.xlsx for this, so you can just give the object with your list of dataframes and the filepath, and openxlsx is smart enough to create the list as sheets within the xlsx-file. The code I post here with Map() is if you want to format the sheets in a specific way.
来源:https://stackoverflow.com/questions/43176266/use-r-and-openxlsx-to-output-a-list-of-dataframes-as-worksheets-in-a-single-exce