Question
I have almost finished a messy script that applies several statistical methods/tests to 11 data frames from different watersheds, with physico-chemical parameters as variables. It reaches the goal, but I need to make it functional.
To start, I wrote a function that computes correlations and saves the results as .txt tables and .pdf images.
It works great when I run the function on one data frame at a time (for that you need to import each data frame separately using read.table, which is not shown in the code below).
Since I want it to be functional, I made a list of the 11 data frames and used lapply to run the function on each one. It works in the sense that it gives me one list (corr) containing the correlation results of each data frame.
Here are the issues:
- The list corr with the correlation results for each data frame looks like it holds values instead of data frames, so I don't know how to access or save them (see the corr list in the Environment/Data window). At least up to this point it looks like the correlation results exist somewhere.
- The second problem is that cor_PQ, which I run with corr<-lapply(PQ_data, cor_PQ), has lines that save the outputs as tables (.txt) and images (.pdf) using part of the name of the original data frame (e.g. the first element of PQ_data is "AgIX_E_PQ", so the table and plot of cor_PQ(PQ_data[["AgIX_E_PQ"]]) should be named "mCorAgIX_E_PQ.txt" and "CorAgIX_E_PQ.pdf" respectively), but I get just one output of each kind (mCorX[[i]].txt and CorX[[i]].pdf) holding the correlation result of the last data frame. That is, the tables and images for every data frame are overwritten into these generic mCorX[[i]].txt and CorX[[i]].pdf files.
I guess I have to define 'i' or something similar to avoid this. Should I define the cor_PQ function for PQ_data instead of X?
If anyone can see where I'm going wrong, I would appreciate any help solving this.
My data: PQ_data (save it in your workspace and point setwd to it).
My code:
rm(list=ls(all=TRUE))
cat("\014")
setwd("C:/Users/Sol/Documents/ProyectoTítulo/CalidadAgua/Matrices/Regs") #my workspace
PQ_files<-list.files(path="C:/Users/Sol/Documents/ProyectoTítulo/CalidadAgua/Matrices/Regs",
                     pattern="\\_PQ.txt") #my list of 14 dataframes in my workspace.
PQ_data<-lapply(PQ_files, read.table) #read tables of the 14 dataframes in the list.
names(PQ_data)<-gsub("\\_PQ.txt","", PQ_files) #name the 14 dataframes with their original names.
#FUNCTION TO COMPUTE CORRELATIONS, SAVE TABLES AND PLOTS.
cor_PQ<-function(X) {
  corPQ<-cor(X, use="pairwise.complete.obs")
  outputname.txt<-paste0("mCor",deparse(substitute(X)),".txt")
  write.table(corPQ, file=outputname.txt)
  outputname.pdf<-paste0("Cor",deparse(substitute(X)),".pdf")
  pdf(outputname.pdf)
  plot(X)
  dev.off()
  return(corPQ)
}
corr<-lapply(PQ_data, cor_PQ)
After this, as I said, I get a list called "corr" with 11 elements containing the correlation results for each data frame in my list (PQ_data), but I can't access them as tables when I open the "corr" list in my Environment/Data window (they don't show the blue R arrow to expand the element).
I also get only 2 output files, called mCorX[[i]].txt and CorX[[i]].pdf, showing only the correlation result of the last data frame, because write.table and pdf overwrite the results of the 10 previous calculations.
Again, I would appreciate any help. I really need a push to grasp the idea.
Thanks!
Answer 1:
lapply doesn't pass the names of the list to the function, only the elements. Inside lapply the function is called as FUN(X[[i]]), so deparse(substitute(X)) returns the literal string "X[[i]]" for every element instead of the data frame's name. That is why the function works when called on an individual data frame but not on the list: every output file gets the same name, each new file overwrites the previous one, and you end up with a single file holding the result for the last element of the list. You can use the function below, which takes the name as a separate argument and uses it to build the file names.
cor_PQ<-function(X, Y) {
  corPQ<-cor(X, use="pairwise.complete.obs")
  outputname.txt<-paste0("mCor",Y,".txt")
  write.table(corPQ, file= outputname.txt)
  outputname.pdf<-paste0("Cor",Y,".pdf")
  pdf(outputname.pdf)
  plot(X)
  dev.off()
  return(corPQ)
}
Now use Map to apply the function to the list and its names in parallel.
Map(cor_PQ, PQ_data, names(PQ_data))
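The difference between the two calls can be checked on toy data. The following is only a minimal sketch: the list toy, its column values, the helper names label_of and save_cor, and the temporary output directory are all made up for illustration and are not part of the original data or code. The PDF step is left out to keep it short.

```r
# Hypothetical stand-in for PQ_data: two small named data frames.
toy <- list(
  siteA = data.frame(pH = c(7.1, 6.8, 7.4, 7.0), temp = c(12.5, 14.1, 11.8, 13.0)),
  siteB = data.frame(pH = c(6.5, 6.9, 7.2, 6.7), temp = c(15.0, 13.2, 12.9, 14.4))
)

# What the original function sees under lapply: the same expression,
# "X[[i]]", for every element (matching the file names in the question).
label_of <- function(X) deparse(substitute(X))
labels <- unlist(lapply(toy, label_of))

# Write into a throwaway directory so nothing in the working directory is touched.
out_dir <- tempfile("cor_demo_")
dir.create(out_dir)

# Same idea as the two-argument cor_PQ: the name arrives as its own argument.
save_cor <- function(X, Y) {
  m <- cor(X, use = "pairwise.complete.obs")
  write.table(m, file = file.path(out_dir, paste0("mCor", Y, ".txt")))
  m
}

res <- Map(save_cor, toy, names(toy))
written <- list.files(out_dir)  # one file per list element
```

With Map, each call receives one data frame together with its own name, so written holds a distinct file per element, and res keeps the names of toy.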
We can also use imap from purrr, which passes each element and its name to the function.
purrr::imap(PQ_data, cor_PQ)
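As for the first issue in the question, cor returns a numeric matrix rather than a data frame, which is probably why the elements of corr do not expand like data frames in the Environment pane. They can still be accessed by name and converted if needed. A small sketch, again on made-up data (toy and its values are hypothetical, not the original files):

```r
# Hypothetical stand-in for PQ_data.
toy <- list(
  siteA = data.frame(pH = c(7.1, 6.8, 7.4, 7.0), temp = c(12.5, 14.1, 11.8, 13.0)),
  siteB = data.frame(pH = c(6.5, 6.9, 7.2, 6.7), temp = c(15.0, 13.2, 12.9, 14.4))
)

# Each element of corr is a correlation matrix, named after the list element.
corr <- lapply(toy, cor, use = "pairwise.complete.obs")

# Access one result by name, then one coefficient by row and column name.
m_siteA <- corr[["siteA"]]
r_pH_temp <- m_siteA["pH", "temp"]

# Convert every matrix to a data frame if a data frame is preferred.
corr_df <- lapply(corr, as.data.frame)
```

The same indexing applies to the list returned by Map or purrr::imap, since both keep the names of the input list.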
Source: https://stackoverflow.com/questions/59376151/write-table-inside-a-function-applied-to-a-list-of-data-frames-overwrite-outputs