R, rbind with multiple files defined by a variable

滥情空心 2021-01-27 09:21

First off, this is related to a homework question for the Coursera R programming course. I have found other ways to do what I want to do, but my research has led me to a question.

2 answers
  • 2021-01-27 09:51

    Well, this uses an lapply, but it might be what you want.

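    # Replace "*your directory*" with the folder that holds the CSV files
    file_list <- list.files("*your directory*", full.names = TRUE)

    # Read every file, then stack the resulting data frames row-wise
    combined_data <- do.call(rbind, lapply(file_list, read.csv, header = TRUE))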
    

    This will turn all of your files into one large dataset, and from there it's easy to take the mean. Is that what you wanted?
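
    As a rough illustration of "combine, then take the mean", here is a minimal self-contained sketch. The temporary folder, the file names, and the column name `sulfate` are all made up for the example; substitute your own directory and column.

    ```r
    # Write two small CSVs to a temporary folder (hypothetical data)
    tmp_dir <- file.path(tempdir(), "rbind_demo")
    dir.create(tmp_dir, showWarnings = FALSE)
    write.csv(data.frame(ID = 1, sulfate = c(1, 2, NA)),
              file.path(tmp_dir, "001.csv"), row.names = FALSE)
    write.csv(data.frame(ID = 2, sulfate = c(3, 6)),
              file.path(tmp_dir, "002.csv"), row.names = FALSE)

    # Same approach as above: list the files, read each one, stack the rows
    file_list <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
    combined_data <- do.call(rbind, lapply(file_list, read.csv, header = TRUE))

    # na.rm = TRUE drops missing readings before averaging
    overall_mean <- mean(combined_data$sulfate, na.rm = TRUE)
    ```

    Without `na.rm = TRUE`, a single missing value in the column would make the result `NA`.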

    An alternative way of doing this would be to step through file by file, taking sums and number of observations and then taking the mean afterwards, like so:

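    # Assumes `pollutant` holds the column name as a string
    sums <- numeric(length(file_list))
    n <- numeric(length(file_list))
    for (i in seq_along(file_list)) {
      temp_df <- read.csv(file_list[i], header = TRUE)
      values <- temp_df[[pollutant]]
      sums[i] <- sum(values, na.rm = TRUE)  # total of the non-missing readings
      n[i] <- sum(!is.na(values))           # count of the non-missing readings
    }
    # Overall mean across all files
    new_mean <- sum(sums) / sum(n)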
    

    Note that both of these methods require that the folder contain only the CSV files you want. If it also holds files you're not interested in, you can filter them out with the pattern argument of list.files.
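
    For instance, a small sketch of the `pattern` filter might look like this. The folder and file names are invented for the demonstration; `pattern` is interpreted as a regular expression, so the dot is escaped and `$` anchors the match to the end of the name.

    ```r
    # Build a temporary folder holding one CSV and one unrelated file
    tmp_dir <- file.path(tempdir(), "pattern_demo")
    dir.create(tmp_dir, showWarnings = FALSE)
    write.csv(data.frame(x = 1:2), file.path(tmp_dir, "001.csv"),
              row.names = FALSE)
    writeLines("not data", file.path(tmp_dir, "notes.txt"))

    # Keep only names that end in ".csv"
    csv_files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
    basename(csv_files)
    ```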

  • 2021-01-27 09:51

    A vector is not accepted for 'file' in read.csv(file, ...)

    Below is a slight modification of your code. A vector of file paths is created, and sapply loops over it one path at a time.

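    # Build the paths 001.csv ... 332.csv inside the data directory
    files <- paste("directory-name/", formatC(1:332, width = 3, flag = "0"),
                   ".csv", sep = "")
    pollutantmean <- function(file, pollutant) {
        dataset <- read.csv(file, header = TRUE)
        mean(dataset[, pollutant], na.rm = TRUE)
    }
    # pollutant has no default, so it must be passed through sapply;
    # "sulfate" is an assumed column name, substitute your own
    sapply(files, pollutantmean, pollutant = "sulfate")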
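
    To see the difference on a small scale, the following sketch uses two throwaway files in a temporary folder (the file names and the `sulfate` column are assumptions for the example). It checks that handing `read.csv` both paths at once raises an error, then reads the files one at a time. Note that this gives one mean per file, not a single pooled mean.

    ```r
    # Two small hypothetical CSV files
    tmp_dir <- file.path(tempdir(), "sapply_demo")
    dir.create(tmp_dir, showWarnings = FALSE)
    files <- file.path(tmp_dir, c("001.csv", "002.csv"))
    write.csv(data.frame(sulfate = c(1, 3)), files[1], row.names = FALSE)
    write.csv(data.frame(sulfate = c(5, NA)), files[2], row.names = FALSE)

    # Passing both paths in one call fails; try() captures the error
    vector_fails <- inherits(try(read.csv(files), silent = TRUE), "try-error")

    # Reading one path per call works; USE.NAMES = FALSE keeps the result
    # from being labelled with the long file paths
    per_file_means <- sapply(files, function(f) {
      mean(read.csv(f)[, "sulfate"], na.rm = TRUE)
    }, USE.NAMES = FALSE)
    ```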
    