With R, loop over data frames, and assign appropriate names to objects created in the loop

一生所求 2020-12-28 22:48

This is something which data analysts do all the time (especially when working with survey data that features missing responses). It's common to first multiply impute a se

3 Answers
  • 2020-12-28 23:10

    Use a list to store the results of your regression models as well, e.g.

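    # Simulate a data frame with predictors V1, V2 and response y
    foo <- function(n) {
      X <- as.data.frame(replicate(2, rnorm(n)))
      transform(X, y = V1 + V2 + rnorm(n))
    }
    # row.names = FALSE stops read.csv() from adding an index column "X",
    # which y ~ . would otherwise pick up as a predictor
    write.csv(foo(10), file = "dat1.csv", row.names = FALSE)
    write.csv(foo(10), file = "dat2.csv", row.names = FALSE)
    csvdat <- list.files(pattern = "^dat.*\\.csv$")
    lm.res <- list()
    for (i in seq_along(csvdat))
      lm.res[[i]] <- lm(y ~ ., data = read.csv(csvdat[i]))
    names(lm.res) <- csvdat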
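
    Once the models live in a named list, they are easy to process together. The sketch below is a hedged variation, not part of the answer above: it builds the same kind of list with lapply() instead of a preallocated loop, then collects the coefficients with sapply(). The temporary folder "lm_list_demo" and the simulated data are hypothetical stand-ins for your own CSV files.

    ```r
    # Hypothetical setup: write two example CSV files to a fresh temporary folder
    set.seed(1)
    demo_dir <- file.path(tempdir(), "lm_list_demo")
    dir.create(demo_dir, showWarnings = FALSE)
    for (k in 1:2) {
      d <- data.frame(V1 = rnorm(10), V2 = rnorm(10))
      d$y <- d$V1 + d$V2 + rnorm(10)
      write.csv(d, file.path(demo_dir, paste0("dat", k, ".csv")), row.names = FALSE)
    }
    csvdat <- list.files(demo_dir, pattern = "^dat.*\\.csv$", full.names = TRUE)

    # lapply() returns the list of fitted models directly, one per file
    lm.res <- lapply(csvdat, function(f) lm(y ~ ., data = read.csv(f)))
    names(lm.res) <- basename(csvdat)

    # One column of coefficients per model; sapply() uses the list names as column names
    coefs <- sapply(lm.res, coef)
    print(coefs)
    ```

    Keeping everything in one list means later steps (summaries, pooling estimates across imputed data sets) are a single lapply() or sapply() call rather than a loop over variable names.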
    
  • 2020-12-28 23:12

    What you want is a combination of the functions seq_along() and assign().

    seq_along() creates a vector from 1 to 5 if there are five files in csvdat, so you get index numbers rather than just the file names. Then assign(), with paste() building the appropriate name strings from those numbers, lets you create one variable per model.
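
    A small illustration of what seq_along() and paste() produce here; the five file names are hypothetical. It also shows one reason seq_along() is usually preferred over 1:length(): it behaves correctly when no files match.

    ```r
    csvdat <- c("dat1.csv", "dat2.csv", "dat3.csv", "dat4.csv", "dat5.csv")  # hypothetical file names
    seq_along(csvdat)                          # 1 2 3 4 5
    paste("lm", seq_along(csvdat), sep = ".")  # "lm.1" "lm.2" "lm.3" "lm.4" "lm.5"

    # Unlike 1:length(x), seq_along() gives an empty sequence for an empty vector,
    # so a for loop over it simply does not run when no files are found
    empty <- character(0)
    seq_along(empty)                           # integer(0)
    1:length(empty)                            # 1 0  (a loop over this would run twice)
    ```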

    Note that you will also need to load the data file first (this was missing from your example):

    for (x in seq_along(csvdat)) {
        data.in <- read.csv(csvdat[x])   #be sure to change this to read.table if necessary
        assign(paste("lm.", x, sep = ""), lm(y ~ x1 + x2, data = data.in))
    }
    

    seq_along() is not strictly necessary; there are other ways to solve the numbering problem.
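
    One such alternative, sketched here as an assumption rather than something from the original answer: loop over the file paths themselves and derive each object name from the file name with tools::file_path_sans_ext(). The folder "assign_demo" and the files wave_a.csv and wave_b.csv are hypothetical; the column names x1, x2 and y follow the model in the answer.

    ```r
    # Hypothetical example files written to a fresh temporary folder
    demo_dir <- file.path(tempdir(), "assign_demo")
    dir.create(demo_dir, showWarnings = FALSE)
    set.seed(7)
    for (stem in c("wave_a", "wave_b")) {
      d <- data.frame(x1 = rnorm(15), x2 = rnorm(15))
      d$y <- d$x1 + d$x2 + rnorm(15)
      write.csv(d, file.path(demo_dir, paste0(stem, ".csv")), row.names = FALSE)
    }
    csvdat <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

    # No index needed: iterate over the paths and build each name from the file stem
    for (f in csvdat) {
      stem <- tools::file_path_sans_ext(basename(f))
      assign(paste0("lm.", stem), lm(y ~ x1 + x2, data = read.csv(f)))
    }
    ls(pattern = "^lm\\.")
    ```

    Names such as lm.wave_a are more descriptive than lm.1, at the cost of depending on the files having syntactically sensible names.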

    The critical function is assign. With assign you can create variables with a name based on a string. See ?assign for further info.
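
    A minimal sketch of assign() together with its counterparts get() and mget(), which turn a name string back into the object. The in-memory list dfs is hypothetical and stands in for the CSV files; the model formula and the lm.1, lm.2 naming follow the answer.

    ```r
    # Two small in-memory data frames stand in for the CSV files (hypothetical data)
    set.seed(42)
    dfs <- lapply(1:2, function(k) {
      d <- data.frame(x1 = rnorm(20), x2 = rnorm(20))
      d$y <- 1 + 2 * d$x1 - d$x2 + rnorm(20)
      d
    })

    # assign() turns a string such as "lm.1" into a variable name
    for (x in seq_along(dfs)) {
      assign(paste("lm", x, sep = "."), lm(y ~ x1 + x2, data = dfs[[x]]))
    }

    # get() goes the other way: from a name string back to the object
    summary(get("lm.1"))$r.squared

    # mget() gathers several of them into a named list in one call
    models <- mget(paste("lm", seq_along(dfs), sep = "."))
    names(models)   # "lm.1" "lm.2"
    ```

    In practice you often end up calling mget() to gather the assigned objects back into a list, which is why storing the models in a list from the start is usually simpler.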


    Following chl's comments (see his post), here is everything in one line:

    for (x in seq_along(csvdat)) assign(paste("lm", x, sep = "."), lm(y ~ x1 + x2, data = read.csv(csvdat[x])))
    
  • 2020-12-28 23:13

    Another approach is to use the plyr package to do the looping. Using the example constructed by @chl, here is how you would do it:

    library(plyr)
    
    # read csv files into list of data frames
    data_frames = llply(csvdat, read.csv)
    
    # run regression models on each data frame
    regressions = llply(data_frames, lm, formula = y ~ .)
    names(regressions) = csvdat
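
    For comparison, a base-R sketch of the same two steps without plyr (a swapped-in technique, not the answer's). Like llply(), lapply() forwards extra arguments to the function it calls: because formula is supplied by name, each data frame is matched positionally to lm()'s data argument. The in-memory data frames and their names are hypothetical stand-ins for the files read from disk.

    ```r
    # Hypothetical example data: two in-memory data frames instead of CSV files
    set.seed(3)
    data_frames <- lapply(1:2, function(k) {
      d <- data.frame(V1 = rnorm(12), V2 = rnorm(12))
      d$y <- d$V1 + d$V2 + rnorm(12)
      d
    })
    names(data_frames) <- c("dat1.csv", "dat2.csv")

    # formula is matched by name, so each data frame is matched positionally to `data`
    regressions <- lapply(data_frames, lm, formula = y ~ .)

    # lapply() keeps the input names, so no separate names() call is needed here
    names(regressions)
    ```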
    