Output Regression statistics for each variable one at a time in R

说谎 2021-01-15 11:35

I have a data frame that looks like this. The names and number of columns will NOT be consistent (sometimes 'C' will not be present, other times 'D', 'E', 'F' may be present…

3 answers
  • 2021-01-15 11:57

    This is how I do this kind of modeling. The following example assumes that I am varying the outcome and the exposure while keeping a fixed set of covariates.

    First I define the outcomes and exposures I want to test (I think in terms of epidemiology, but you can extend this to other settings).

    outcomes <- c("a","b","c","d")

    exposures <- c("exp1","exp2","exp3")

    This assumes that each element of those vectors exists as a column name in your dataset (as do the covariates listed after the "~" below).
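    Since the question says the set of columns changes from one data set to the next, it may help to drop any names that are not actually present before looping. A minimal sketch, assuming toy data in place of your real mydata; the data, and the choice to skip missing names rather than stop, are illustrative assumptions:

```r
set.seed(1)
# Hypothetical data standing in for mydata; note that "exp3" is absent
mydata <- data.frame(a = rnorm(20), b = rnorm(20),
                     exp1 = rnorm(20), exp2 = rnorm(20))

outcomes  <- c("a", "b", "c", "d")
exposures <- c("exp1", "exp2", "exp3")

# Report the names that the models would reference but that are not columns
missing_cols <- setdiff(c(outcomes, exposures), names(mydata))
if (length(missing_cols) > 0) {
  message("Skipping columns not in mydata: ", paste(missing_cols, collapse = ", "))
}

# Keep only the names that exist, preserving their original order
outcomes  <- intersect(outcomes, names(mydata))
exposures <- intersect(exposures, names(mydata))
```

    If you would rather fail loudly, replace message() with stop().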

    final_lm_data <- data.frame()  # initialize empty data frame to hold results
    for (j in seq_along(exposures)) {
      for (i in seq_along(outcomes)) {
        # builds e.g. a ~ continuous.cov.1 + continuous.cov.2 + factor(categorical.variable.1) + exp1
        mylm <- lm(formula(paste(outcomes[i], "~",
                                 "continuous.cov.1 + continuous.cov.2 + factor(categorical.variable.1)",
                                 "+", exposures[j])),
                   data = mydata)
    
        # columns: Estimate, Std. Error, t value, Pr(>|t|)
        ctable <- as.data.frame(coef(summary(mylm)))
    
        mylm_data <- cbind(ctable,
                           Variable = rownames(ctable),
                           Outcome  = outcomes[i],
                           Exposure = exposures[j],
                           Model_N  = nobs(mylm))  # N after rows with missing values are dropped
        names(mylm_data)[4] <- "Pvalue"  # renaming "Pr(>|t|)"
        rownames(mylm_data) <- NULL      # important because we are creating a stacked output dataset
        final_lm_data <- rbind(final_lm_data, mylm_data)
      }
    }
    

    This gives you final_lm_data, which contains the estimate, standard error, t statistic, and p-value for each variable in each model, and also keeps track of which Outcome and Exposure were used (the first and last terms of your model). Lastly, it records the N used after records with missing values were dropped. You can modify the step that creates mylm_data to capture more information from the model (such as R-squared).
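    For a version you can run end to end, here is a sketch on simulated data. It swaps the nested loops for expand.grid() plus Map(), builds each formula with reformulate() instead of paste(), stacks the results once with do.call(rbind, ...) rather than growing a data frame inside the loop, and adds R-squared as an example of an extra column. The simulated data and the helper name fit_one are hypothetical:

```r
set.seed(42)
n <- 50
# Hypothetical simulated data whose column names mirror the placeholders above
mydata <- data.frame(
  a = rnorm(n), b = rnorm(n),
  exp1 = rnorm(n), exp2 = rnorm(n),
  continuous.cov.1 = rnorm(n), continuous.cov.2 = rnorm(n),
  categorical.variable.1 = rep(c("x", "y", "z"), length.out = n)
)
outcomes   <- c("a", "b")
exposures  <- c("exp1", "exp2")
covariates <- c("continuous.cov.1", "continuous.cov.2",
                "factor(categorical.variable.1)")

# One row per outcome/exposure combination
combos <- expand.grid(Outcome = outcomes, Exposure = exposures,
                      stringsAsFactors = FALSE)

# Hypothetical helper: fit one model, return its coefficient table plus labels
fit_one <- function(outcome, exposure) {
  # reformulate() builds e.g. a ~ continuous.cov.1 + ... + exp1
  fit <- lm(reformulate(c(covariates, exposure), response = outcome),
            data = mydata)
  ctable <- as.data.frame(coef(summary(fit)))
  names(ctable) <- c("Estimate", "Std.Error", "t.value", "Pvalue")
  data.frame(ctable,
             Variable = rownames(ctable),
             Outcome  = outcome,
             Exposure = exposure,
             Model_N  = nobs(fit),              # N after dropping missing rows
             R2       = summary(fit)$r.squared, # a model-level statistic
             row.names = NULL)
}

# Collect the per-model tables in a list, then stack them once
final_lm_data <- do.call(rbind, Map(fit_one, combos$Outcome, combos$Exposure))
rownames(final_lm_data) <- NULL
```

    Building a list and binding once avoids repeatedly copying final_lm_data, which matters when there are many outcome/exposure pairs.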

    Finally, if covariates also vary from run to run, I am not sure how to automate that part.
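    One possible way to automate varying covariates (not part of this answer, just a sketch): keep each covariate set in a named list and treat it as a third dimension of the grid. The data and the covariate names age and bmi are hypothetical, and only the exposure row of each model is kept:

```r
set.seed(7)
n <- 40
# Hypothetical data; column names are placeholders
mydata <- data.frame(a = rnorm(n), exp1 = rnorm(n), exp2 = rnorm(n),
                     age = rnorm(n, 50, 10), bmi = rnorm(n, 25, 4))

outcomes  <- "a"
exposures <- c("exp1", "exp2")
# Each element is one covariate set; the names label the models
covariate_sets <- list(crude   = character(0),
                       age_adj = "age",
                       full    = c("age", "bmi"))

combos <- expand.grid(Outcome = outcomes, Exposure = exposures,
                      CovSet = names(covariate_sets), stringsAsFactors = FALSE)

results <- lapply(seq_len(nrow(combos)), function(k) {
  covs <- covariate_sets[[combos$CovSet[k]]]
  fit <- lm(reformulate(c(covs, combos$Exposure[k]), response = combos$Outcome[k]),
            data = mydata)
  est <- coef(summary(fit))[combos$Exposure[k], ]  # exposure row only
  data.frame(combos[k, ],
             Estimate = est[["Estimate"]],
             Pvalue   = est[["Pr(>|t|)"]],
             Model_N  = nobs(fit),
             row.names = NULL)
})
final <- do.call(rbind, results)
```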

  • 2021-01-15 12:03

    Here is a solution using *apply:

    Y <- c(4, 4, 3, 4, 3, 2, 3, 2, 2, 3, 4, 4, 3, 4, 8, 6, 5, 4, 3, 6)
    A <- c(1, 2, 1, 2, 3, 2, 1, 1, 1, 2, 1, 4, 3, 1, 2, 2, 1, 2, 4, 8)
    B <- c(5, 6, 6, 5, 3, 7, 2, 1, 1, 2, 7, 4, 7, 8, 5, 7, 6, 6, 4, 7)
    C <- c(9, 1, 2, 2, 1, 4, 5, 6, 7, 8, 89, 9, 7, 6, 5, 6, 8, 9, 67, 6)
    YABC <- data.frame(Y, A, B, C)
    
    predictors <- colnames(YABC)[-1]  # every column except the response Y
    
    formulae <- sapply(predictors, function(x) as.formula(paste("Y ~", x)))
    
    lapply(formulae, function(x) lm(x, data = YABC))
    

    Of course you can also call summary:

    lapply(formulae, function(x) summary(lm(x, data = YABC)))
    

    If you want to extract components from a specific model, index the list by the predictor's name:

    results <- lapply(formulae, function(x) lm(x, data = YABC))
    results$A$coefficients
    

    This gives the coefficients from the model that uses A as the explanatory variable.
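    To see the output for every variable at once, you could collapse the list of models into one table, one row per predictor. A minimal sketch reusing the example data above; it uses lapply() with setNames() instead of sapply() so the formulas are guaranteed to stay a named list, and the name slopes is illustrative:

```r
Y <- c(4, 4, 3, 4, 3, 2, 3, 2, 2, 3, 4, 4, 3, 4, 8, 6, 5, 4, 3, 6)
A <- c(1, 2, 1, 2, 3, 2, 1, 1, 1, 2, 1, 4, 3, 1, 2, 2, 1, 2, 4, 8)
B <- c(5, 6, 6, 5, 3, 7, 2, 1, 1, 2, 7, 4, 7, 8, 5, 7, 6, 6, 4, 7)
C <- c(9, 1, 2, 2, 1, 4, 5, 6, 7, 8, 89, 9, 7, 6, 5, 6, 8, 9, 67, 6)
YABC <- data.frame(Y, A, B, C)

predictors <- colnames(YABC)[-1]
formulae <- lapply(setNames(nm = predictors),
                   function(x) as.formula(paste("Y ~", x)))
results <- lapply(formulae, function(f) lm(f, data = YABC))

# Row 2 of each coefficient table is the slope of the single predictor
slopes <- t(sapply(results, function(m) coef(summary(m))[2, ]))
# Append each model's R-squared as an extra column
slopes <- cbind(slopes,
                r.squared = sapply(results, function(m) summary(m)$r.squared))
slopes
```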

  • 2021-01-15 12:08

    As mentioned in the comments, as.formula() (see ?as.formula) is one solution. You could do something like:

    model <- list()
    for (char in names(YABC)[-1]) {
      model[[char]] <- lm(as.formula(paste("Y ~", char)), data = YABC)
    }
    model
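    A variant of the same loop, sketched under the assumption that the response is always named Y: selecting predictors with setdiff() does not depend on column order, so it copes when columns come and go, and reformulate() builds the formula without pasting strings. The name pvals is illustrative:

```r
# Example data (same values as in the previous answer)
Y <- c(4, 4, 3, 4, 3, 2, 3, 2, 2, 3, 4, 4, 3, 4, 8, 6, 5, 4, 3, 6)
A <- c(1, 2, 1, 2, 3, 2, 1, 1, 1, 2, 1, 4, 3, 1, 2, 2, 1, 2, 4, 8)
B <- c(5, 6, 6, 5, 3, 7, 2, 1, 1, 2, 7, 4, 7, 8, 5, 7, 6, 6, 4, 7)
C <- c(9, 1, 2, 2, 1, 4, 5, 6, 7, 8, 89, 9, 7, 6, 5, 6, 8, 9, 67, 6)
YABC <- data.frame(Y, A, B, C)

# Every column except the response, whatever columns happen to be present
predictors <- setdiff(names(YABC), "Y")

# reformulate("A", response = "Y") builds the formula Y ~ A
model <- lapply(setNames(nm = predictors),
                function(v) lm(reformulate(v, response = "Y"), data = YABC))

# p-value of the slope in each one-predictor model
pvals <- sapply(model, function(m) coef(summary(m))[2, "Pr(>|t|)"])
pvals
```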
    