All possible Regression in R: Saving coefficients in a matrix

天涯浪人 2020-12-20 08:32

I am running code for all possible models of a phylogenetic generalised linear model. The issue I am having is extracting and saving the beta coefficients for each model.

2 Answers
  • 2020-12-20 09:04
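    for(i in 1:length(formula)){
        # use the i-th formula; formula(formula) passed the whole vector on every pass
        fit = lm(formula(formula[i]), data)
        # positional fill: the first length(coefficients) columns, whatever the terms are
        beta[i, 1:length(fit$coefficients)] <- fit$coefficients
    }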
    

    Update

    Idea: name your columns after coefficients, and assign values to columns by name.

    This is only a dummy example, but it should help. Create your output matrix:

    beta <- matrix(NA,  nrow=7, ncol=4)
    colnames(beta) <- c("(Intercept)", 'A', 'B', 'C')
    

    Create some dummy data:

    A <- rnorm(10)
    B <- rpois(10, 1)
    C <- rnorm(10, 2)
    Y <- rnorm(10, -1)
    

    Now you can do something like this:

    fit <- lm(Y ~ A + B + C)
    beta[1, names(fit$coefficients)] <- fit$coefficients
    
    fit <- lm(Y ~ A + B)
    beta[2, names(fit$coefficients)] <- fit$coefficients
    
    fit <- lm(Y ~ A + C)
    beta[3, names(fit$coefficients)] <- fit$coefficients
    
    fit <- lm(Y ~ B + C)
    beta[4, names(fit$coefficients)] <- fit$coefficients
    
    fit <- lm(Y ~ A)
    beta[5, names(fit$coefficients)] <- fit$coefficients
    
    fit <- lm(Y ~ B)
    beta[6, names(fit$coefficients)] <- fit$coefficients
    
    fit <- lm(Y ~ C)
    beta[7, names(fit$coefficients)] <- fit$coefficients
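
    The seven fits above can also be generated in a loop. The following is a minimal sketch, not the asker's actual setup: the data frame `dat`, the vector `predictors`, and the seed are assumptions made up for illustration, and the formulas are built with `combn()` and `reformulate()` rather than taken from the question.

    ```r
    set.seed(1)  # arbitrary seed, only so the sketch is reproducible
    dat <- data.frame(A = rnorm(10), B = rpois(10, 1),
                      C = rnorm(10, 2), Y = rnorm(10, -1))
    predictors <- c("A", "B", "C")  # hypothetical predictor names

    # Every non-empty subset of the predictors, as a flat list of character vectors
    subsets <- unlist(
      lapply(seq_along(predictors),
             function(k) combn(predictors, k, simplify = FALSE)),
      recursive = FALSE
    )

    # One row per model, one named column per possible coefficient
    beta <- matrix(NA_real_, nrow = length(subsets), ncol = length(predictors) + 1,
                   dimnames = list(NULL, c("(Intercept)", predictors)))

    for (i in seq_along(subsets)) {
      fit <- lm(reformulate(subsets[[i]], response = "Y"), data = dat)
      cf <- coef(fit)
      beta[i, names(cf)] <- cf  # assign by name; terms not in the model stay NA
    }
    ```

    Here row 1 is the model with A alone and the last row is the full model, so the row order differs from the manual example above.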
    
  • 2020-12-20 09:15

    How about using names and %in% to subset the right columns? Extract the coefficient values with coef(), like this:

    beta = matrix(NA, nrow = length(formula), ncol = 3)
    colnames(beta) <- colnames(inpdv)
    
    for(i in 1:length(formula)){
        fit = lm(formula(formula[i]), data)
        coefs <- coef(fit)
        beta[i, colnames(beta) %in% names(coefs)] <- coefs[names(coefs) %in% colnames(beta)]
    }
    #              A          B         C
    #[1,] -0.4229837 -0.0519900 0.3787666
    #[2,]         NA  0.7015679 0.0555350
    #[3,] -0.4165834         NA 0.3692974
    #[4,]         NA         NA 0.1346726
    #[5,] -0.2035173  0.7049951        NA
    #[6,]         NA  0.7978726        NA
    #[7,] -0.2229959         NA        NA
    #[8,]         NA         NA        NA
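
    One caveat with the two %in% masks: they pair values by position, so the result is only right when the shared names appear in the same order in colnames(beta) and names(coefs). A sketch that indexes by name instead is shown here; `dat`, `forms`, and the column order are made-up assumptions for illustration, not the asker's objects.

    ```r
    set.seed(42)  # arbitrary seed for reproducibility
    dat <- data.frame(A = rnorm(20), B = rnorm(20), C = rnorm(20))
    dat$Y <- 1 + 2 * dat$A - dat$C + rnorm(20)

    # Columns deliberately in a different order from the model terms
    beta <- matrix(NA_real_, nrow = 2, ncol = 3,
                   dimnames = list(NULL, c("C", "B", "A")))
    forms <- c("Y ~ A + C", "Y ~ B")  # hypothetical formula strings

    for (i in seq_along(forms)) {
      coefs <- coef(lm(as.formula(forms[i]), data = dat))
      # Names present in both; "(Intercept)" is dropped because beta has no such column
      shared <- intersect(names(coefs), colnames(beta))
      beta[i, shared] <- coefs[shared]
    }
    ```

    Because both sides are indexed by the same character vector, each coefficient lands in the column carrying its name regardless of ordering.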
    

    I think it's perfectly acceptable to use a for loop here. For starters, something like lapply can sometimes keep increasing memory usage as you run more and more of the simulations. R will sometimes not mark objects from earlier models as garbage until the lapply call finishes, so you can sometimes get a memory allocation error. With the for loop, I find that R will reuse memory allocated to the previous iteration if necessary, so if you can run the loop once, you can run it many times.

    The other reason not to use a for loop is speed, but I would assume that the time to iterate is negligible compared to the time to fit the model, so I would still use the loop.
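
    For comparison, this is roughly what the lapply form of the same bookkeeping might look like. It is a sketch on simulated data with made-up names (`dat`, `forms`, `cols`), and it does not measure memory use or speed, so it says nothing about which variant behaves better on large model sets.

    ```r
    set.seed(7)  # arbitrary seed for reproducibility
    dat <- data.frame(A = rnorm(15), B = rnorm(15), C = rnorm(15), Y = rnorm(15))
    forms <- c("Y ~ A + B + C", "Y ~ A + B", "Y ~ C")  # hypothetical formulas
    cols <- c("(Intercept)", "A", "B", "C")

    rows <- lapply(forms, function(f) {
      cf <- coef(lm(as.formula(f), data = dat))
      out <- setNames(rep(NA_real_, length(cols)), cols)  # NA template, one slot per term
      out[names(cf)] <- cf                                # fill by coefficient name
      out                                                 # only the vector is returned
    })
    beta <- do.call(rbind, rows)  # stack the named vectors into a matrix
    ```

    Each call returns just the padded coefficient vector rather than the fitted model, so the list being accumulated stays small.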
