How to match a data frame of variable names and another with data for a regression?

Submitted by 与世无争的帅哥 on 2019-12-11 07:05:10

Question


I have two data frames:

x = data.frame(Var1= c("A", "B", "C", "D","E"),Var2=c("F","G","H","I","J"),
    Value= c(11, 12, 13, 14,18))

y = data.frame(A= c(11, 12, 13, 14,18), B= c(15, 16, 17, 14,18),C= c(17, 22, 23, 24,18), D= c(11, 12, 13, 34,18),E= c(11, 5, 13, 55,18),  F= c(8, 12, 13, 14,18),G= c(7, 5, 13, 14,18),
    H= c(8, 12, 13, 14,18), I= c(9, 5, 13, 14,18), J= c(11, 12, 13, 14,18))

Var3 <- rep("time", nrow(x))

x = cbind(x, Var3)

time = seq_len(nrow(y))
y = cbind(y, time)

> x
  Var1 Var2 Value Var3
1    A    F    11 time
2    B    G    12 time
3    C    H    13 time
4    D    I    14 time
5    E    J    18 time
> y
   A  B  C  D  E  F  G  H  I  J time
1 11 15 17 11 11  8  7  8  9 11    1
2 12 16 22 12  5 12  5 12  5 12    2
3 13 17 23 13 13 13 13 13 13 13    3
4 14 14 24 34 55 14 14 14 14 14    4
5 18 18 18 18 18 18 18 18 18 18    5

In the first row of x, Var1 is A and Var2 is F. I want to select these two variables in y, fit a simple regression, lm(A ~ F, data = y), and save the result in the first position of a list. The second row of x would likewise give lm(B ~ G, data = y), and so on for the remaining rows.

How could I match variables names in x to data in y for a regression?


Revised question: how about a more complicated regression Var1 ~ Var2 + Var3?


Answer 1:


x = data.frame(Var1= c("A", "B", "C", "D","E"),
               Var2=c("F","G","H","I","J"),
               Value= c(11, 12, 13, 14,18))

y = data.frame(A= c(11, 12, 13, 14,18),
               B= c(15, 16, 17, 14,18),
               C= c(17, 22, 23, 24,18),
               D= c(11, 12, 13, 34,18),
               E= c(11, 5, 13, 55,18),
               F= c(8, 12, 13, 14,18),
               G= c(7, 5, 13, 14,18),
               H= c(8, 12, 13, 14,18), 
               I= c(9, 5, 13, 14,18),
               J= c(11, 12, 13, 14,18))

We can use Map with a small helper that builds each formula from strings:

fitmodel <- function (RHS, LHS) do.call("lm", list(formula = reformulate(RHS, LHS),
                                              data = quote(y)))

modList <- Map(fitmodel, as.character(x$Var2), as.character(x$Var1))

modList[[1]]  ## for example
#Call:
#lm(formula = A ~ F, data = y)
#
#Coefficients:
#(Intercept)            F  
#     4.3500       0.7115  

Remarks:

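  1. do.call ensures that reformulate(RHS, LHS) is evaluated before it is passed to lm, so the stored call contains the actual formula (A ~ F) rather than the expression reformulate(RHS, LHS). This is desirable because functions such as update, which re-evaluate the stored call, then work correctly on the model object. See the Stack Overflow question "Showing string in formula and not as variable in lm fit". For comparison: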

    oo <- Map(function (RHS, LHS) lm(reformulate(RHS, LHS), data = y),
              as.character(x$Var2), as.character(x$Var1))
    oo[[1]]
    #Call:
    #lm(formula = reformulate(RHS, LHS), data = y)
    #
    #Coefficients:
    #(Intercept)            F  
    #     4.3500       0.7115  
    
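  2. The as.character calls on x$Var1 and x$Var2 are needed when these columns are "factor" variables rather than character strings, which is what data.frame produces by default in R versions before 4.0.0; reformulate can't use factors. If you build x with stringsAsFactors = FALSE (the default since R 4.0.0), the columns are already character and the conversion is unnecessary but harmless.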
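
To illustrate remark 1, here is a minimal sketch of how the two versions behave under update. The helper names fit_docall and fit_plain, and the choice of dropping the first row, are assumptions made only for this illustration; the data are the A and F columns of y.

```r
# Toy data: the A and F columns of y from the question.
y <- data.frame(A = c(11, 12, 13, 14, 18),
                F = c(8, 12, 13, 14, 18))

# Version 1 (hypothetical helper name): do.call() evaluates reformulate()
# first, so the stored call contains the literal formula A ~ F.
fit_docall <- function(RHS, LHS) {
  do.call("lm", list(formula = reformulate(RHS, LHS), data = quote(y)))
}
m1 <- fit_docall("F", "A")

# Version 2 (hypothetical helper name): the stored call keeps the
# unevaluated expression reformulate(RHS, LHS), and RHS / LHS only
# exist inside the function.
fit_plain <- function(RHS, LHS) lm(reformulate(RHS, LHS), data = y)
m2 <- fit_plain("F", "A")

# Refit both models on a subset of rows. update() re-evaluates the stored
# call in the calling environment, where RHS and LHS are not defined.
m1_sub <- update(m1, data = y[-1, ])
m2_sub <- try(update(m2, data = y[-1, ]), silent = TRUE)

nobs(m1_sub)                   # number of rows used in the refit
inherits(m2_sub, "try-error")  # TRUE if re-evaluating the call failed
```

Note that update with only a new formula replaces the formula argument of the stored call, so the difference shows up mainly when other arguments such as data or subset are changed.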

Does it work for you? Isn't it supposed to use a "for" loop?

The Map function hides that "for" loop. It is a wrapper around mapply (called with SIMPLIFY = FALSE), and the *apply family of functions in R is essentially syntactic sugar for loops.
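
For readers who prefer to see the loop spelled out, the following sketch fits the same models with an explicit for loop and with Map. The names modList_loop and modList_map are chosen here for illustration, and x is built with stringsAsFactors = FALSE so that no as.character conversion is needed.

```r
x <- data.frame(Var1 = c("A", "B", "C", "D", "E"),
                Var2 = c("F", "G", "H", "I", "J"),
                stringsAsFactors = FALSE)

y <- data.frame(A = c(11, 12, 13, 14, 18), B = c(15, 16, 17, 14, 18),
                C = c(17, 22, 23, 24, 18), D = c(11, 12, 13, 34, 18),
                E = c(11, 5, 13, 55, 18),  F = c(8, 12, 13, 14, 18),
                G = c(7, 5, 13, 14, 18),   H = c(8, 12, 13, 14, 18),
                I = c(9, 5, 13, 14, 18),   J = c(11, 12, 13, 14, 18))

fitmodel <- function(RHS, LHS) {
  do.call("lm", list(formula = reformulate(RHS, LHS), data = quote(y)))
}

# Explicit loop: pre-allocate the list, then fill one model per row of x.
modList_loop <- vector("list", nrow(x))
for (i in seq_len(nrow(x))) {
  modList_loop[[i]] <- fitmodel(x$Var2[i], x$Var1[i])
}

# Map version: same models; the list is named after the first vector (Var2).
modList_map <- Map(fitmodel, x$Var2, x$Var1)

# The two approaches give the same coefficients for every row.
all(mapply(function(a, b) isTRUE(all.equal(coef(a), coef(b))),
           modList_loop, modList_map))
```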


Update on your revised question

Your original question constructs a model formula of the form Var1 ~ Var2.

Your new question wants Var1 ~ Var2 + Var3.

x$Var3 <- rep("time", nrow(x))
y$time <- seq_len(nrow(y))

## collect multiple RHS variables (using concatenation function `c`)
RHS <- Map(base::c, as.character(x$Var2), as.character(x$Var3))
#str(RHS)
#List of 5  ## oh this list has names! annoying!!
# $ F: chr [1:2] "F" "time"
# $ G: chr [1:2] "G" "time"
# $ H: chr [1:2] "H" "time"
# $ I: chr [1:2] "I" "time"
# $ J: chr [1:2] "J" "time"
LHS <- as.character(x$Var1)
modList <- Map(fitmodel, RHS, LHS)  ## `fitmodel` function unchanged
modList[[1]]  ## for example
#Call:
#lm(formula = A ~ F + time, data = y)
#
#Coefficients:
#(Intercept)            F         time  
#        5.6          0.5          0.5  
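
If more predictor columns are added to x later, one possible generalisation is to collect the right-hand side row by row from a vector of column names. This is only a sketch: rhs_cols and fit_row are hypothetical names introduced here, and naming the result after Var1 is a design choice, not part of the original answer.

```r
x <- data.frame(Var1 = c("A", "B", "C", "D", "E"),
                Var2 = c("F", "G", "H", "I", "J"),
                Var3 = "time",
                stringsAsFactors = FALSE)

y <- data.frame(A = c(11, 12, 13, 14, 18), B = c(15, 16, 17, 14, 18),
                C = c(17, 22, 23, 24, 18), D = c(11, 12, 13, 34, 18),
                E = c(11, 5, 13, 55, 18),  F = c(8, 12, 13, 14, 18),
                G = c(7, 5, 13, 14, 18),   H = c(8, 12, 13, 14, 18),
                I = c(9, 5, 13, 14, 18),   J = c(11, 12, 13, 14, 18),
                time = 1:5)

fitmodel <- function(RHS, LHS) {
  do.call("lm", list(formula = reformulate(RHS, LHS), data = quote(y)))
}

# Columns of x that hold predictor names (hypothetical helper variable).
rhs_cols <- c("Var2", "Var3")

# Build the RHS for row i as a plain character vector, then fit.
fit_row <- function(i) {
  rhs <- vapply(rhs_cols, function(col) as.character(x[[col]][i]),
                character(1))
  fitmodel(unname(rhs), as.character(x$Var1[i]))
}

modList <- lapply(seq_len(nrow(x)), fit_row)
names(modList) <- x$Var1   # name each model after its response variable

formula(modList[["B"]])
```

Naming the list by response variable means a model can be looked up as modList[["B"]] rather than by position.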


Source: https://stackoverflow.com/questions/51914163/how-to-match-a-data-frame-of-variable-names-and-another-with-data-for-a-regressi
