Using R's lm on a dataframe with a list of predictors

臣服心动 2020-12-15 21:26

I have a dataframe with, let's say, N+2 columns. The first is just dates (mainly used for plotting later on), the second is a variable whose response to the remaining N colu

3 Answers
  • 2020-12-15 22:05

    The formula notation y ~ . tells lm() to regress y on all of the other variables in the data frame passed as data.

    df <- data.frame(y = 1:10, x1 = runif(10), x2 = rnorm(10))
    # Fit a model using both x1 and x2
    fit <- lm(y ~ ., data = df)
    # Drop column 2 (x1), so y is regressed on x2 only
    fit <- lm(y ~ ., data = df[, -2])
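
    For the layout described in the question (dates in the first column, the response in the second, predictors after that), the date column would need to be dropped too, otherwise y ~ . treats it as a predictor. A minimal sketch, assuming hypothetical column names date, y, x1, x2 and x3 that do not come from the question:

    ```r
    set.seed(1)  # only so the sketch is reproducible
    # Hypothetical data: the column names and sizes are assumptions
    df <- data.frame(
      date = seq(as.Date("2020-01-01"), by = "day", length.out = 20),
      y    = rnorm(20),
      x1   = runif(20),
      x2   = rnorm(20),
      x3   = rnorm(20)
    )
    # Drop the date column (column 1) so that `.` expands to x1, x2, x3 only
    fit <- lm(y ~ ., data = df[, -1])
    # Predictor names the fitted model actually uses
    attr(terms(fit), "term.labels")
    ```

    The date column stays available in df for plotting; only the copy passed to lm() leaves it out.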
    
  • 2020-12-15 22:11

    An alternative to Dason's answer, for when you want to specify the columns to exclude by name, is to use subset() with the select argument:

    df = data.frame(y = 1:10, x1 = runif(10), x2 = rnorm(10))
    fit = lm(y ~ ., data = subset(df, select=-x1))
    

    Trying to use df[, -c("x1")] instead fails with "invalid argument to unary operator".

    It can extend to excluding multiple columns: subset(df, select = -c(x1,x2))
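
    The "invalid argument to unary operator" error comes up because unary minus is not defined for character vectors, so column names cannot be negated the way numeric indices can. When the names to exclude are held in a character vector, something along these lines should work in base R (the vector name exclude is just an assumption for illustration):

    ```r
    df <- data.frame(y = 1:10, x1 = runif(10), x2 = rnorm(10))
    exclude <- c("x1")  # hypothetical character vector of columns to leave out

    # Keep every column whose name is not in `exclude`
    keep <- setdiff(names(df), exclude)
    fit_a <- lm(y ~ ., data = df[, keep, drop = FALSE])

    # Same idea with a logical index instead of setdiff()
    fit_b <- lm(y ~ ., data = df[, !(names(df) %in% exclude), drop = FALSE])

    names(coef(fit_a))
    ```

    Both fits use x2 as the only predictor; drop = FALSE keeps the result a data frame even if only one column remains.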

    And you can still use numeric column indices:

    df = data.frame(y = 1:10, x1 = runif(10), x2 = rnorm(10))
    fit = lm(y ~ ., data = subset(df, select = -2))
    

    (That is equivalent to subset(df, select=-x1) because x1 is the 2nd column.)

    Naturally you can also use this to specify the columns to include.

    df = data.frame(y = 1:10, x1 = runif(10), x2 = rnorm(10))
    fit = lm(y ~ ., data = subset(df, select=c(y,x2)) )
    

    (Yes, that is equivalent to lm(y ~ x2, df), but the two differ if you then go on to use step(), for instance.)

  • 2020-12-15 22:12

    I am fairly new to R, but I found another way to do this for named columns in a data frame. Say you want to run a regression using all columns except x2; then you would write:

    df = data.frame(y = 1:10, x1 = runif(10), x2 = rnorm(10))
    # Removes x2 from the formula, so the regression is on x1 only
    # (the response is y, lower case, to match the column name in df)
    model <- lm(y ~ . - x2, data = df)
    # To remove more columns (only valid if df also had columns x3 and x4):
    # model <- lm(y ~ . - x2 - x3 - x4, data = df)
    

    The rest of the answers are pretty old, but this is not a new feature: removing terms with - is part of R's standard formula syntax (see ?formula). Still, it's pretty neat!
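
    When the predictors to include are held in a character vector, rather than excluded one at a time, the formula can also be built programmatically. A sketch using reformulate() from the stats package, which none of the answers here mention; the vector name predictors and the extra column x3 are assumptions for illustration:

    ```r
    df <- data.frame(y = 1:10, x1 = runif(10), x2 = rnorm(10), x3 = rnorm(10))
    predictors <- c("x1", "x3")  # hypothetical list of predictors to include

    # Build the formula y ~ x1 + x3 from the character vector
    f <- reformulate(predictors, response = "y")
    model <- lm(f, data = df)

    # The `. - x2` form selects the same terms for this data frame
    model_minus <- lm(y ~ . - x2, data = df)
    all.equal(coef(model), coef(model_minus))
    ```

    The explicit list may be easier to maintain when the set of predictors is computed elsewhere, while . - x2 is shorter when only a few columns need to go.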
