Error when building regression model using lm (Error in `contrasts<-`(`*tmp*`… contrasts can be applied only to factors with 2 or more levels) [duplicate]

走远了吗. Submitted on 2019-12-12 05:44:58

Question


I get this error depending on which variables I include and the sequence in which I specify them in the formula:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

I've done a little research on this, and it looks like the error is caused by the variable in question not being a factor. In this case, however, the variable (is_women_owned) is a factor with 2 levels ("No", "Yes"):

> levels(customer_accounts$is_women_owned)
[1] "No"  "Yes"

No error:

f1 <- lm(combined_sales ~ is_women_owned, data=customer_accounts)

No error:

f2 <- lm(combined_sales ~ total_assets + market_value + total_empl + empl_growth + sic + city + revenue_growth + revenue + net_income + income_growth, data=customer_accounts)

Regressing on the above formula plus the factor variable "is_women_owned":

f3 <- lm(combined_sales ~ total_assets + market_value + total_empl + empl_growth + sic + city + revenue_growth + revenue + net_income + income_growth + is_women_owned, data=customer_accounts)

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

I get the same error when applying stepwise linear regression, as you would expect.

This seems like a bug: I would expect it to give me a model in which "is_women_owned" perhaps offers no additional explanatory value because it is highly correlated with the other variables, rather than erroring out like this.
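For comparison, lm() really does handle redundant predictors the way the question expects: it keeps the fit and reports NA for an aliased coefficient instead of failing. A minimal sketch with made-up data (d, x1 and x2 are hypothetical), where x2 is an exact multiple of x1. This suggests the contrasts error is about something else, namely a factor with fewer than two levels in the rows lm() actually uses.

```r
# Hypothetical toy data: x2 is an exact linear function of x1 (perfectly collinear)
set.seed(1)
d <- data.frame(x1 = 1:10)
d$x2 <- 2 * d$x1
d$y <- 3 + d$x1 + rnorm(10)

fit <- lm(y ~ x1 + x2, data = d)
coef(fit)  # x2 is aliased: its coefficient is NA, but no error is raised
```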

I also verified that there is no missing data for this variable:

> which(is.na(customer_accounts$is_women_owned))
integer(0)
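Having no NAs in is_women_owned itself is not sufficient, because lm() (with the default na.action, na.omit) drops every row that has a missing value in any variable of the formula. One likely cause, assuming some of the other predictors (for example total_assets or revenue) contain NAs, is that all the "Yes" rows are among the dropped ones. A minimal sketch with hypothetical data (d, x, owned) in which the only "Yes" row has a missing x:

```r
# Hypothetical data: the only "Yes" row has a missing value in another predictor
d <- data.frame(
  y     = c(10, 12, 9, 11, 14),
  x     = c(1, 2, 3, 4, NA),
  owned = factor(c("No", "No", "No", "No", "Yes"), levels = c("No", "Yes"))
)

sum(is.na(d$owned))   # 0: the factor itself has no missing values

# Rows lm() keeps under na.omit: row 5 is dropped because x is NA
cc <- complete.cases(d[, c("y", "x", "owned")])
table(d$owned[cc])    # only "No" remains among the complete cases

# Fitting therefore fails with the contrasts error; capture its message
res <- tryCatch(lm(y ~ x + owned, data = d),
                error = function(e) conditionMessage(e))
res
```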

Also, there are two values present in the factor variable:

> customer_accounts$is_women_owned[1:20]
 [1] No  No  No  No  No  No  No  No  No  No  No  No  No  No  Yes No 
[17] No  No  No  No 
Levels: No Yes
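Note that levels() lists the levels a factor declares, whether or not they occur, and seeing both values in the full column does not guarantee that both survive in the rows a particular model uses. A minimal sketch of declared versus observed levels, using a hypothetical factor owned as a stand-in for is_women_owned:

```r
# Hypothetical factor that declares two levels but only uses one of them
owned <- factor(c("No", "No", "No"), levels = c("No", "Yes"))

levels(owned)               # "No" "Yes": declared levels, used or not
table(owned)                # the count for "Yes" is 0
nlevels(droplevels(owned))  # 1: what remains once unused levels are dropped
```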

Answer 1:


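# stringsAsFactors = TRUE keeps f a factor in R >= 4.0 (older R did this by default)
twofac = data.frame("y" = c(1,2,3,4,5,1), "x" = c(2,56,3,5,2,1), "f" = c("apple","apple","apple","apple","apple","banana"), stringsAsFactors = TRUE)
# First five rows only: "banana" stays a declared level but no longer occurs
onefac = twofac[1:5,]

lm(y~x+f,data=twofac)   # runs: both levels of f are present
lm(y~x+f,data=onefac)   # error: only "apple" occurs in the data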

> str(onefac)
'data.frame':   5 obs. of  3 variables:
 $ y: num  1 2 3 4 5
 $ x: num  2 56 3 5 2
 $ f: Factor w/ 2 levels "apple","banana": 1 1 1 1 1
> str(twofac)
'data.frame':   6 obs. of  3 variables:
 $ y: num  1 2 3 4 5 1
 $ x: num  2 56 3 5 2 1
 $ f: Factor w/ 2 levels "apple","banana": 1 1 1 1 1 2
> lm(y~x+f,data=twofac)

Call:
lm(formula = y ~ x + f, data = twofac)

Coefficients:
(Intercept)            x      fbanana  
    3.30783     -0.02263     -2.28519  

> lm(y~x+f,data=onefac)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

If you run the above, you will notice that twofac, a data set with a 2-level factor in which both levels are present, runs without a problem. onefac, which has the same 2-level factor but with only one of its levels present, gives the same error you got.
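The str(onefac) output above still reports 2 levels, because subsetting rows keeps unused factor levels. lm() fails anyway because it builds its model frame with drop.unused.levels = TRUE. A minimal sketch reproducing that step directly with model.frame(), rebuilding the example data so the block runs on its own:

```r
# Rebuild the example data; stringsAsFactors = TRUE makes f a factor in R >= 4.0
twofac <- data.frame(y = c(1, 2, 3, 4, 5, 1), x = c(2, 56, 3, 5, 2, 1),
                     f = c("apple", "apple", "apple", "apple", "apple", "banana"),
                     stringsAsFactors = TRUE)
onefac <- twofac[1:5, ]

nlevels(onefac$f)  # 2: the unused "banana" level is still declared

# Build the model frame the way lm() does, dropping unused levels
mf <- model.frame(y ~ x + f, data = onefac, drop.unused.levels = TRUE)
nlevels(mf$f)      # 1: only "apple" is left, so contrasts cannot be formed
```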

If your factor has only one of its levels present, regressing on that factor gives no additional information, because it is constant across all observations and therefore indistinguishable from the intercept.
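To find which variable is the culprit in a long formula, one option is to inspect the rows lm() would actually use. The sketch below defines a hypothetical helper, single_level_factors(), that mirrors lm()'s default row handling (assuming na.action is na.omit and no subset or weights arguments) and lists factor or character predictors with fewer than two distinct levels. The toy data d is also hypothetical.

```r
# Hypothetical helper: names of factor/character variables in `formula` that have
# fewer than two distinct levels in the rows lm() would actually use
single_level_factors <- function(formula, data) {
  # Same row handling as lm() by default: drop incomplete rows and unused levels
  mf <- model.frame(formula, data = data, na.action = na.omit,
                    drop.unused.levels = TRUE)
  is_cat <- vapply(mf, function(v) is.factor(v) || is.character(v), logical(1))
  n_lev <- vapply(mf[is_cat], function(v) length(unique(v)), integer(1))
  as.character(names(n_lev)[n_lev < 2])
}

# Usage on toy data where the only "Yes" row is lost to a missing x
d <- data.frame(y = c(10, 12, 9, 11, 14), x = c(1, 2, 3, 4, NA),
                owned = factor(c("No", "No", "No", "No", "Yes")))
single_level_factors(y ~ x + owned, d)
# For the question's data this would be, e.g.:
# single_level_factors(combined_sales ~ total_assets + ... + is_women_owned, customer_accounts)
```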



Source: https://stackoverflow.com/questions/34819810/error-when-building-regression-model-using-lm-error-in-contrasts-tmp
