lm function in R does not give coefficients for all factor levels in categorical data

小蘑菇 2020-11-30 11:47

I was trying out linear regression in R with categorical attributes and noticed that I don't get a coefficient for each of the factor levels I have.

1 answer
  • 2020-11-30 12:11

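    GE comes first alphabetically, so R uses it as the reference (baseline) level and absorbs it into the intercept term instead of giving it its own coefficient. As eipi10 stated, you can interpret the coefficients for the other levels of states relative to GE as the baseline (statesLA = 0.1 means that LA is, on average, 0.1 units higher than GE; the difference is additive, not a multiple).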
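    A minimal sketch of the baseline behaviour, assuming the factor column is called `states` (inferred from the `statesLA` coefficient name). The levels NY and TX and all the y values are hypothetical, chosen so the fit reproduces the intercept of 0.6 and statesLA = 0.1 mentioned above. `relevel()` is shown as one way to pick a different baseline.

```r
# Hypothetical data: levels NY and TX and every y value are made up for illustration
df <- data.frame(
  states = factor(c("GE", "GE", "LA", "LA", "NY", "NY", "TX", "TX")),
  y      = c(0.5, 0.7, 0.6, 0.8, 1.0, 1.2, 0.2, 0.4)
)

# factor() sorts levels alphabetically by default, so "GE" is the first level
levels(df$states)

# Default treatment contrasts: an intercept column plus one 0/1 column
# for every level except the first (GE gets no column of its own)
model.matrix(~ states, data = df)

fit <- lm(y ~ states, data = df)
coef(fit)
# (Intercept) is the mean of y for GE; statesLA, statesNY and statesTX are
# each level's mean minus the GE mean (here 0.1, 0.5 and -0.3)

# relevel() makes a different level the baseline, here LA
fit_la <- lm(y ~ states, data = transform(df, states = relevel(states, ref = "LA")))
coef(fit_la)
```

    With LA as the baseline, the intercept becomes the LA mean and GE reappears as `statesGE`; the fitted values are the same, only the parameterisation changes.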

    EDIT:

    To respond to your updated question:

    If you include all of the levels in a linear regression, you end up with what is called perfect collinearity, which is responsible for the strange results you're seeing when you force each category into its own variable. Briefly: every observation belongs to exactly one level, so the level dummies always sum to 1, which is exactly the intercept column. The model then has no unique solution, so linear regression cannot estimate a coefficient for every level while also fitting an intercept. If you want to see all of the levels in a regression, you can fit it without an intercept term, as suggested in the comments, but again, this is ill-advised unless you have a specific reason to.
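    A small sketch of the collinearity problem and of the no-intercept alternative, reusing the same hypothetical `states`/`y` data (not the asker's real data). The hand-made `dummies` matrix stands in for "forcing each category into its own variable"; how the asker actually built their variables is not shown, so this is an assumption.

```r
# Hypothetical data, as before
df <- data.frame(
  states = factor(c("GE", "GE", "LA", "LA", "NY", "NY", "TX", "TX")),
  y      = c(0.5, 0.7, 0.6, 0.8, 1.0, 1.2, 0.2, 0.4)
)

# One hand-made 0/1 dummy per level, including the baseline GE
dummies <- sapply(levels(df$states), function(l) as.numeric(df$states == l))

# Each row belongs to exactly one level, so the dummies always sum to 1,
# i.e. together they reproduce the intercept column exactly
rowSums(dummies)

# Intercept + all four dummies: 5 columns but only rank 4.
# lm() does not stop with an error; it reports NA for the aliased column
fit_all <- lm(df$y ~ dummies)
coef(fit_all)

# Dropping the intercept ("- 1", or equivalently "+ 0") gives every level
# its own coefficient, each equal to that level's mean of y
fit_noint <- lm(y ~ states - 1, data = df)
coef(fit_noint)
```

    One concrete reason for caution with the no-intercept fit: `summary()` computes R-squared for a model without an intercept around zero rather than around the mean of y, so it is not comparable to the R-squared of the usual model.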

    As for the interpretation of GE in your y = mx + c equation, you can calculate the expected y by noting that the dummy variables for the other states are binary (zero or one), and if the state is GE, they are all zero.

    e.g.

    y = b1*x1 + b2*x2 + b3*x3 + c
    y = b1*0 + b2*0 + b3*0 + c
    y = c

    If you don't have any other variables, as in your first example, the expected y for GE is simply the intercept term (0.6).
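    The same calculation can be checked with `predict()`. This sketch again uses the hypothetical `states`/`y` data from above, where b1, b2, b3 correspond to `statesLA`, `statesNY`, `statesTX`: the prediction for GE equals the intercept (and the GE group mean), and the prediction for LA equals the intercept plus `statesLA`.

```r
# Hypothetical data, as before
df <- data.frame(
  states = factor(c("GE", "GE", "LA", "LA", "NY", "NY", "TX", "TX")),
  y      = c(0.5, 0.7, 0.6, 0.8, 1.0, 1.2, 0.2, 0.4)
)
fit <- lm(y ~ states, data = df)

# For a GE row every dummy (statesLA, statesNY, statesTX) is 0,
# so y = b1*0 + b2*0 + b3*0 + c = c
new_ge <- data.frame(states = factor("GE", levels = levels(df$states)))
predict(fit, newdata = new_ge)
coef(fit)[["(Intercept)"]]
mean(df$y[df$states == "GE"])

# For an LA row only statesLA is 1, so y = b1*1 + c
new_la <- data.frame(states = factor("LA", levels = levels(df$states)))
predict(fit, newdata = new_la)
coef(fit)[["(Intercept)"]] + coef(fit)[["statesLA"]]
```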
