How to set up balanced one-way ANOVA for lm()

Posted by 会有一股神秘感。 on 2019-12-22 12:38:53

Question


I have data:

dat <- data.frame(NS = c(8.56, 8.47, 6.39, 9.26, 7.98, 6.84, 9.2, 7.5),
                  EXSM = c(7.39, 8.64, 8.54, 5.37, 9.21, 7.8, 8.2, 8),
                  Less.5 = c(5.97, 6.77, 7.26, 5.74, 8.74, 6.3, 6.8, 7.1),
                  More.5 = c(7.03, 5.24, 6.14, 6.74, 6.62, 7.37, 4.94, 6.34))

#     NS EXSM Less.5 More.5
# 1 8.56 7.39   5.97   7.03
# 2 8.47 8.64   6.77   5.24
# 3 6.39 8.54   7.26   6.14
# 4 9.26 5.37   5.74   6.74
# 5 7.98 9.21   8.74   6.62
# 6 6.84 7.80   6.30   7.37
# 7 9.20 8.20   6.80   4.94
# 8 7.50 8.00   7.10   6.34

Each column holds the data from one group. I use a group index variable:

group <- c(rep("NS",8), rep("EXSM",8), rep("More.5",8), rep("Less.5",8))

I get an error when I run this command:

fit <- lm(NS ~ group, data = dat)
Error in model.frame.default(formula = NS ~ group, data = dat, drop.unused.levels = TRUE) : 
  variable lengths differ (found for 'group')

I am new to the lm() function. What am I doing wrong? I know that after this I just have to call

anova(fit)
plot(fit)

Any help is appreciated!


Answer 1:


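The error arises because lm() finds NS inside dat (8 values) but takes group from your workspace (32 values), so the two variables have different lengths. lm() expects data in long format: one row per observation, with one column for the response and one for the group label. We first use stack() to reshape your data: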

DAT <- setNames(stack(dat), c("y", "group"))
#       y  group
# 1  8.56     NS
# 2  8.47     NS
# 3  6.39     NS
# 4  9.26     NS
# 5  7.98     NS
# 6  6.84     NS
# 7  9.20     NS
# 8  7.50     NS
# 9  7.39   EXSM
# 10 8.64   EXSM
# 11 8.54   EXSM
# 12 5.37   EXSM
# 13 9.21   EXSM
# 14 7.80   EXSM
# 15 8.20   EXSM
# 16 8.00   EXSM
# 17 5.97 Less.5
# 18 6.77 Less.5
# 19 7.26 Less.5
# 20 5.74 Less.5
# 21 8.74 Less.5
# 22 6.30 Less.5
# 23 6.80 Less.5
# 24 7.10 Less.5
# 25 7.03 More.5
# 26 5.24 More.5
# 27 6.14 More.5
# 28 6.74 More.5
# 29 6.62 More.5
# 30 7.37 More.5
# 31 4.94 More.5
# 32 6.34 More.5
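
As a quick check of the reshaping step, the sketch below compares the lengths that caused the error and confirms that the stacked data is balanced. The helper names (len_response, len_group, n_per_group, n_mismatch) are illustrative and not part of the original post. It also counts how many of the hand-made labels from the question disagree with the labels stack() derives from the column names; the question lists "More.5" before "Less.5", which is the reverse of the column order in dat.

```r
dat <- data.frame(NS = c(8.56, 8.47, 6.39, 9.26, 7.98, 6.84, 9.2, 7.5),
                  EXSM = c(7.39, 8.64, 8.54, 5.37, 9.21, 7.8, 8.2, 8),
                  Less.5 = c(5.97, 6.77, 7.26, 5.74, 8.74, 6.3, 6.8, 7.1),
                  More.5 = c(7.03, 5.24, 6.14, 6.74, 6.62, 7.37, 4.94, 6.34))

# The hand-made index from the question: one label per observation
group <- c(rep("NS", 8), rep("EXSM", 8), rep("More.5", 8), rep("Less.5", 8))

# lm(NS ~ group, data = dat) would pair these two vectors
len_response <- length(dat$NS)   # values in one column of dat
len_group <- length(group)       # labels for all four columns

# stack() creates values and labels together, so they cannot get out of step
DAT <- setNames(stack(dat), c("y", "group"))
n_per_group <- table(DAT$group)

# Rows where the hand-made labels differ from the column-derived labels
n_mismatch <- sum(group != as.character(DAT$group))

print(c(response = len_response, group = len_group))
print(n_per_group)
print(n_mismatch)
```

Because stack() takes the labels from the column names, there is no separate index vector to keep in sync with the values.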

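A categorical variable should be coded as a factor. The levels argument of factor() fixes the order of the levels; under R's default treatment contrasts, the first level ("NS" here) becomes the baseline that lm() compares the other groups against.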

DAT$group <- factor(DAT$group, levels = c("NS", "EXSM", "Less.5", "More.5"))
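
To see what the level order controls, here is a small sketch that inspects the levels and the dummy coding, and then builds a second factor with a different baseline. The column name group_alt and the choice of "More.5" as the alternative baseline are purely illustrative; the sketch assumes R's default contrast options are in effect.

```r
dat <- data.frame(NS = c(8.56, 8.47, 6.39, 9.26, 7.98, 6.84, 9.2, 7.5),
                  EXSM = c(7.39, 8.64, 8.54, 5.37, 9.21, 7.8, 8.2, 8),
                  Less.5 = c(5.97, 6.77, 7.26, 5.74, 8.74, 6.3, 6.8, 7.1),
                  More.5 = c(7.03, 5.24, 6.14, 6.74, 6.62, 7.37, 4.94, 6.34))
DAT <- setNames(stack(dat), c("y", "group"))
DAT$group <- factor(DAT$group, levels = c("NS", "EXSM", "Less.5", "More.5"))

# The first level is the baseline group
print(levels(DAT$group))

# Dummy coding used in the model matrix: one column per non-baseline level
print(contrasts(DAT$group))

# Illustrative alternative: move "More.5" to the front to make it the baseline
DAT$group_alt <- relevel(DAT$group, ref = "More.5")
print(levels(DAT$group_alt))
```

Changing the baseline changes which differences the coefficients report, but not the fitted group means or the overall F-test.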

Now, column y is the dependent variable (response), while column group is the independent variable (a categorical predictor).

Before statistical modelling, we can use boxplot to visualize your group data:

boxplot(y ~ group, DAT)  ## formula method for boxplot
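
A boxplot draws medians rather than means, so it can help to put numbers next to the picture. The following sketch computes both per group with tapply(); the names group_means and group_medians are illustrative.

```r
dat <- data.frame(NS = c(8.56, 8.47, 6.39, 9.26, 7.98, 6.84, 9.2, 7.5),
                  EXSM = c(7.39, 8.64, 8.54, 5.37, 9.21, 7.8, 8.2, 8),
                  Less.5 = c(5.97, 6.77, 7.26, 5.74, 8.74, 6.3, 6.8, 7.1),
                  More.5 = c(7.03, 5.24, 6.14, 6.74, 6.62, 7.37, 4.94, 6.34))
DAT <- setNames(stack(dat), c("y", "group"))

# Per-group location summaries to read alongside the boxplot
group_means <- tapply(DAT$y, DAT$group, mean)
group_medians <- tapply(DAT$y, DAT$group, median)

print(group_means)
print(group_medians)
```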

The plot suggests that groups "NS" and "EXSM" have similar centers, whereas "Less.5" and "More.5" sit noticeably lower. Let's call lm():

fit <- lm(y ~ group, data = DAT)

For analysis of your model, use summary() and anova():

summary(fit)

# Call:
# lm(formula = y ~ group, data = DAT)

# Residuals:
#      Min       1Q   Median       3Q      Max 
# -2.52375 -0.52750  0.07187  0.56281  1.90500 

# Coefficients:
#             Estimate Std. Error t value Pr(>|t|)    
# (Intercept)   8.0250     0.3553  22.585   <2e-16 ***
# groupEXSM    -0.1312     0.5025  -0.261   0.7959    
# groupLess.5  -1.1900     0.5025  -2.368   0.0250 *  
# groupMore.5  -1.7225     0.5025  -3.428   0.0019 ** 
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# Residual standard error: 1.005 on 28 degrees of freedom
# Multiple R-squared:  0.3709,  Adjusted R-squared:  0.3035 
# F-statistic: 5.502 on 3 and 28 DF,  p-value: 0.004231
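
To read the coefficient table, it may help to see how the estimates relate to the group means. With "NS" as the baseline and default treatment contrasts (an assumption about your R options), the intercept is the mean of "NS" and each remaining coefficient is that group's mean minus the "NS" mean. The sketch below checks this; the name implied is illustrative.

```r
dat <- data.frame(NS = c(8.56, 8.47, 6.39, 9.26, 7.98, 6.84, 9.2, 7.5),
                  EXSM = c(7.39, 8.64, 8.54, 5.37, 9.21, 7.8, 8.2, 8),
                  Less.5 = c(5.97, 6.77, 7.26, 5.74, 8.74, 6.3, 6.8, 7.1),
                  More.5 = c(7.03, 5.24, 6.14, 6.74, 6.62, 7.37, 4.94, 6.34))
DAT <- setNames(stack(dat), c("y", "group"))
DAT$group <- factor(DAT$group, levels = c("NS", "EXSM", "Less.5", "More.5"))

fit <- lm(y ~ group, data = DAT)
group_means <- tapply(DAT$y, DAT$group, mean)

# Baseline mean, followed by each other group's difference from the baseline
implied <- c(group_means[["NS"]], group_means[-1] - group_means[["NS"]])

print(coef(fit))
print(implied)
print(all.equal(unname(coef(fit)), unname(implied)))
```

Each t-test in the table therefore compares one group against "NS" only; it says nothing directly about, for example, "Less.5" versus "More.5".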

anova(fit)
# Analysis of Variance Table

# Response: y
#           Df Sum Sq Mean Sq F value   Pr(>F)   
# group      3 16.674  5.5579  5.5025 0.004231 **
# Residuals 28 28.282  1.0101                    
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
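
For readers who want to see where the ANOVA table comes from, here is a minimal sketch that rebuilds the F-test by hand from the between-group and within-group sums of squares and compares it with anova(fit). All variable names (ss_between, ss_within, f_stat, p_value, tab) are illustrative.

```r
dat <- data.frame(NS = c(8.56, 8.47, 6.39, 9.26, 7.98, 6.84, 9.2, 7.5),
                  EXSM = c(7.39, 8.64, 8.54, 5.37, 9.21, 7.8, 8.2, 8),
                  Less.5 = c(5.97, 6.77, 7.26, 5.74, 8.74, 6.3, 6.8, 7.1),
                  More.5 = c(7.03, 5.24, 6.14, 6.74, 6.62, 7.37, 4.94, 6.34))
DAT <- setNames(stack(dat), c("y", "group"))
fit <- lm(y ~ group, data = DAT)

grand_mean <- mean(DAT$y)
group_mean_per_obs <- ave(DAT$y, DAT$group)   # each row's own group mean

# Variation of group means around the grand mean, and of data around group means
ss_between <- sum((group_mean_per_obs - grand_mean)^2)
ss_within <- sum((DAT$y - group_mean_per_obs)^2)

k <- nlevels(DAT$group)   # number of groups
n <- nrow(DAT)            # total number of observations

f_stat <- (ss_between / (k - 1)) / (ss_within / (n - k))
p_value <- pf(f_stat, k - 1, n - k, lower.tail = FALSE)

tab <- anova(fit)
print(c(F = f_stat, p = p_value))
print(tab)
```

The degrees of freedom, 3 and 28, are k - 1 and n - k for four groups of eight observations.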


Source: https://stackoverflow.com/questions/38192276/how-to-set-up-balanced-one-way-anova-for-lm
