Question
I have data:
dat <- data.frame(NS = c(8.56, 8.47, 6.39, 9.26, 7.98, 6.84, 9.2, 7.5),
                  EXSM = c(7.39, 8.64, 8.54, 5.37, 9.21, 7.8, 8.2, 8),
                  Less.5 = c(5.97, 6.77, 7.26, 5.74, 8.74, 6.3, 6.8, 7.1),
                  More.5 = c(7.03, 5.24, 6.14, 6.74, 6.62, 7.37, 4.94, 6.34))
# NS EXSM Less.5 More.5
# 1 8.56 7.39 5.97 7.03
# 2 8.47 8.64 6.77 5.24
# 3 6.39 8.54 7.26 6.14
# 4 9.26 5.37 5.74 6.74
# 5 7.98 9.21 8.74 6.62
# 6 6.84 7.80 6.30 7.37
# 7 9.20 8.20 6.80 4.94
# 8 7.50 8.00 7.10 6.34
Each column holds the data for one group. I use a group index variable:
group <- c(rep("NS",8), rep("EXSM",8), rep("More.5",8), rep("Less.5",8))
I get an error when I run the command
fit <- lm(NS ~ group, data = dat)
Error in model.frame.default(formula = NS ~ group, data = dat, drop.unused.levels = TRUE) :
variable lengths differ (found for 'group')
I am new to the lm() function. Where am I going wrong? I know that after this I just have to call
anova(fit)
plot(fit)
Any help is appreciated!
Answer 1:
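lm() expects data in long format: one column for the response and one column for the group label, both with the same number of rows. In your call, NS has 8 values (and is only one of the four groups, not the response for all of them), while group has 32 values, hence "variable lengths differ". Note also that your group vector lists More.5 before Less.5, which does not match the column order of dat. We first use stack() to reshape your data; it takes the group labels directly from the column names, so they stay in sync with the values: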
DAT <- setNames(stack(dat), c("y", "group"))
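To make the reshape concrete, here is a minimal sketch that builds the same long table by hand with unlist() and rep(), assuming every column has the same number of rows (8 here). The name manual is illustrative only; the point is that the labels are generated from names(dat) rather than typed in, which avoids the More.5/Less.5 mix-up in the question.

```r
# Wide data from the question: 8 rows, one column per group
dat <- data.frame(NS = c(8.56, 8.47, 6.39, 9.26, 7.98, 6.84, 9.2, 7.5),
                  EXSM = c(7.39, 8.64, 8.54, 5.37, 9.21, 7.8, 8.2, 8),
                  Less.5 = c(5.97, 6.77, 7.26, 5.74, 8.74, 6.3, 6.8, 7.1),
                  More.5 = c(7.03, 5.24, 6.14, 6.74, 6.62, 7.37, 4.94, 6.34))

# Long format by hand: unlist() chains the columns in order, and each
# column name is repeated once per row, so the labels follow the data
manual <- data.frame(y = unlist(dat, use.names = FALSE),
                     group = rep(names(dat), each = nrow(dat)),
                     stringsAsFactors = FALSE)

# The stack() version used in this answer
DAT <- setNames(stack(dat), c("y", "group"))

nrow(manual)                                       # 32, one row per observation
all.equal(manual$y, DAT$y)                         # TRUE
identical(manual$group, as.character(DAT$group))   # TRUE
```

Both give 32 rows of (y, group), so the response and the grouping variable now have the same length, which is what lm() needs.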
# y group
# 1 8.56 NS
# 2 8.47 NS
# 3 6.39 NS
# 4 9.26 NS
# 5 7.98 NS
# 6 6.84 NS
# 7 9.20 NS
# 8 7.50 NS
# 9 7.39 EXSM
# 10 8.64 EXSM
# 11 8.54 EXSM
# 12 5.37 EXSM
# 13 9.21 EXSM
# 14 7.80 EXSM
# 15 8.20 EXSM
# 16 8.00 EXSM
# 17 5.97 Less.5
# 18 6.77 Less.5
# 19 7.26 Less.5
# 20 5.74 Less.5
# 21 8.74 Less.5
# 22 6.30 Less.5
# 23 6.80 Less.5
# 24 7.10 Less.5
# 25 7.03 More.5
# 26 5.24 More.5
# 27 6.14 More.5
# 28 6.74 More.5
# 29 6.62 More.5
# 30 7.37 More.5
# 31 4.94 More.5
# 32 6.34 More.5
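Categorical variables should be coded as factors. We use factor() for this, with the levels argument to fix the order of the levels. The first level (here NS) becomes the reference level that lm() compares the other groups against.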
DAT$group <- factor(DAT$group, levels = c("NS", "EXSM", "Less.5", "More.5"))
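A quick way to check what the levels argument did is sketched below. Under R's default treatment contrasts, the first level is the baseline and each other level gets its own 0/1 dummy column. The relevel() call is only an illustration of how a different baseline could be chosen; it is not needed for this analysis.

```r
dat <- data.frame(NS = c(8.56, 8.47, 6.39, 9.26, 7.98, 6.84, 9.2, 7.5),
                  EXSM = c(7.39, 8.64, 8.54, 5.37, 9.21, 7.8, 8.2, 8),
                  Less.5 = c(5.97, 6.77, 7.26, 5.74, 8.74, 6.3, 6.8, 7.1),
                  More.5 = c(7.03, 5.24, 6.14, 6.74, 6.62, 7.37, 4.94, 6.34))
DAT <- setNames(stack(dat), c("y", "group"))
DAT$group <- factor(DAT$group, levels = c("NS", "EXSM", "Less.5", "More.5"))

levels(DAT$group)   # "NS" "EXSM" "Less.5" "More.5": NS is the reference level
table(DAT$group)    # 8 observations in every group (a balanced design)

# Default treatment contrasts: one 0/1 column per non-reference level,
# which is how lm() turns the factor into dummy variables
contrasts(DAT$group)

# Illustration only: relevel() would make EXSM the baseline instead
levels(relevel(DAT$group, ref = "EXSM"))   # "EXSM" "NS" "Less.5" "More.5"
```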
Now column y is the dependent variable (response), while column group is the independent variable (covariate).
Before statistical modelling, we can use boxplot() to visualize your group data:
boxplot(y ~ group, DAT) ## formula method for boxplot
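The boxplot displays medians and quartiles. To attach numbers to it, here is a minimal sketch computing each group's mean and median with tapply(); the names group_means and group_medians are illustrative. This is a descriptive check only, not a test.

```r
dat <- data.frame(NS = c(8.56, 8.47, 6.39, 9.26, 7.98, 6.84, 9.2, 7.5),
                  EXSM = c(7.39, 8.64, 8.54, 5.37, 9.21, 7.8, 8.2, 8),
                  Less.5 = c(5.97, 6.77, 7.26, 5.74, 8.74, 6.3, 6.8, 7.1),
                  More.5 = c(7.03, 5.24, 6.14, 6.74, 6.62, 7.37, 4.94, 6.34))
DAT <- setNames(stack(dat), c("y", "group"))
DAT$group <- factor(DAT$group, levels = c("NS", "EXSM", "Less.5", "More.5"))

# Per-group mean and median of the response, in level order
group_means   <- tapply(DAT$y, DAT$group, mean)
group_medians <- tapply(DAT$y, DAT$group, median)
rbind(mean = group_means, median = group_medians)
# means (rounded):   NS 8.025, EXSM 7.894, Less.5 6.835, More.5 6.303
# medians:           NS 8.225, EXSM 8.1,   Less.5 6.785, More.5 6.48
```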
We see that groups "NS" and "EXSM" do not appear to differ much in their centre, while the other two groups sit noticeably lower. Let's call lm():
fit <- lm(y ~ group, data = DAT)
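Because NS is the first level, the intercept of this fit is the NS mean and every other coefficient is that group's mean minus the NS mean. A minimal sketch verifying this against the raw group means (m and from_means are illustrative names):

```r
dat <- data.frame(NS = c(8.56, 8.47, 6.39, 9.26, 7.98, 6.84, 9.2, 7.5),
                  EXSM = c(7.39, 8.64, 8.54, 5.37, 9.21, 7.8, 8.2, 8),
                  Less.5 = c(5.97, 6.77, 7.26, 5.74, 8.74, 6.3, 6.8, 7.1),
                  More.5 = c(7.03, 5.24, 6.14, 6.74, 6.62, 7.37, 4.94, 6.34))
DAT <- setNames(stack(dat), c("y", "group"))
DAT$group <- factor(DAT$group, levels = c("NS", "EXSM", "Less.5", "More.5"))
fit <- lm(y ~ group, data = DAT)

# Group means in level order: NS, EXSM, Less.5, More.5
m <- as.vector(tapply(DAT$y, DAT$group, mean))

# Treatment coding: intercept = NS mean, other terms = group mean - NS mean
from_means <- c(m[1], m[-1] - m[1])
all.equal(unname(coef(fit)), from_means)    # TRUE

coef(fit)[c("groupLess.5", "groupMore.5")]  # about -1.19 and -1.7225
```

This is why the reference level chosen in factor() matters for reading summary(fit): each t test there compares one group with NS.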
For analysis of your model, use summary() and anova():
summary(fit)
# Call:
# lm(formula = y ~ group, data = DAT)
# Residuals:
# Min 1Q Median 3Q Max
# -2.52375 -0.52750 0.07187 0.56281 1.90500
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 8.0250 0.3553 22.585 <2e-16 ***
# groupEXSM -0.1312 0.5025 -0.261 0.7959
# groupLess.5 -1.1900 0.5025 -2.368 0.0250 *
# groupMore.5 -1.7225 0.5025 -3.428 0.0019 **
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Residual standard error: 1.005 on 28 degrees of freedom
# Multiple R-squared: 0.3709, Adjusted R-squared: 0.3035
# F-statistic: 5.502 on 3 and 28 DF, p-value: 0.004231
anova(fit)
# Analysis of Variance Table
# Response: y
# Df Sum Sq Mean Sq F value Pr(>F)
# group 3 16.674 5.5579 5.5025 0.004231 **
# Residuals 28 28.282 1.0101
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
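As a cross-check, for a one-way layout like this the same F test can be obtained from oneway.test() with var.equal = TRUE (the classic equal-variance one-way ANOVA) or from aov(). The question also mentions plot(fit); arranging its four default diagnostic plots on one page is shown as an optional extra. This is a sketch; F_lm and ow are illustrative names.

```r
dat <- data.frame(NS = c(8.56, 8.47, 6.39, 9.26, 7.98, 6.84, 9.2, 7.5),
                  EXSM = c(7.39, 8.64, 8.54, 5.37, 9.21, 7.8, 8.2, 8),
                  Less.5 = c(5.97, 6.77, 7.26, 5.74, 8.74, 6.3, 6.8, 7.1),
                  More.5 = c(7.03, 5.24, 6.14, 6.74, 6.62, 7.37, 4.94, 6.34))
DAT <- setNames(stack(dat), c("y", "group"))
DAT$group <- factor(DAT$group, levels = c("NS", "EXSM", "Less.5", "More.5"))
fit <- lm(y ~ group, data = DAT)

# F statistic for the group term, taken from the ANOVA table of the lm fit
F_lm <- anova(fit)[["F value"]][1]

# One-way ANOVA assuming equal variances in all groups
ow <- oneway.test(y ~ group, data = DAT, var.equal = TRUE)
all.equal(F_lm, unname(ow$statistic))   # TRUE (F about 5.50 on 3 and 28 df)

# aov() fits the same model and prints the same table as anova(fit)
summary(aov(y ~ group, data = DAT))

# The four standard diagnostic plots for fit, on one page
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)
```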
Source: https://stackoverflow.com/questions/38192276/how-to-set-up-balanced-one-way-anova-for-lm