Prediction of 'mlm' linear model object from `lm()`

后端未结

关注

 1  2011

无人及你

I have three datasets:

response - matrix of 5(samples) x 10(dependent variables)

predictors - matrix of 5(samples) x 2(independent variables)

test_se

相关标签:

1条回答

-上瘾入骨i

2020-12-21 10:37
You are stepping into a poorly supported part in R. The model class you have is "mlm", i.e., "multiple linear models", which is not the standard "lm" class. You get it when you have several (independent) response variables for a common set of covariates / predictors. Although lm() function can fit such model, predict method is poor for "mlm" class. If you look at methods(predict), you would see a predict.mlm*. Normally for a linear model with "lm" class, predict.lm is called when you call predict; but for a "mlm" class the predict.mlm* is called.

predict.mlm* is too primitive. It does not allow se.fit, i.e., it can not produce prediction errors, confidence / prediction intervals, etc, although this is possible in theory. It can only compute prediction mean. If so, why do we want to use predict.mlm* at all?! The prediction mean can be obtained by a trivial matrix-matrix multiplication (in standard "lm" class this is a matrix-vector multiplication), so we can do it on our own.

Consider this small, reproduce example.
```
set.seed(0)
## 2 response of 10 observations each
response <- matrix(rnorm(20), 10, 2)
## 3 covariates with 10 observations each
predictors <- matrix(rnorm(30), 10, 3)
fit <- lm(response ~ predictors)

class(fit)
# [1] "mlm" "lm"

beta <- coef(fit)
#                  [,1]       [,2]
#(Intercept)  0.5773235 -0.4752326
#predictors1 -0.9942677  0.6759778
#predictors2 -1.3306272  0.8322564
#predictors3 -0.5533336  0.6218942
```
When you have a prediction data set:
```
# 2 new observations for 3 covariats
test_set <- matrix(rnorm(6), 2, 3)
```
we first need to pad an intercept column
```
Xp <- cbind(1, test_set)
```
Then do this matrix multiplication
```
pred <- Xp %*% beta
#          [,1]      [,2]
#[1,] -2.905469  1.702384
#[2,]  1.871755 -1.236240
```
Perhaps you have noticed that I did not even use a data frame here. Yes it is unnecessary as you have everything in matrix form. For those R wizards, maybe using lm.fit or even qr.solve is more straightforward.

But as a complete answer, it is a must to demonstrate how to use predict.mlm to get our desired result.
```
## still using previous matrices
training_dataframe <- data.frame(response = I(response), predictors = I(predictors))
fit <- lm(response ~ predictors, data = training_dataframe)
newdat <- data.frame(predictors = I(test_set))
pred <- predict(fit, newdat)
#          [,1]      [,2]
#[1,] -2.905469  1.702384
#[2,]  1.871755 -1.236240
```
Note the I() when I use data.frame(). This is a must when we want to obtain a data frame of matrices. You can compare the difference between:
```
str(data.frame(response = I(response), predictors = I(predictors)))
#'data.frame':  10 obs. of  2 variables:
# $ response  : AsIs [1:10, 1:2] 1.262954.... -0.32623.... 1.329799.... 1.272429.... 0.414641.... ...
# $ predictors: AsIs [1:10, 1:3] -0.22426.... 0.377395.... 0.133336.... 0.804189.... -0.05710.... ...

str(data.frame(response = response, predictors = predictors))
#'data.frame':  10 obs. of  5 variables:
# $ response.1  : num  1.263 -0.326 1.33 1.272 0.415 ...
# $ response.2  : num  0.764 -0.799 -1.148 -0.289 -0.299 ...
# $ predictors.1: num  -0.2243 0.3774 0.1333 0.8042 -0.0571 ...
# $ predictors.2: num  -0.236 -0.543 -0.433 -0.649 0.727 ...
# $ predictors.3: num  1.758 0.561 -0.453 -0.832 -1.167 ...
```
Without I() to protect the matrix input, data are messy. It is amazing that this will not cause problem to lm, but predict.mlm will have a hard time obtaining the correct matrix for prediction, if you don't use I().

Well, I would recommend using a "list" instead of a "data frame" in this case. data argument in lm as well newdata argument in predict allows list input. A "list" is a more general structure than a data frame, which can hold any data structure without difficulty. We can do:
```
## still using previous matrices
training_list <- list(response = response, predictors = predictors)
fit <- lm(response ~ predictors, data = training_list)
newdat <- list(predictors = test_set)
pred <- predict(fit, newdat)
#          [,1]      [,2]
#[1,] -2.905469  1.702384
#[2,]  1.871755 -1.236240
```
Perhaps in the very end, I should stress that it is always safe to use formula interface, rather than matrix interface. I will use R built-in dataset trees as a reproducible example.
```
fit <- lm(cbind(Girth, Height) ~ Volume, data = trees)

## use the first two rows as prediction dataset
predict(fit, newdata = trees[1:2, ])
#     Girth   Height
#1 9.579568 71.39192
#2 9.579568 71.39192
```
Perhaps you still remember my saying that predict.mlm* is too primitive to support se.fit. This is the chance to test it.
```
predict(fit, newdata = trees[1:2, ], se.fit = TRUE)
#Error in predict.mlm(fit, newdata = trees[1:2, ], se.fit = TRUE) : 
#  the 'se.fit' argument is not yet implemented for "mlm" objects
```
Oops... How about confidence / prediction intervals (actually without the ability to compute standard error it is impossible to produce those intervals)? Well, predict.mlm* will just ignore it.
```
predict(fit, newdata = trees[1:2, ], interval = "confidence")
#     Girth   Height
#1 9.579568 71.39192
#2 9.579568 71.39192
```
So this is so different compared with predict.lm.
0 讨论(0)
发布评论:

提交评论
- 加载中...