I am trying to perform linear regression, for a model like this:
Y = aX1 + bX2 + c
So, Y ~ X1 + X2
Suppose I have the following response vector:
set.seed(1)
Y <- runif(100, -1.0, 1.0)
And the following matrix of predictors:
X1 <- runif(100, 0.4, 1.0)
X2 <- sample(rep(0:1,each=50))
X <- cbind(X1, X2)
I want to use the following constraints on the coefficients:
a + c >= 0
c >= 0
So no constraint on b.
I know that the glmc package can be used to apply constraints, but I was not able to determine how to apply it for my constraints. I also know that contr.sum can be used so that all coefficients sum to 0, for example, but that is not what I want to do. solve.QP() seems like another possibility, where setting meq=0
can be used so that all coefficients are >=0 (again, not my goal here).
Note: The solution must be able to handle NA values in the response vector Y, for example with:
Y <- runif(100, -1.0, 1.0)
Y[c(2,5,17,56,37,56,34,78)] <- NA
solve.QP
can be passed arbitrary linear constraints, so it can certainly be used to model your constraints a+c >= 0
and c >= 0
.
First, we can add a column of 1's to X
to capture the intercept term, and then we can replicate standard linear regression with solve.QP
:
X2 <- cbind(X, 1)
library(quadprog)
solve.QP(t(X2) %*% X2, t(Y) %*% X2, matrix(0, 3, 0), c())$solution
# [1] 0.08614041 0.21433372 -0.13267403
With the sample data from the question, neither constraint is met using standard linear regression.
By modifying both the Amat
and bvec
parameters, we can add our two constraints:
solve.QP(t(X2) %*% X2, t(Y) %*% X2, cbind(c(1, 0, 1), c(0, 0, 1)), c(0, 0))$solution
# [1] 0.0000000 0.1422207 0.0000000
Subject to these constraints, the squared residuals are minimized by setting the a and c coefficients to both equal 0.
You can handle missing values in Y
or X2
just as the lm
function does, by removing the offending observations. You might do something like the following as a pre-processing step:
has.missing <- rowSums(is.na(cbind(Y, X2))) > 0
Y <- Y[!has.missing]
X2 <- X2[!has.missing,]
来源:https://stackoverflow.com/questions/45577591/linear-regression-with-constraints-on-the-coefficients