Why does optimx in R not give the correct solution to this simple nonparametric likelihood maximization?


Question


Is optimx() giving an incorrect solution, or am I missing a simple point? Thank you!

I am trying to maximize a very simple likelihood. It is non-parametric in the sense that the distribution F is not specified parametrically. Rather, for each observed x_i, f(x_i) = p_i, so the log-likelihood is log(Likelihood) = sum(log(f(x_i))) = sum(log(p_i)).

The function I am trying to maximize is sum(log(p_i)) + lambda*(sum(p_i) - 1), subject to the constraint sum(p_i) = 1 (i.e. this is a constrained maximization problem that can be solved with a Lagrange multiplier).

The answer to this problem is p_i = 1/n, where n is the number of data points (with multiplier lambda = -n). However, optimx does not seem to give this solution. Does anybody have any idea? For n = 2, the function I am maximizing is log(p1) + log(p2) + lambda*(p1 + p2 - 1).
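For reference, a quick sanity check (a minimal sketch for the n = 2 case) confirms that p1 = p2 = 1/2 with lambda = -2 makes every partial derivative of the Lagrangian sum(log(p_i)) + lambda*(sum(p_i) - 1) vanish:

p <- c(0.5, 0.5); lambda <- -2
1/p + lambda    # d/dp_i = 1/p_i + lambda  -> 0 0
sum(p) - 1      # d/dlambda = sum(p) - 1   -> 0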

Here is my code and output from R:

library(optimx)

n=2
# Negative of the Lagrangian: each term is log(p_i) + lambda*p_i - lambda/n,
# so the sum equals sum(log(p_i)) + lambda*(sum(p_i) - 1).
log.like=function(p)
{
  lamda=p[n+1]
  ll=0
  for(i in 1:n){
    temp = log(p[i])+lamda*p[i]-lamda/n
    ll=ll+temp
  }
  return(-ll)
}


mle = optimx(c(0.48,.52,-1.5),
             log.like,
             lower=c(rep(0.1,2),-3),
             upper=c(rep(.9,2),-1),
             method = "L-BFGS-B")

> mle
             par  fvalues   method fns grs itns conv  KKT1 KKT2 xtimes
1 0.9, 0.9, -1.0 1.010721 L-BFGS-B   8   8 NULL    0 FALSE   NA      0

The solution when n = 2 is p1 = p2 = 1/2 and lambda = -2, but I do not get this from optimx. Any idea?


Answer 1:


Nothing is wrong with optimx. Take a step back and look at the function you are asking it to maximize: log(p1) + log(p2) + lambda*(p1 + p2 - 1). It is quite intuitive that the best you can do is make every variable as large as possible, no? Note that optimx correctly returned the upper bounds you specified.
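You can see this concretely (a small check, reusing log.like and n = 2 from the question, which returns the negative of the objective): the point at the upper bounds achieves a smaller negative value, i.e. a larger objective, than the Lagrange stationary point.

log.like(c(0.9, 0.9, -1))   # ~1.011, the point optimx returned
log.like(c(0.5, 0.5, -2))   # ~1.386, the expected Lagrange point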

So what is wrong with your approach? With Lagrange multipliers, the critical points you are after are saddle points of the function above, not the local minima that optimx finds. You therefore need to reformulate the problem so that these saddle points become local minima. One way to do this is to minimize the squared norm of the gradient of the Lagrangian, which is easy to compute analytically for your problem. There is a great example (with pictures) here:

http://en.wikipedia.org/wiki/Lagrange_multiplier#Example:_numerical_optimization.

For your problem:

grad.norm <- function(x) {
  lambda <- tail(x, 1)
  p <- head(x, -1)
  # Squared norm of the Lagrangian's gradient:
  # d/dp_i = 1/p_i + lambda and d/dlambda = sum(p) - 1
  sum((1/p + lambda)^2) + (sum(p) - 1)^2
}

optimx(c(.48, .52, -1.5),
       grad.norm,
       lower = c(rep(.1, 2), -3),
       upper = c(rep(.9, 2), -1),
       method = "L-BFGS-B")

#                               par      fvalues   method  fns grs [...]
# 1 0.5000161, 0.5000161, -1.9999356 1.038786e-09 L-BFGS-B  13  13 [...]
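The same construction carries over unchanged to more data points. As a hedged sketch with a hypothetical n = 5 (reusing grad.norm from above), the minimizer should again be p_i = 1/n with lambda = -n:

n <- 5
optimx(c(rep(1/n + 0.05, n), -4),
       grad.norm,
       lower = c(rep(0.05, n), -10),
       upper = c(rep(0.95, n), -1),
       method = "L-BFGS-B")
# expect each p_i near 1/5 = 0.2 and lambda near -5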

Follow-up: if you do not want to or cannot compute the gradient analytically, you can let R approximate it numerically, for example with grad() from the numDeriv package:

library(numDeriv)  # for grad()

log.like <- function(x) {
  lambda <- tail(x, 1)
  p <- head(x, -1)
  # The Lagrangian itself (not negated this time)
  return(sum(log(p)) + lambda*(sum(p) - 1))
}

grad.norm <- function(x) {
  # Squared norm of a numerically approximated gradient
  return(sum(grad(log.like, x)^2))
}

optimx(c(.48, .52, -1.5),
       grad.norm,
       lower = c(rep(.1, 2), -3),
       upper = c(rep(.9, 2), -1),
       method = "L-BFGS-B")

#                                par      fvalues   method fns grs [...]
# 1 0.5000161, 0.5000161, -1.9999356 1.038784e-09 L-BFGS-B  13  13 [...]
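Both versions converge to essentially the same point, p1 ≈ p2 ≈ 0.5 and lambda ≈ -2 (the fvalues differ only in the last digits), matching the analytic solution p_i = 1/n, lambda = -n; the numerical variant simply spends a few extra function evaluations per gradient.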


Source: https://stackoverflow.com/questions/11876291/why-does-optimx-in-r-not-give-the-correct-solution-to-this-simple-nonparametric
