integrate() gives horribly wrong answer:
integrate(function (x) dnorm(x, -5, 0.07), -Inf, Inf, subdivisions = 10000L)
# 2.127372e-23 with absolute error <
Interesting workaround: not too surprisingly, integrate does well when the values it samples (on (-Inf, Inf), no less) are close to the "center" of the mass. You can mitigate the problem by keeping your function but hinting at a center:
Without adjustment:
t(sapply(-10:10, function(i) integrate(function (x) dnorm(x, i, 0.07), -Inf, Inf, subdivisions = 10000L)))
#       value        abs.error    subdivisions message call
#  [1,] 0            0            1            "OK"    Expression
#  [2,] 1            4.611403e-05 10           "OK"    Expression
#  [3,] 6.619713e-19 1.212066e-18 2            "OK"    Expression
#  [4,] 7.344551e-71 0            2            "OK"    Expression
#  [5,] 3.389557e-06 6.086176e-06 3            "OK"    Expression
#  [6,] 2.127372e-23 3.849798e-23 2            "OK"    Expression
#  [7,] 1            3.483439e-05 8            "OK"    Expression
#  [8,] 1            6.338078e-07 11           "OK"    Expression
#  [9,] 1            3.408389e-06 7            "OK"    Expression
# [10,] 1            6.414833e-07 8            "OK"    Expression
# [11,] 1            7.578907e-06 3            "OK"    Expression
# [12,] 1            6.414833e-07 8            "OK"    Expression
# [13,] 1            3.408389e-06 7            "OK"    Expression
# [14,] 1            6.338078e-07 11           "OK"    Expression
# [15,] 1            3.483439e-05 8            "OK"    Expression
# [16,] 2.127372e-23 3.849798e-23 2            "OK"    Expression
# [17,] 3.389557e-06 6.086176e-06 3            "OK"    Expression
# [18,] 7.344551e-71 0            2            "OK"    Expression
# [19,] 6.619713e-19 1.212066e-18 2            "OK"    Expression
# [20,] 1            4.611403e-05 10           "OK"    Expression
# [21,] 0            0            1            "OK"    Expression
If we add a "centering" hint, though, we get more consistent results:
t(sapply(-10:10, function(i) integrate(function (x, offset) dnorm(x + offset, i, 0.07), -Inf, Inf, subdivisions = 10000L, offset = i)))
#       value abs.error    subdivisions message call
#  [1,] 1     7.578907e-06 3            "OK"    Expression
#  [2,] 1     7.578907e-06 3            "OK"    Expression
#  [3,] 1     7.578907e-06 3            "OK"    Expression
#  [4,] 1     7.578907e-06 3            "OK"    Expression
#  [5,] 1     7.578907e-06 3            "OK"    Expression
#  [6,] 1     7.578907e-06 3            "OK"    Expression
#  [7,] 1     7.578907e-06 3            "OK"    Expression
#  [8,] 1     7.578907e-06 3            "OK"    Expression
#  [9,] 1     7.578907e-06 3            "OK"    Expression
# [10,] 1     7.578907e-06 3            "OK"    Expression
# [11,] 1     7.578907e-06 3            "OK"    Expression
# [12,] 1     7.578907e-06 3            "OK"    Expression
# [13,] 1     7.578907e-06 3            "OK"    Expression
# [14,] 1     7.578907e-06 3            "OK"    Expression
# [15,] 1     7.578907e-06 3            "OK"    Expression
# [16,] 1     7.578907e-06 3            "OK"    Expression
# [17,] 1     7.578907e-06 3            "OK"    Expression
# [18,] 1     7.578907e-06 3            "OK"    Expression
# [19,] 1     7.578907e-06 3            "OK"    Expression
# [20,] 1     7.578907e-06 3            "OK"    Expression
# [21,] 1     7.578907e-06 3            "OK"    Expression
I recognize this is a mitigation that relies on heuristics, presumes you know something about your distribution before integrating, and is not a perfect "generic" solution. Just offering another perspective.
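One way to make the hint less manual is a sketch along these lines (my own helper, not from any package; the function name `integrate_centered` and the `search` window are assumptions): locate the integrand's peak with a coarse grid scan plus optimize(), then shift it to 0 before integrating.

```r
# Hypothetical helper: automate the "centering hint" by finding the
# integrand's peak first, then shifting it to 0, where integrate()'s
# transformation for infinite limits samples most densely.
integrate_centered <- function(f, lower = -Inf, upper = Inf,
                               search = c(-50, 50), ...) {
  # Coarse grid scan; assumes the mass lies inside 'search' and that f
  # is vectorized (integrate() requires a vectorized f anyway).
  grid <- seq(search[1], search[2], length.out = 1000)
  start <- grid[which.max(f(grid))]
  # Refine the peak location near the best grid point.
  peak <- optimize(f, interval = start + c(-1, 1), maximum = TRUE)$maximum
  # The shift does not change the value of the integral.
  integrate(function(x) f(x + peak), lower, upper, ...)
}

integrate_centered(function(x) dnorm(x, -5, 0.07))$value
# close to 1 instead of 2e-23
```

The grid spacing (here about 0.1) has to be finer than the width of the peak, so this still bakes in an assumption about the integrand's scale.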
Try package cubature.
library(cubature)
hcubature(function (x) dnorm(x, -5, 0.07), -Inf, Inf)
#$integral
#[1] 1
#
#$error
#[1] 9.963875e-06
#
#$functionEvaluations
#[1] 405
#
#$returnCode
#[1] 0
Note that function pcubature in the same package also returns 0.
From vignette("cubature"), section Introduction. My emphasis.
This R cubature package exposes both the hcubature and pcubature routines of the underlying C cubature library, including the vectorized interfaces. Per the documentation, use of pcubature is advisable only for smooth integrands in dimensions up to three at most. In fact, the pcubature routines perform significantly worse than the vectorized hcubature in inappropriate cases. So when in doubt, you are better off using hcubature.
Since in this case the integrand is the normal density, a smooth and 1-dimensional function, there would be reasons to prefer pcubature. But it doesn't give the right result. The vignette concludes the following.
Vectorized hcubature seems to be a good starting point. For smooth integrands in low dimensions (≤3), pcubature might be worth trying out. Experiment before using in a production package.
Expanding a little further on @r2evans's and @Limey's comments:
@Limey: for very general problems like this, there is simply no way to guarantee a generic solution.
One way to solve such problems is to use more knowledge of the properties of the integrand (@r2evans's answer); the answer referenced by @Limey goes into detail for a different problem.
One "gotcha" that you may not have thought of is that trying out a bunch of generic methods, tuning settings, etc. may mislead you into concluding that some settings/methods are generically better than the first one you tried that failed to get the right answer. (Methods that work may work better because they're generically better, but trying them on one example doesn't prove it!)
As an example, the description of pcubature() (in ?cubature::pcubature) says
This algorithm is often superior to h-adaptive integration for smooth integrands in a few (<=3) dimensions, but is a poor choice in higher dimensions or for non-smooth integrands.
However, recall that pcubature() happens to fail for your example, which is a smooth, low-dimensional case (exactly where pcubature() is supposed to perform better). This suggests it may be just luck that hcubature() works and pcubature() doesn't in this case.
An illustration of how sensitive the results can be to parameters (lower/upper limits in this case):
library(emdbook)
cc <- curve3d(integrate(dnorm, mean = -5, sd = 0.07,
                        lower = x, upper = y, subdivisions = 1000L)$value,
              xlim = c(-30, -10), ylim = c(0, 30), n = c(61, 61),
              sys3d = "image", col = c("black", "white"),
              xlab = "lower", ylab = "upper")
White squares are successful (integral=1), black squares are bad (integral=0).
From the online documentation: "Like all numerical integration routines, these evaluate the function on a finite set of points. If the function is approximately constant (in particular, zero) over nearly all its range it is possible that the result and error estimate may be seriously wrong."
I take this to mean "caveat emptor". I notice that in your example, the absolute error is greater than the value of the integral. Given that you know f(x) > 0 for all x, at least it's giving you the chance to spot that something has gone wrong. It's down to you to take the opportunity.
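That observation can be turned into a cheap automatic sanity check (my own sketch, not a standard API): flag any result whose reported error bound is at least as large as the estimate itself.

```r
# If the reported error bound swamps the estimate, the "answer" carries
# essentially no information; treat it as suspect.
r <- integrate(function(x) dnorm(x, -5, 0.07), -Inf, Inf)
suspect <- r$abs.error >= abs(r$value)
suspect
# TRUE for this integrand (value ~2.1e-23, error bound ~3.8e-23)
```

This won't catch every failure (a result of exactly 0 with reported error 0 passes), so it complements rather than replaces knowing your integrand.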
integrate( function(x) dnorm(x, -5, 0.07), -20, 10, subdivisions=1000L)
Gives
1 with absolute error < 9.8e-07
The warning in the online doc says to me that, given your apparent definition of buggy, the answer to your question is "no, there is no reliable numerical integration method. Not in R or any other language". No numerical integration technique should be used blindly. The user needs to check that their inputs are sensible and the output is reasonable. It's no good believing an answer just because the computer gave it to you.
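One way to follow that advice in code (again a sketch; the finite window is exactly the kind of knowledge you must supply yourself): compute the integral two ways and refuse to trust a result the two methods disagree on.

```r
# Cross-check: integrate over (-Inf, Inf) and over a finite window known
# to contain essentially all of the mass, then compare the two results.
f <- function(x) dnorm(x, -5, 0.07)
full   <- integrate(f, -Inf, Inf)$value
finite <- integrate(f, -20, 10, subdivisions = 1000L)$value
isTRUE(all.equal(full, finite))
# FALSE here (2.1e-23 vs 1), so at least one of the two answers is wrong
```

Agreement doesn't prove correctness, but disagreement is a reliable signal that something needs a closer look.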
See also this post.