Question
I create two matrices A and B of the same dimension. A contains larger values than B. The matrix multiplication A %*% A is about 10 times faster than B %*% B.
Why is this?
## force single-threaded BLAS and OpenMP
library(RhpcBLASctl); blas_set_num_threads(1); omp_set_num_threads(1)
A <- exp(-as.matrix(dist(expand.grid(1:60, 1:60))))
summary(c(A))
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 0.000000 0.000000 0.000000 0.001738 0.000000 1.000000
B <- exp(-as.matrix(dist(expand.grid(1:60, 1:60)))*10)
summary(c(B))
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 0.0000000 0.0000000 0.0000000 0.0002778 0.0000000 1.0000000
identical(dim(A), dim(B))
## [1] TRUE
system.time(A %*% A)
# user system elapsed
# 2.387 0.001 2.389
system.time(B %*% B)
# user system elapsed
# 21.285 0.020 21.310
sessionInfo()
# R version 3.6.1 (2019-07-05)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: Linux Mint 19.2
# Matrix products: default
# BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
# LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
This question may be related to "base::chol() slows down when matrix contains many small entries".
Edit: Some small numbers seem to slow down computations, while others do not.
slow <- 6.41135533887904e-164
fast1 <- 6.41135533887904e-150
fast2 <- 6.41135533887904e-170
Mslow <- array(slow, c(1000, 1000)); system.time(Mslow %*% Mslow)
# user system elapsed
# 10.165 0.000 10.168
Mfast1 <- array(fast1, c(1000, 1000)); system.time(Mfast1 %*% Mfast1)
# user system elapsed
# 0.058 0.000 0.057
Mfast2 <- array(fast2, c(1000, 1000)); system.time(Mfast2 %*% Mfast2)
# user system elapsed
# 0.056 0.000 0.055
Answer 1:
You most likely want to use .Machine$double.xmin instead of .Machine$double.eps. This zeroes far fewer numbers and has the same effect. To avoid subnormal numbers altogether, you might have to recompile BLAS with compiler flags that flush them to zero instead of raising an FP trap.
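A minimal sketch of that suggestion (my own illustration, not code from the answer), comparing how many values each cutoff would discard:

```r
## Sketch: compare the two cutoffs on a vector of shrinking magnitudes.
## .Machine$double.eps  (~2.2e-16)  : relative spacing of doubles near 1
## .Machine$double.xmin (~2.2e-308) : smallest positive *normal* double
## Subnormals live in the interval (0, .Machine$double.xmin).
x <- exp(-(1:800))  # runs down through the subnormal range to exact zero
n_eps  <- sum(x > 0 & x < .Machine$double.eps)   # entries an eps cutoff would zero
n_xmin <- sum(x > 0 & x < .Machine$double.xmin)  # only the subnormal entries
n_eps > n_xmin  # TRUE: the xmin cutoff discards far fewer values
x[x < .Machine$double.xmin] <- 0  # flush subnormals before multiplying
```

Both cutoffs remove the subnormals that cause the slowdown, but the double.xmin cutoff leaves all normal (if tiny) values intact.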
Answer 2:
Responses on the R-devel mailing list suggested that this could be a problem with denormal numbers, or that OpenBLAS may process small numbers more slowly.
From https://en.wikipedia.org/wiki/Denormal_number:
In computer science, denormal numbers or denormalized numbers (now often called subnormal numbers) fill the underflow gap around zero in floating-point arithmetic. Any non-zero number with magnitude smaller than the smallest normal number is "subnormal". [...] in extreme cases, instructions involving denormal operands may run as much as 100 times slower.
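The boundary described in the quote can be checked directly in R (a quick illustration of my own):

```r
## Subnormals in R: values below the smallest positive normal double
## are still representable, down to about 4.9e-324.
.Machine$double.xmin               # smallest positive normal double, ~2.225e-308
.Machine$double.xmin / 2 > 0       # TRUE: half of it is a subnormal, not zero
5e-324 > 0                         # TRUE: the smallest positive subnormal
5e-324 / 2 == 0                    # TRUE: halving it underflows to exact zero
```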
Indeed, B contains very small numbers:
sum(B<.Machine$double.eps)
[1] 12832980
sort(unique(B[B>0]))[10^(0:3)]
[1] 4.940656e-324 2.280607e-320 6.302966e-295 2.185410e-141
If small numbers are set to zero, the computation has the expected computation time:
C <- B; C[abs(C)<.Machine$double.eps] <- 0
system.time(C %*% C)
user system elapsed
2.266 0.032 2.298
Is there a way to automatically set values below .Machine$double.eps to zero?
Checking every matrix for small numbers by hand is not convenient.
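One way to automate this (a hypothetical helper, not an existing function) is to wrap the multiplication so both operands are flushed first. Zeroing below .Machine$double.xmin rather than .Machine$double.eps touches only the subnormal entries. Note that this cannot prevent subnormals arising from products of small *normal* entries during the multiply itself; for that, the BLAS flush-to-zero recompilation mentioned in the first answer would be needed.

```r
## Hypothetical helper: flush subnormal entries to zero before multiplying.
## Only values with magnitude in (0, .Machine$double.xmin) are changed.
flush_subnormals <- function(M) {
  M[abs(M) < .Machine$double.xmin] <- 0
  M
}
mm <- function(X, Y) flush_subnormals(X) %*% flush_subnormals(Y)

M <- matrix(c(1, 5e-324, 5e-324, 1), 2, 2)  # contains subnormal entries
mm(M, M)  # the subnormals are flushed, so this multiplies identity matrices
```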
Source: https://stackoverflow.com/questions/58886111/why-is-matrix-product-slower-when-matrix-has-very-small-values