Question
I create two matrices A and B of the same dimension. A contains larger values than B. The matrix multiplication A %*% A is about 10 times faster than B %*% B.
Why is this?
## force single-threaded BLAS and OpenMP
library(RhpcBLASctl); blas_set_num_threads(1); omp_set_num_threads(1)
A <- exp(-as.matrix(dist(expand.grid(1:60, 1:60))))
summary(c(A))
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 0.000000 0.000000 0.000000 0.001738 0.000000 1.000000
B <- exp(-as.matrix(dist(expand.grid(1:60, 1:60)))*10)
summary(c(B))
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# 0.0000000 0.0000000 0.0000000 0.0002778 0.0000000 1.0000000
identical(dim(A), dim(B))
## [1] TRUE
system.time(A %*% A)
# user system elapsed
# 2.387 0.001 2.389
system.time(B %*% B)
# user system elapsed
# 21.285 0.020 21.310
sessionInfo()
# R version 3.6.1 (2019-07-05)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: Linux Mint 19.2
# Matrix products: default
# BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
# LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
This question may be related to "base::chol() slows down when matrix contains many small entries".
Edit: Some small numbers seem to slow down computations, while others do not.
slow <- 6.41135533887904e-164
fast1 <- 6.41135533887904e-150
fast2 <- 6.41135533887904e-170
Mslow <- array(slow, c(1000, 1000)); system.time(Mslow %*% Mslow)
# user system elapsed
# 10.165 0.000 10.168
Mfast1 <- array(fast1, c(1000, 1000)); system.time(Mfast1 %*% Mfast1)
# user system elapsed
# 0.058 0.000 0.057
Mfast2 <- array(fast2, c(1000, 1000)); system.time(Mfast2 %*% Mfast2)
# user system elapsed
# 0.056 0.000 0.055
Answer 1:
You most likely want to use .Machine$double.xmin instead of .Machine$double.eps. This zeroes far fewer numbers and has the same effect. To avoid subnormal numbers altogether, you might have to recompile BLAS with compiler flags that flush them to zero instead of raising an FP trap.
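A minimal sketch of that suggestion (my own illustration, not code from the answer), comparing how many values each cutoff would discard:

```r
## Sketch: compare the two cutoffs on a vector of shrinking magnitudes.
## .Machine$double.eps  (~2.2e-16)  : relative spacing of doubles near 1
## .Machine$double.xmin (~2.2e-308) : smallest positive *normal* double
## Subnormals live in the interval (0, .Machine$double.xmin).
x <- exp(-(1:800))  # runs down through the subnormal range to exact zero
n_eps  <- sum(x > 0 & x < .Machine$double.eps)   # entries an eps cutoff would zero
n_xmin <- sum(x > 0 & x < .Machine$double.xmin)  # only the subnormal entries
n_eps > n_xmin  # TRUE: the xmin cutoff discards far fewer values
x[x < .Machine$double.xmin] <- 0  # flush subnormals before multiplying
```

Both cutoffs remove the subnormals that cause the slowdown, but the double.xmin cutoff leaves all normal (if tiny) values intact.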
Answer 2:
Responses on the R-devel mailing list suggested that this could be a problem with denormal numbers, or that OpenBLAS may process small numbers more slowly.
From https://en.wikipedia.org/wiki/Denormal_number:
In computer science, denormal numbers or denormalized numbers (now often called subnormal numbers) fill the underflow gap around zero in floating-point arithmetic. Any non-zero number with magnitude smaller than the smallest normal number is "subnormal". [...] in extreme cases, instructions involving denormal operands may run as much as 100 times slower.
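The boundary described in the quote can be checked directly in R (a quick illustration of my own):

```r
## Subnormals in R: values below the smallest positive normal double
## are still representable, down to about 4.9e-324.
.Machine$double.xmin               # smallest positive normal double, ~2.225e-308
.Machine$double.xmin / 2 > 0       # TRUE: half of it is a subnormal, not zero
5e-324 > 0                         # TRUE: the smallest positive subnormal
5e-324 / 2 == 0                    # TRUE: halving it underflows to exact zero
```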
Indeed, B contains very small numbers:
sum(B<.Machine$double.eps)
[1] 12832980
sort(unique(B[B>0]))[10^(0:3)]
[1] 4.940656e-324 2.280607e-320 6.302966e-295 2.185410e-141
If small numbers are set to zero, the computation has the expected computation time:
C <- B; C[abs(C)<.Machine$double.eps] <- 0
system.time(C %*% C)
user system elapsed
2.266 0.032 2.298
Is there a way to automatically set values below .Machine$double.eps to zero?
Checking every matrix for small numbers by hand is not convenient.
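One way to automate this (a hypothetical helper, not an existing function) is to wrap the multiplication so both operands are flushed first. Zeroing below .Machine$double.xmin rather than .Machine$double.eps touches only the subnormal entries. Note that this cannot prevent subnormals arising from products of small *normal* entries during the multiply itself; for that, the BLAS flush-to-zero recompilation mentioned in the first answer would be needed.

```r
## Hypothetical helper: flush subnormal entries to zero before multiplying.
## Only values with magnitude in (0, .Machine$double.xmin) are changed.
flush_subnormals <- function(M) {
  M[abs(M) < .Machine$double.xmin] <- 0
  M
}
mm <- function(X, Y) flush_subnormals(X) %*% flush_subnormals(Y)

M <- matrix(c(1, 5e-324, 5e-324, 1), 2, 2)  # contains subnormal entries
mm(M, M)  # the subnormals are flushed, so this multiplies identity matrices
```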
Source: https://stackoverflow.com/questions/58886111/why-is-matrix-product-slower-when-matrix-has-very-small-values