Floating-point multiplication performing slower depending on operands in C

亡梦爱人 submitted on 2019-12-03 10:12:27

The problem has already mostly been diagnosed, but I will write up exactly what happens here.

Essentially, the questioner is modeling diffusion; an initial quantity on the boundary diffuses into the entirety of a large grid. At each time step t, the value at the leading edge of the diffusion will be 0.2^t (ignoring effects at the corners).

The smallest normalized single-precision value is 2^-126; when cur_time = 55, the value at the frontier of the diffusion is 0.2^55, which is a bit smaller than 2^-127. From this time step forward, some of the cells in the grid will contain denormal values. On the questioner's Nehalem, operations on denormal data are about 100 times slower than the same operation on normalized floating point data, explaining the slowdown.
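
To check the step count yourself, here is a minimal sketch that multiplies by 0.2f repeatedly and reports the first step at which the result is classified as subnormal. The function name first_subnormal_step_float is made up for this illustration, and the sketch assumes IEEE 754 single precision with flush-to-zero disabled (the usual default).

```c
#include <math.h>

/* Returns the first step t at which 0.2f^t, computed by repeated
   single-precision multiplication, is a subnormal (denormal) value.
   Hypothetical helper, not part of the questioner's code. */
int first_subnormal_step_float(void)
{
    float v = 1.0f;
    for (int t = 1; t < 1000; ++t) {
        v *= 0.2f;                          /* value at the diffusion frontier */
        if (fpclassify(v) == FP_SUBNORMAL)  /* nonzero, but below 2^-126 */
            return t;
    }
    return -1;  /* not expected with IEEE 754 single precision */
}
```

Under those assumptions the function returns 55, matching the time step quoted above.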

When the grid is initially filled with constant data of 1.0, the data never gets too small, and so the denormal stall is avoided.

Note that changing the data type to double would delay, but not eliminate, the issue. If double precision is used for the computation, denormal values (now smaller than 2^-1022) will first arise in the 441st iteration.
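
The double-precision figure can be checked the same way. This is only a sketch: first_subnormal_step_double is a name invented here, and it assumes IEEE 754 double precision with flush-to-zero disabled.

```c
#include <math.h>

/* Returns the first step t at which 0.2^t, computed by repeated
   double-precision multiplication, is a subnormal (denormal) value.
   Hypothetical helper for illustration only. */
int first_subnormal_step_double(void)
{
    double v = 1.0;
    for (int t = 1; t < 10000; ++t) {
        v *= 0.2;
        if (fpclassify(v) == FP_SUBNORMAL)  /* nonzero, but below 2^-1022 */
            return t;
    }
    return -1;  /* not expected with IEEE 754 double precision */
}
```

Under those assumptions it returns 441, so the stall is postponed by a few hundred iterations rather than avoided.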

At the cost of precision at the leading edge of the diffusion, you could fix the slowdown by enabling "Flush to Zero", which causes the processor to produce zero instead of denormal results in arithmetic operations. This is done by toggling a bit in the FPSCR or MXCSR, preferably via the functions defined in the <fenv.h> header in the C library.
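
How you flip that bit is platform-specific; ISO C's <fenv.h> has no portable flush-to-zero switch of its own, so any such control there is a vendor extension. As one hedged example, on x86 the SSE header <xmmintrin.h> provides the _MM_SET_FLUSH_ZERO_MODE macro, which sets the flush-to-zero bit in the MXCSR. The sketch below assumes an x86 target whose float arithmetic is done in SSE registers (the default on x86-64); the helper name halve is made up for the demonstration.

```c
#include <float.h>
#include <xmmintrin.h>  /* _MM_SET_FLUSH_ZERO_MODE and friends; x86 SSE only */

/* Halves x with flush-to-zero switched on or off, then restores the
   previous mode. Hypothetical helper for illustration only. */
float halve(float x, int flush_to_zero)
{
    unsigned int saved = _MM_GET_FLUSH_ZERO_MODE();
    _MM_SET_FLUSH_ZERO_MODE(flush_to_zero ? _MM_FLUSH_ZERO_ON
                                          : _MM_FLUSH_ZERO_OFF);

    volatile float in = x;           /* volatile keeps the multiply at run time */
    volatile float out = in * 0.5f;  /* a denormal result is flushed if FTZ is on */

    _MM_SET_FLUSH_ZERO_MODE(saved);
    return out;
}
```

With flush-to-zero on, halve(FLT_MIN, 1) yields 0.0f instead of the denormal 2^-127, while normal results such as halve(1.0f, 1) are unaffected. In the diffusion code you would set the mode once before the time loop rather than around each operation.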

Another (hackier, less good) "fix" would be to fill the matrix initially with very small non-zero values (0x1.0p-126f, the smallest normal number). This would also prevent denormals from arising in the computation.
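
A rough sketch of that workaround follows. The questioner's code is not reproduced here, so the five-point stencil, the grid size N, the fixed top edge, and the function name count_subnormals are all assumptions made for illustration.

```c
#include <float.h>
#include <math.h>

#define N 80  /* hypothetical grid size, deep enough for 60 steps */

/* Runs `steps` iterations of an assumed five-point averaging stencil on an
   N x N grid whose top row is held at 1.0f and whose other cells start at
   `fill`. Returns how many cells hold subnormal values afterwards. */
int count_subnormals(float fill, int steps)
{
    static float a[N][N], b[N][N];
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            a[i][j] = b[i][j] = (i == 0) ? 1.0f : fill;

    for (int s = 0; s < steps; ++s) {
        for (int i = 1; i < N - 1; ++i)      /* update interior cells only */
            for (int j = 1; j < N - 1; ++j)
                b[i][j] = 0.2f * (a[i][j] + a[i - 1][j] + a[i + 1][j]
                                  + a[i][j - 1] + a[i][j + 1]);
        for (int i = 1; i < N - 1; ++i)      /* copy the new step back */
            for (int j = 1; j < N - 1; ++j)
                a[i][j] = b[i][j];
    }

    int count = 0;
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            if (fpclassify(a[i][j]) == FP_SUBNORMAL)
                ++count;
    return count;
}
```

With this particular stencil, count_subnormals(0.0f, 60) finds subnormal cells at the diffusion frontier, while count_subnormals(FLT_MIN, 60) finds none, because every cell stays at or above FLT_MIN. A stencil that scales each neighbour by 0.2 before summing would still produce denormal intermediates, so the trick depends on how the update is written.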

Maybe your ZeroMatrix uses the typical storage scheme for sparse matrices: store every non-zero value in a linked list. If that is the case, it is quite understandable why it performs worse than a typical array-based storage scheme: it needs to run through the linked list once for every operation you perform. In that case you can maybe speed the process up by using a matrix-multiply algorithm that accounts for the matrix being sparse. If this is not the case, please post minimal but complete code so we can play with it.

Here is one of the possibilities for multiplying sparse matrices efficiently:

http://www.cs.cmu.edu/~scandal/cacm/node9.html
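
To make the idea of a sparsity-aware multiply concrete, here is a small sketch of a sparse matrix times dense vector product using compressed sparse row (CSR) storage. CSR is an array-based layout chosen here for illustration; it is neither the linked-list scheme mentioned above nor code taken from the linked page, and the names csr_matrix and csr_matvec are made up.

```c
#include <stddef.h>

/* Hypothetical CSR container: row r owns the stored entries with indices
   row_start[r] up to (but not including) row_start[r + 1]. */
struct csr_matrix {
    size_t rows;
    const size_t *row_start;  /* rows + 1 offsets into col and val */
    const size_t *col;        /* column index of each stored value */
    const double *val;        /* the stored non-zero values */
};

/* Computes y = A * x, touching only the stored non-zeros of A. */
void csr_matvec(const struct csr_matrix *a, const double *x, double *y)
{
    for (size_t r = 0; r < a->rows; ++r) {
        double sum = 0.0;
        for (size_t k = a->row_start[r]; k < a->row_start[r + 1]; ++k)
            sum += a->val[k] * x[a->col[k]];
        y[r] = sum;
    }
}
```

The work is proportional to the number of stored non-zeros, and each row is read as one contiguous run of the arrays instead of being chased through list nodes.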
