infinity as result in double operation

Question

I would like to understand why the result is infinity. I wrote the code below and I always get inf as the result. Is there a precision problem with my code?

#include <stdio.h>
#include <stdlib.h>

#include "cuda.h"
#include "curand_kernel.h"

#define NDIM 30
#define NPAR 5

#define DIMPAR NDIM*NPAR

__device__ double uniform(int index){
    return (double) 0.767341;
}


__global__ void iteracao(double *pos){

    int thread = threadIdx.x + blockDim.x * blockIdx.x;
    double tvel;
    int i = 0;

    double l, r, t;

    if(thread < DIMPAR){
        do{
            t = (double) uniform(thread);
            l = (double) 2.05 * t * ( pos[thread] );
            r = (double) 2.05 * t * ( pos[thread] );
            tvel = (double) l+t+r;
            pos[thread] =  tvel;
            i++;
        }while(i < 10000);
    }

}


int main(int argc, char *argv[])
{

    double *d_pos,    *h_pos;


    h_pos = (double *) malloc(sizeof( double ) * DIMPAR);


    cudaMalloc((void**)&d_pos, DIMPAR   * sizeof( double ));


    int i, j, k, numthreadsperblock, numblocks;

    numthreadsperblock = 512;
    numblocks = (DIMPAR / numthreadsperblock) + ((DIMPAR % numthreadsperblock)?1:0);
    //
    printf("numthreadsperblock: %i;; numblocks:%i\n", numthreadsperblock, numblocks);

    cudaMemset(d_pos,  0.767341, DIMPAR   * sizeof( double ));
    iteracao<<<numblocks,numthreadsperblock>>>(d_pos);
    cudaMemcpy(h_pos, d_pos, DIMPAR * sizeof( double ), cudaMemcpyDeviceToHost);

    printf("\n");
    for(i = 0; i < NPAR; i++){
        for(j = i*NDIM, k = j; j < (k+30); j++){
            printf("%f,", h_pos[j]);
        }
        printf("***\n\n");
    }

    system("PAUSE");
    return 0;
}

The output is always:

numthreadsperblock: 512;; numblocks:1

inf,inf,inf,inf,inf,inf,inf,inf,inf,inf,inf,inf,inf,inf,inf,inf,inf,inf,inf,inf,inf,inf,inf,inf,inf,inf,inf,inf,inf,inf,*

Answer 1:

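You have two problems. The first is the one @Anycorn described in the comments: cudaMemset, just like memset, takes a byte value and writes it into every byte of the range. You cannot use it to initialize float or double values. In your call, the double 0.767341 is converted to the int parameter value 0, so d_pos actually ends up filled with 0.0.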

The second is that your kernel has a loop that operates 10000 times on each pos array element. This is not a factorial, but it is a repeated multiply-and-add whose result is always positive and larger than its input, so your answer blows up. In all probability your kernel is not doing what you want it to do. Even if you fix your first problem and initialize pos properly (whether to 0.767341 or to zero), your calculation will still blow up.

The arithmetic you are performing is:

pos[idx] =  0.767341 + (3.1460981 * pos[idx]);
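Writing x_n for pos[idx] after n iterations, a = 3.1460981 and c = 0.767341 (symbols introduced here for illustration, not taken from the original), the update unrolls into a closed form:

```latex
x_{n+1} = a\,x_n + c
\;\Longrightarrow\;
x_n = a^{n} x_0 + c\,\frac{a^{n} - 1}{a - 1},
\qquad
x_0 = 0:\; x_n \approx 0.358 \cdot 3.146^{n}
```

The largest finite double is about 1.8e308, and by this estimate x_n passes it at around n = 621, far short of the 10000 iterations in the loop.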

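For each idx, you perform the above operation 10000 times. Because the multiplier 3.1460981 is greater than 1, even an initial pos[idx] of zero becomes 0.767341 after the first iteration and then roughly triples on every iteration after that. The value grows geometrically, exceeds the largest finite double (about 1.8e308) long before the loop finishes, and once it is inf, the remaining iterations keep it inf.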

Answer 2:

You are initializing d_pos the wrong way. cudaMemset() can only set memory byte by byte; see the cudaMemset() documentation for details.

To initialize the array as you intended, you could use Thrust as a quick way:

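#include <thrust/device_ptr.h>
#include <thrust/fill.h>

// Fill all DIMPAR doubles of d_pos with 0.767341 (call before launching the kernel)
thrust::fill(
    thrust::device_pointer_cast(d_pos),
    thrust::device_pointer_cast(d_pos) + DIMPAR,
    0.767341);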

Source: https://stackoverflow.com/questions/18829412/infinity-as-result-in-double-operation

Tags

cuda

double-precision