2D Texture from 2D array CUDA

难免孤独 2021-01-16 05:57

I am trying to pass an N×3 array to a kernel, read it through texture memory, and write the values to a second array. Here is my simplified code with N = 8:

#include         


        
2 Answers
  • 2021-01-16 06:23

    After brano's response and looking more into how pitch works, I'll answer my own question. Here is the modified code:

    #include <cstdio>
    #include <iostream>
    #include "handle.cu"
    
    using namespace std;
    
    texture<float,2,cudaReadModeElementType> tex_w;
    
    __global__ void kernel(int imax, float (*f)[3])
    {
      int i = threadIdx.x;
      int j = threadIdx.y;
      // width = 3, height = imax                                                                         
      // but we have imax threads in x, 3 in y                                                            
      // therefore height corresponds to x threads (i)                                                    
      // and width corresponds to y threads (j)                                                           
      if(i<imax)
        {
          // sample at texel centers (index + 0.5) so linear filtering returns the exact texel
          f[i][j] = tex2D(tex_w, j+0.5f, i+0.5f);
        }
    }
    
    void print_to_stdio(int imax, float (*w)[3])
    {
      for (int i=0; i<imax; i++)
        {
          printf("%2d  %3.3f  %3.3f  %3.3f\n",i, w[i][0], w[i][1], w[i][2]);
        }
      printf("\n");
    }
    
    int main(void)
    {
      int imax = 8;
      float (*w)[3];
      float (*d_f)[3], *d_w;
      dim3 grid(imax,3);
    
      w = (float (*)[3])malloc(imax*3*sizeof(float));
    
      for(int i=0; i<imax; i++)
        {
          for(int j=0; j<3; j++)
            {
              w[i][j] = i + 0.01f*j;
            }
        }
    
      print_to_stdio(imax, w);
    
      size_t pitch;
      HANDLE_ERROR( cudaMallocPitch((void**)&d_w, &pitch, 3*sizeof(float), imax) );
    
      HANDLE_ERROR( cudaMemcpy2D(d_w,             // device destination                                   
                                 pitch,           // device pitch (calculated above)                      
                                 w,               // src on host                                          
                                 3*sizeof(float), // pitch on src (no padding so just width of row)       
                                 3*sizeof(float), // width of data in bytes                               
                                 imax,            // height of data                                       
                                 cudaMemcpyHostToDevice) );
    
      // configure sampling before binding the texture reference
      tex_w.normalized = false;  // don't use normalized values
      tex_w.filterMode = cudaFilterModeLinear;
      tex_w.addressMode[0] = cudaAddressModeClamp; // don't wrap around indices
      tex_w.addressMode[1] = cudaAddressModeClamp;
    
      HANDLE_ERROR( cudaBindTexture2D(NULL, tex_w, d_w, tex_w.channelDesc, 3, imax, pitch) );
    
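      // d_f will have result array
      HANDLE_ERROR( cudaMalloc( &d_f, 3*imax*sizeof(float) ) );
    
      // just use threads for simplicity: one block of imax x 3 threads
      kernel<<<1,grid>>>(imax, d_f);
      HANDLE_ERROR( cudaGetLastError() );  // catch launch errors
    
      HANDLE_ERROR( cudaMemcpy(w, d_f, 3*imax*sizeof(float), cudaMemcpyDeviceToHost) );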
    
      cudaUnbindTexture(tex_w);
      cudaFree(d_w);
      cudaFree(d_f);
    
      print_to_stdio(imax, w);
    
      free(w);
      return 0;
    }
    

    Instead of using cudaMemcpy() and handling the pitch manually on the host, I use cudaMemcpy2D(), which accepts a separate pitch argument for the device data and for the host data. Since the host array is an ordinary contiguous allocation, my understanding is that its pitch is simply the row width, 3*sizeof(float).

  • 2021-01-16 06:29

    I could give you a complete solution, but then you might not learn :D So here are some tips instead; maybe you can fix the rest on your own.

    Tip 1.
    cudaBindTexture2D takes an offset and a pitch. Both parameters have hardware-dependent alignment restrictions. The offset is guaranteed to be 0 if the memory comes from cudaMalloc(..), and a suitable pitch is returned by cudaMallocPitch(..). You also need to make sure that your host memory is pitched the same way, otherwise your memcpy will not work as expected.

    Tip 2.
    Understand indexing in 2D. For an element W[i][j], the next element in memory is W[i][j+1], NOT W[i+1][j].

    Tip 3.
    Use 1D arrays and calculate the 2D index yourself. This will give you better control.
