CUDA: Copy 1D array from GPU to 2D array on host


int main() {
    char** hMat,* dArr;

    hMat = new char*[10];
    for (int i=0;i<10;i++) {
        hMat[i] = new char[10];
    }
    cudaMalloc((void**)&dArr,10


                      
1 Answer

陌清茗, 2021-01-26 03:58

Your allocation scheme (an array of pointers, with each row allocated separately) can produce a discontiguous allocation on the host. No cudaMemcpy operation of any type (including the ones you mention) can target an arbitrarily discontiguous region.

In a nutshell, then, your approach is troublesome.  It can be made to work, but will require a loop to perform the copying -- essentially one cudaMemcpy operation per "row" of your "2D array".  If you choose to do that, presumably you don't need help.  It's quite straightforward.
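
For reference, the per-row loop mentioned above could look roughly like the sketch below. It assumes the device array holds `rows*cols` elements in row-major order, so row `i` starts at offset `i*cols`. The helper name `copyDeviceToRows` and the `cudaMemset` fill value are illustrative assumptions, not part of the original question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): copies a flat device
// array of rows*cols chars into separately allocated host rows, using one
// cudaMemcpy per row. Returns the first error encountered, or cudaSuccess.
cudaError_t copyDeviceToRows(char **hMat, const char *dArr, int rows, int cols) {
  for (int i = 0; i < rows; i++) {
    // Row i of the flat device array starts at offset i*cols.
    cudaError_t err = cudaMemcpy(hMat[i], dArr + i * cols,
                                 cols * sizeof(char), cudaMemcpyDeviceToHost);
    if (err != cudaSuccess) return err;
  }
  return cudaSuccess;
}

int main() {
  const int rows = 10, cols = 10;

  // Host "2D array" as in the question: every row is its own allocation.
  char **hMat = new char*[rows];
  for (int i = 0; i < rows; i++) hMat[i] = new char[cols];

  char *dArr;
  cudaMalloc(&dArr, rows * cols * sizeof(char));
  // Placeholder device data (assumption): set every byte to 7.
  cudaMemset(dArr, 7, rows * cols * sizeof(char));

  cudaError_t err = copyDeviceToRows(hMat, dArr, rows, cols);
  if (err != cudaSuccess) printf("copy failed: %s\n", cudaGetErrorString(err));

  // Release the device buffer, then each host row, then the pointer array.
  cudaFree(dArr);
  for (int i = 0; i < rows; i++) delete[] hMat[i];
  delete[] hMat;
  return err == cudaSuccess ? 0 : 1;
}
```

This issues `rows` separate transfers, which is why the contiguous layout described next is usually preferable.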

What I will suggest is that you instead modify your host allocation to create an underlying contiguous allocation.  Such a region can be handled by a single, ordinary cudaMemcpy call, but you can still treat it as a "2D array" in host code.

The basic idea is to create a single allocation of the correct overall size, then to create a set of pointers to the places within that allocation where each "row" should start.  You then index into this pointer array through your original double pointer, so hMat[i][j] still works in host code.

Something like this:

#include <stdio.h>

typedef char mytype;

int main(){

  const int rows = 10;
  const int cols = 10;

  mytype **hMat = new mytype*[rows];
  hMat[0] = new mytype[rows*cols];
  for (int i = 1; i < rows; i++) hMat[i] = hMat[i-1]+cols;

  //initialize "2D array"

  for (int i = 0; i < rows; i++)
    for (int j = 0; j < cols; j++)
      hMat[i][j] = 0;

  mytype *dArr;
  cudaMalloc(&dArr, rows*cols*sizeof(mytype));

  //copy to device
  cudaMemcpy(dArr, hMat[0], rows*cols*sizeof(mytype), cudaMemcpyHostToDevice);

  //kernel call


  //copy from device
  cudaMemcpy(hMat[0], dArr, rows*cols*sizeof(mytype), cudaMemcpyDeviceToHost);

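  //release the device buffer, the contiguous host block, and the row-pointer array
  cudaFree(dArr);
  delete[] hMat[0];
  delete[] hMat;

  return 0;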
}
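
To make the "kernel call" step concrete, here is a minimal sketch of how a kernel might address the flat device array while the host reads the result back as `hMat[r][c]`. The kernel `fillFlat`, the helper `fillAndFetch`, the 16x16 block size, and the values written are all assumptions chosen for illustration; the point is only that element (r, c) lives at index `r*cols + c` on the device.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: stores r*cols + c in element (r, c) of a flat,
// row-major device array.
__global__ void fillFlat(char *dArr, int rows, int cols) {
  int c = blockIdx.x * blockDim.x + threadIdx.x;
  int r = blockIdx.y * blockDim.y + threadIdx.y;
  if (r < rows && c < cols) dArr[r * cols + c] = (char)(r * cols + c);
}

// Hypothetical helper: runs the kernel and copies the result into a
// contiguous host "2D array" whose storage starts at hMat[0].
cudaError_t fillAndFetch(char **hMat, int rows, int cols) {
  char *dArr = nullptr;
  size_t bytes = (size_t)rows * cols * sizeof(char);
  cudaError_t err = cudaMalloc(&dArr, bytes);
  if (err != cudaSuccess) return err;

  dim3 block(16, 16);
  dim3 grid((cols + block.x - 1) / block.x, (rows + block.y - 1) / block.y);
  fillFlat<<<grid, block>>>(dArr, rows, cols);
  err = cudaGetLastError();  // catches launch configuration errors

  // One ordinary cudaMemcpy suffices because the host block is contiguous.
  if (err == cudaSuccess)
    err = cudaMemcpy(hMat[0], dArr, bytes, cudaMemcpyDeviceToHost);

  cudaFree(dArr);
  return err;
}

int main() {
  const int rows = 10, cols = 10;

  // Contiguous host allocation with row pointers, as in the answer above.
  char **hMat = new char*[rows];
  hMat[0] = new char[rows * cols];
  for (int i = 1; i < rows; i++) hMat[i] = hMat[i - 1] + cols;

  cudaError_t err = fillAndFetch(hMat, rows, cols);
  if (err != cudaSuccess) {
    printf("CUDA error: %s\n", cudaGetErrorString(err));
  } else {
    // Element (3, 4) should hold 3*cols + 4 if the copy succeeded.
    printf("hMat[3][4] = %d\n", hMat[3][4]);
  }

  delete[] hMat[0];
  delete[] hMat;
  return err == cudaSuccess ? 0 : 1;
}
```

The values stay below 128 for a 10x10 array, so they fit in a `char`; larger arrays would need a wider element type.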

                                                                        
                                                        
            
            
              
                