I've a matrix in C++ filled with strings and I want to pass it to cuda kernel function. I know that CUDA can't handle strings so after some research I'
Neither of the code samples you've shown is complete, and the parts you've left out may be important. You'll make it easier for others to help you if you show complete code. Also, any time you're struggling with CUDA code, it's good practice to use proper cuda error checking, which will often point you at what is not working (I suspect this might have helped with your second attempt). Running your code with cuda-memcheck is often instructive as well.
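As a rough illustration of what "proper cuda error checking" can look like, here is a minimal sketch. The macro name CUDA_CHECK and the empty kernel noop are made up for this example; the runtime calls themselves (cudaGetLastError, cudaGetErrorString, cudaDeviceSynchronize) are standard.

```cuda
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper macro (CUDA_CHECK is an illustrative name):
// report a failing runtime call with its file and line, then exit.
#define CUDA_CHECK(call)                                          \
  do {                                                            \
    cudaError_t err_ = (call);                                    \
    if (err_ != cudaSuccess) {                                    \
      fprintf(stderr, "CUDA error: %s at %s:%d\n",                \
              cudaGetErrorString(err_), __FILE__, __LINE__);      \
      exit(EXIT_FAILURE);                                         \
    }                                                             \
  } while (0)

// Placeholder kernel, only here so there is something to launch.
__global__ void noop() {}

int main() {
  char *d_buf = NULL;
  CUDA_CHECK(cudaMalloc((void **)&d_buf, 64));
  noop<<<1, 1>>>();
  CUDA_CHECK(cudaGetLastError());       // catches errors from the launch itself
  CUDA_CHECK(cudaDeviceSynchronize());  // catches errors raised while the kernel ran
  CUDA_CHECK(cudaFree(d_buf));
  return 0;
}
```

Wrapping every runtime call this way means a bad pointer or a failed copy is reported at the call that caused it, rather than showing up later as silently wrong output.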
In your first attempt, you've run into a classic problem with CUDA and nested pointers (a is a pointer to an array of pointers). This problem also occurs pretty much any time there is a pointer buried in some other data structure. Copying such a data structure from host to device requires a "deep copy" operation, which has multiple steps. To understand more about this, search on "CUDA 2D array" (I consider the canonical answer to be the one given by talonmies here) or take a look at my answers here and here.
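To make the "multiple steps" concrete, here is a minimal sketch of a deep copy for an array of C strings. The names (print_strings, h_strs, h_ptrs, d_strs) are chosen for this illustration and are not taken from your code, and error checking is omitted for brevity.

```cuda
#include <stdio.h>
#include <string.h>

__global__ void print_strings(char **strs, int n) {
  for (int i = 0; i < n; i++)
    printf("string[%d]: %s\n", i, strs[i]);
}

int main() {
  const char *h_strs[] = {"some text", "rand txt", "text"};
  const int n = 3;

  // Step 1: copy each string into its own device buffer, keeping the
  // resulting device addresses in a host-side array.
  char *h_ptrs[n];
  for (int i = 0; i < n; i++) {
    size_t len = strlen(h_strs[i]) + 1;  // include the terminating '\0'
    cudaMalloc((void **)&h_ptrs[i], len);
    cudaMemcpy(h_ptrs[i], h_strs[i], len, cudaMemcpyHostToDevice);
  }

  // Step 2: copy the array of device pointers itself to the device.
  char **d_strs;
  cudaMalloc((void **)&d_strs, n * sizeof(char *));
  cudaMemcpy(d_strs, h_ptrs, n * sizeof(char *), cudaMemcpyHostToDevice);

  print_strings<<<1, 1>>>(d_strs, n);
  cudaDeviceSynchronize();

  for (int i = 0; i < n; i++) cudaFree(h_ptrs[i]);
  cudaFree(d_strs);
  return 0;
}
```

The key point is that the pointer array handed to the kernel must itself live in device memory and must contain device addresses. Copying the host pointer array directly would give the kernel host addresses it cannot dereference.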
Also note that with CUDA 6, "deep copies" can be a lot easier conceptually for the programmer if you are able to use unified memory.
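For comparison, here is a sketch of the same array-of-strings idea using unified memory. It assumes CUDA 6 or newer and a GPU of compute capability 3.0 or higher, which managed memory requires. The names are again illustrative and error checking is omitted.

```cuda
#include <stdio.h>
#include <string.h>

__global__ void print_strings(char **strs, int n) {
  for (int i = 0; i < n; i++)
    printf("string[%d]: %s\n", i, strs[i]);
}

int main() {
  const char *h_strs[] = {"some text", "rand txt", "text"};
  const int n = 3;

  // Managed allocations are reachable from both host and device, so the
  // pointer array and the strings can be filled in with ordinary host code.
  char **strs;
  cudaMallocManaged((void **)&strs, n * sizeof(char *));
  for (int i = 0; i < n; i++) {
    size_t len = strlen(h_strs[i]) + 1;  // include the terminating '\0'
    cudaMallocManaged((void **)&strs[i], len);
    memcpy(strs[i], h_strs[i], len);
  }

  print_strings<<<1, 1>>>(strs, n);
  cudaDeviceSynchronize();  // finish the kernel before the host touches managed memory again

  for (int i = 0; i < n; i++) cudaFree(strs[i]);
  cudaFree(strs);
  return 0;
}
```

There are no explicit cudaMemcpy calls here. The nested pointers are still present, but the runtime takes care of moving the data.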
Your second attempt appears to be headed down a path of "flattening" your 2D or pointer-to-pointer array of char. That's a typical solution to the "problem" of deep-copying, resulting in less code complexity and probably also higher performance. Here's a fully worked example, blending ideas from your first and second attempt, which seems to work for me:
$ cat t389.cu
#include <stdio.h>
#include <stdlib.h>

// Print each string from the flattened buffer.
// indexes holds [begin, end) offset pairs, one pair per string.
__global__ void func(char *a, int *indexes, int num_strings){
  for (int i = 0; i < num_strings; i++){
    printf("string[%d]: ", i);
    for (int j = indexes[2*i]; j < indexes[2*i+1]; j++)
      printf("%c", a[j]);
    printf("\n");
  }
}

int main(){
  const int num_str = 3;
  const int max_text_length = 12;
  // the host strings, pointing directly at string literals
  const char *tmp[num_str] = {"some text", "rand txt", "text"};

  // [begin, end) offsets of each string within the flattened buffer
  int stridx[2*num_str] = {0, 9, 9, 17, 17, 21};
  int *d_stridx;

  char *a, *d_a;
  a = (char *)malloc(num_str*max_text_length*sizeof(char));
  // flatten: pack the strings back to back into a
  for (int i = 0; i < num_str; i++){
    int subidx = 0;
    for (int j = stridx[2*i]; j < stridx[2*i+1]; j++)
      a[j] = tmp[i][subidx++];
  }

  cudaMalloc((void**)&d_a, num_str*max_text_length*sizeof(char));
  cudaMemcpy(d_a, a, num_str*max_text_length*sizeof(char), cudaMemcpyHostToDevice);
  cudaMalloc((void**)&d_stridx, 2*num_str*sizeof(int));
  cudaMemcpy(d_stridx, stridx, 2*num_str*sizeof(int), cudaMemcpyHostToDevice);
  func<<<1,1>>>(d_a, d_stridx, num_str);
  cudaDeviceSynchronize();

  // release device and host buffers
  cudaFree(d_a);
  cudaFree(d_stridx);
  free(a);
  return 0;
}
$ nvcc -arch=sm_20 -o t389 t389.cu
$ cuda-memcheck ./t389
========= CUDA-MEMCHECK
string[0]: some text
string[1]: rand txt
string[2]: text
========= ERROR SUMMARY: 0 errors
$
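The example above hard-codes the begin/end offsets for three known strings. For arbitrary input, the offsets can be computed while flattening. The following host-only sketch (plain C++ that should also compile unchanged inside a .cu file) shows one way to do that. FlatStrings, flatten and string_at are hypothetical names introduced for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type: one flat character buffer plus begin/end
// index pairs, laid out like the stridx array in the example above.
struct FlatStrings {
  std::vector<char> chars;    // all characters back to back, no terminators
  std::vector<int>  indexes;  // indexes[2*i] = begin, indexes[2*i+1] = end (exclusive)
};

// Build the flat buffer and the index pairs from any list of strings.
FlatStrings flatten(const std::vector<std::string>& strings) {
  FlatStrings out;
  out.indexes.reserve(2 * strings.size());
  for (const std::string& s : strings) {
    out.indexes.push_back(static_cast<int>(out.chars.size()));  // begin
    out.chars.insert(out.chars.end(), s.begin(), s.end());
    out.indexes.push_back(static_cast<int>(out.chars.size()));  // end
  }
  return out;
}

// Host-side counterpart of the kernel's inner loop: recover string i.
std::string string_at(const FlatStrings& f, std::size_t i) {
  assert(2 * i + 1 < f.indexes.size());
  return std::string(f.chars.begin() + f.indexes[2 * i],
                     f.chars.begin() + f.indexes[2 * i + 1]);
}
```

After calling flatten, f.chars.data() (f.chars.size() bytes) and f.indexes.data() (f.indexes.size()*sizeof(int) bytes) can be passed to cudaMemcpy in place of a and stridx, and the device allocations can be sized exactly instead of using num_str*max_text_length.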