Question
This is a follow-up question to the selected answer in this post: Output of cuda program is not what was expected.
The following kernel works:
__global__ void setVal(char **word)
{
    char *myWord = word[(blockIdx.y * gridDim.x) + blockIdx.x];
    myWord[0] = 'H';
    myWord[1] = 'e';
    myWord[2] = 'l';
    myWord[3] = 'l';
    myWord[4] = 'o';
}
Why doesn't this one work?
__global__ void setVal(char **word)
{
    char *myWord = word[(blockIdx.y * gridDim.x) + blockIdx.x];
    myWord = "Hello\0";
}
Answer 1:
You should start paying much more attention to the output from the compiler. Your second kernel code:
__global__ void setVal(char **word)
{
    char *myWord = word[(blockIdx.y * gridDim.x) + blockIdx.x];
    myWord = "Hello\0";
}
compiles to a null kernel with nothing inside it:
$ nvcc -arch=sm_20 -c nullkernel.cu
nullkernel.cu(3): warning: variable "myWord" was set but never used
nullkernel.cu(3): warning: variable "myWord" was set but never used
The reason is that what you think is a string copy assignment is really just a pointer assignment. The compiler is smart enough to see that the new value of myWord is never written to memory, so it eliminates all the code and warns you that myWord isn't used.
If I were to ask a rhetorical question and re-write the code this way:
__global__ void setVal(char **word)
{
    char *myWord = word[(blockIdx.y * gridDim.x) + blockIdx.x];
    const char mymsg[] = "Hello\0";
    myWord = mymsg;
}
would it be more obvious both why the code doesn't compile and why it could never "implicitly" perform a string copy assignment even if it did compile?
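For comparison, here is a minimal sketch of a kernel that does what the question intended: it copies the characters into the buffer that myWord points to instead of re-pointing myWord. The host driver countHello(), the 2 x 2 grid, and the 8-byte buffer size are assumptions made up for this illustration, and error checking on the CUDA calls is left out for brevity.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Copies the literal into the buffer that word[...] points to, one
// character at a time, including the terminating '\0'.
__global__ void setVal(char **word)
{
    char *myWord = word[(blockIdx.y * gridDim.x) + blockIdx.x];
    const char msg[] = "Hello";            // sizeof(msg) is 6: 5 letters + '\0'
    for (int i = 0; i < (int)sizeof(msg); ++i)
        myWord[i] = msg[i];
}

// Hypothetical host driver: 2 x 2 blocks, one 8-byte buffer per block.
// Returns how many buffers read back as "Hello".
int countHello()
{
    const int numWords = 4;
    const int wordLen = 8;

    char *bufs[numWords];
    for (int i = 0; i < numWords; ++i)
        cudaMalloc(&bufs[i], wordLen);

    char **devBufs;                        // device copy of the pointer table
    cudaMalloc(&devBufs, sizeof(bufs));
    cudaMemcpy(devBufs, bufs, sizeof(bufs), cudaMemcpyHostToDevice);

    setVal<<<dim3(2, 2), 1>>>(devBufs);

    int matches = 0;
    for (int i = 0; i < numWords; ++i) {
        char out[wordLen] = {0};
        // A blocking cudaMemcpy on the default stream waits for the kernel.
        cudaMemcpy(out, bufs[i], wordLen, cudaMemcpyDeviceToHost);
        if (std::strcmp(out, "Hello") == 0)
            ++matches;
        cudaFree(bufs[i]);
    }
    cudaFree(devBufs);
    return matches;
}

int main()
{
    std::printf("%d of 4 buffers hold \"Hello\"\n", countHello());
    return 0;
}
```

The loop bound uses sizeof(msg) rather than a hard-coded 5 so that the terminator is copied too; the buffers come from cudaMalloc and are not zeroed, so the host-side strcmp depends on that final byte.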
Answer 2:
In your second version, myWord = "Hello\0";, the "Hello\0" is not stored in the space given by the **word parameter. The string is probably stored in the .rodata section of the executable. The assignment simply updates the myWord pointer; it does NOT do any bulk copying of data. (Though as talonmies points out, the compiler can figure out that the pointer update isn't needed at all, and optimizes away the entire function. Neat.)
In general, C doesn't provide any easy bulk-data copy mechanisms built into the language; the designers thought expensive things should look expensive. So, while PL/I makes assigning 0 to every element in a multidimensional array a very easy operation (A = 0;), C makes you spell it out with nested for() loops or an explicit memset() call, to drive home the idea that it is expensive.
(Struct values are the exception to the bulk-copy rule: assigning one struct to another, or passing or returning a struct by value, copies all of its members.)
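To illustrate the struct case in the setting of the question, the sketch below wraps a fixed-size char array in a struct so that a plain assignment copies the bytes. The Word type, the setValStruct kernel, and the countHelloStructs() driver are hypothetical names introduced for this example, not part of the original code, and error checking is again omitted.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Hypothetical wrapper: a fixed-size char array inside a struct.
struct Word {
    char text[8];
};

__global__ void setValStruct(Word *words)
{
    const Word hello = {"Hello"};          // unused trailing bytes are zeroed
    // Struct assignment copies all sizeof(Word) bytes into the output slot.
    words[(blockIdx.y * gridDim.x) + blockIdx.x] = hello;
}

// Hypothetical host driver: 2 x 2 blocks, one Word per block.
// Returns how many slots read back as "Hello".
int countHelloStructs()
{
    const int numWords = 4;

    Word *devWords;
    cudaMalloc(&devWords, numWords * sizeof(Word));

    setValStruct<<<dim3(2, 2), 1>>>(devWords);

    Word hostWords[numWords];
    // A blocking cudaMemcpy on the default stream waits for the kernel.
    cudaMemcpy(hostWords, devWords, sizeof(hostWords), cudaMemcpyDeviceToHost);
    cudaFree(devWords);

    int matches = 0;
    for (int i = 0; i < numWords; ++i)
        if (std::strcmp(hostWords[i].text, "Hello") == 0)
            ++matches;
    return matches;
}

int main()
{
    std::printf("%d of 4 slots hold \"Hello\"\n", countHelloStructs());
    return 0;
}
```

This changes the data layout from an array of pointers to an array of fixed-size records, so it only applies when a maximum word length is known up front.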
Source: https://stackoverflow.com/questions/6566910/cuda-char-variable-assignment