Question
According to NVIDIA's Inline PTX Assembly document, the syntax for inline assembly is:
asm("temp_string" : "constraint"(output) : "constraint"(input));
Here are two examples:
asm("vadd.s32.s32.s32 %0, %1.h0, %2.h0;" : "=r"(v) : "r"(a), "r"(b));
asm("vadd.u32.u32.u32 %0.b0, %1, %2, %3;" : "=r"(v) : "r"(a), "r"(b), "r"(z));
In both examples, suffixes such as h0 or b0 follow the %n operands. I looked through CUDA's official documentation and did not find anything about the meaning of h0 or b0. I have seen h0, h1 and b0, b1, b2, b3. I guess that h0 and h1 denote a 16-bit value, while bn denotes a byte value. Does anyone know their exact meaning?
Thanks to Roger Dahl for the help. I read the PTX ISA 3.0 and found the answer.
"h" means half-word: h0 is the low half-word of a 32-bit word, and h1 is the high half-word. "b" means byte: b0, b1, b2 and b3 are the lowest, second, third and highest 8 bits of a 32-bit word.
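To make the selector meanings concrete, here is a minimal host-side sketch in plain C++ that models what a source selector extracts. It is not PTX and not NVIDIA's code: the helper names (select_half_signed, select_byte_unsigned, vadd_h0_h0) are hypothetical, and the sketch assumes that a signed source type such as .s32 sign-extends the selected part while an unsigned one zero-extends it.

```cpp
#include <cstdint>

// Models a source selector .h0 / .h1 with a signed source type (assumption):
// take half-word n (0 = low, 1 = high) and sign-extend it to 32 bits.
int32_t select_half_signed(uint32_t word, int n) {
    int32_t half = static_cast<int32_t>((word >> (16 * n)) & 0xFFFFu);
    return (half ^ 0x8000) - 0x8000;  // portable sign extension from 16 bits
}

// Models a source selector .b0 .. .b3 with an unsigned source type (assumption):
// take byte n (0 = lowest, 3 = highest) and zero-extend it to 32 bits.
uint32_t select_byte_unsigned(uint32_t word, int n) {
    return (word >> (8 * n)) & 0xFFu;
}

// Hypothetical host-side model of the first example:
//   vadd.s32.s32.s32 %0, %1.h0, %2.h0;
// i.e. add the sign-extended low half-words of a and b.
int32_t vadd_h0_h0(uint32_t a, uint32_t b) {
    return select_half_signed(a, 0) + select_half_signed(b, 0);
}
```

For instance, under these assumptions select_half_signed(0x1234FFFF, 0) yields -1 and select_half_signed(0x1234FFFF, 1) yields 0x1234.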
Answer 1:
vadd is one of the video-specific instructions included in PTX. A description of the complete PTX ISA ships with the CUDA distribution. On my machine, it is in C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v4.1\doc\ptx_isa_3.0.pdf. The h0, h1, b0, etc. designators are described in section 8.7.11, Video Instructions. They represent different implicit shift/mask operations (see the optMerge function).
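As a rough illustration of the shift/mask idea on the destination side, the following plain C++ sketch models what a destination selector such as %0.b0 might do. It is an assumption-laden model, not the PTX specification's pseudocode: merge_byte and vadd_merge_b0 are hypothetical names, and the sketch assumes that without .sat the low 8 bits of the sum replace the selected byte of the third source operand while the other bytes are passed through unchanged.

```cpp
#include <cstdint>

// Hypothetical model of a byte merge: replace byte n of c with the low
// 8 bits of tmp and keep the remaining three bytes of c as they are.
uint32_t merge_byte(uint32_t c, int n, uint32_t tmp) {
    uint32_t mask = 0xFFu << (8 * n);                  // mask covering byte n
    return (c & ~mask) | ((tmp & 0xFFu) << (8 * n));   // shift the result into place
}

// Hypothetical host-side model of the second example:
//   vadd.u32.u32.u32 %0.b0, %1, %2, %3;
// i.e. add a and b, then merge the low byte of the sum into byte 0 of z.
uint32_t vadd_merge_b0(uint32_t a, uint32_t b, uint32_t z) {
    return merge_byte(z, 0, a + b);
}
```

Under these assumptions, vadd_merge_b0(1, 2, 0xAABBCC00) gives 0xAABBCC03. The exact saturation and merge rules should be checked against the Video Instructions section of the PTX ISA.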
Source: https://stackoverflow.com/questions/11546221/syntax-on-inline-ptx-code-for-cuda