问题
The clang compiler includes CUDA header file host_defines.h in which the __shared__
is defined as __attribute__((shared))
. When CUDA source files are compiled to internal representation (IR) using clang, the __shared__
gets converted to addrspace(3)
. These address spaces can be observed in the clang file llvm/tools/clang/lib/Basic/Targets.cpp line number 1601 as an array
static const unsigned NVPTXAddrSpaceMap[] = {
1, // opencl_global
3, // opencl_local
4, // opencl_constant
// FIXME: generic has to be added to the target
0, // opencl_generic
1, // cuda_device
4, // cuda_constant
3, // cuda_shared
};
So the specific question is at which stage of conversion, the __attribute__((shared))
gets converted as addrspace(3)
. Looking at the parsing and lexing parts of clang gave no intimation about this. Can somebody please help ?
回答1:
The shared
attribute is defined in clang's Attr.td
file and is called CUDAShared
and represented as CUDASharedAttr
internally.
Lexing and parsing is done for all Attributes defined in Attr.td during the lexing and parsing stage for any Attrbiute. At this stage you won't find any necessary insights.
The first point where you will see valuable code for CUDASharedAttr
is located in clang/lib/Sema/SemaDeclAttr.cpp
. The Sema class builds the AST and in SemaDeclAttr.cpp
the handling for each Attribute is done.
For the particular CUDASharedAttr
handleSimpleAttribute<CUDASharedAttr>(S, D, Attr);
is called. This function just inserts the Attribute to a given declaration (Decl& D
).
Now that the Attribute is attached to a Decl you can query if the declaration has the attribute using: D.hasAttr<CUDASharedAttr>()
. This for example in SemaDecl.cpp
where the limitations on CUDA shared memory declarations are enforced and the storage class of the shared memory variable is set to static.
You will again find CUDASharedAttr bin clang/lib/CodeGen/CodeGenModule.cpp
where the actual LLVM IR is emitted.
In CodeGenModule.cpp you have the following function:
unsigned CodeGenModule::GetGlobalVarAddressSpace(const VarDecl *D,
unsigned AddrSpace) {
if (LangOpts.CUDA && LangOpts.CUDAIsDevice) {
if (D->hasAttr<CUDAConstantAttr>())
AddrSpace = getContext().getTargetAddressSpace(LangAS::cuda_constant);
else if (D->hasAttr<CUDASharedAttr>())
AddrSpace = getContext().getTargetAddressSpace(LangAS::cuda_shared);
else
AddrSpace = getContext().getTargetAddressSpace(LangAS::cuda_device);
}
return AddrSpace;
}
The function querys the address space for a shared function from the actual target, i.e. for the nvptx targets the address space map you posted is used:
static const unsigned NVPTXAddrSpaceMap[] = {
1, // opencl_global
3, // opencl_local
4, // opencl_constant
// FIXME: generic has to be added to the target
0, // opencl_generic
1, // cuda_device
4, // cuda_constant
3, // cuda_shared
};
LangAS::cuda_shared
corresponds to address space 3.
After all these steps you will get a global variable in with address space 3 in your final IR module like this:
; ModuleID = 'sm.cu'
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v32:32:32-v64:64:64-v128:128:128-n16:32:64"
target triple = "nvptx64-unknown-unknown"
@vec= internal unnamed_addr addrspace(3) global [32 x i32] zeroinitializer, align 4
; Function Attrs: nounwind readnone
declare i32 @llvm.nvvm.read.ptx.sreg.tid.x() #0
; Function Attrs: nounwind readnone
declare i32 @llvm.nvvm.read.ptx.sreg.tid.y() #0
define ptx_kernel void @__pacxx_kernel0(i32 addrspace(1)* %tmp) {
%1 = tail call spir_func i32 @llvm.nvvm.read.ptx.sreg.tid.x() #1
%2 = zext i32 %1 to i64
%3 = getelementptr i32, i32 addrspace(1)* %tmp, i64 %2
%4 = load i32, i32 addrspace(1)* %3, align 4
%5 = getelementptr [32 x i32], [32 x i32] addrspace(3)* @vec, i64 0, i64 %2
store i32 %4, i32 addrspace(3)* %5, align 4
%6 = tail call spir_func i32 @llvm.nvvm.read.ptx.sreg.tid.y() #1
%7 = zext i32 %6 to i64
%8 = getelementptr [32 x i32], [32 x i32] addrspace(3)* @vec, i64 0, i64 %7
%9 = load i32, i32 addrspace(3)* %8, align 4
%10 = getelementptr i32, i32 addrspace(1)* %tmp, i64 %7
store i32 %9, i32 addrspace(1)* %10, align 4
ret void
}
来源:https://stackoverflow.com/questions/35356840/conversion-from-attribute-shared-to-addrspace3-in-clang-compiler-when