I have following thrust::transform call.
my_functor *f_1 = new my_functor();
thrust::transform(data.begin(), data.end(), data.begin(),*f_1);
<
Use c++filt
command.
When I pass your example kernel names through c++filt, I get
void thrust::system::cuda::detail::detail::launch_closure_by_value<thrust::system::cuda::detail::for_each_n_detail::for_each_n_closure<thrust::zip_iterator<thrust::tuple<thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type> >, unsigned int, thrust::detail::device_unary_transform_functor<my_functor>, thrust::system::cuda::detail::detail::blocked_thread_array> >(thrust::system::cuda::detail::for_each_n_detail::for_each_n_closure<thrust::zip_iterator<thrust::tuple<thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type> >, unsigned int, thrust::detail::device_unary_transform_functor<my_functor>, thrust::system::cuda::detail::detail::blocked_thread_array>)
void thrust::system::cuda::detail::detail::launch_closure_by_value<thrust::system::cuda::detail::for_each_n_detail::for_each_n_closure<thrust::zip_iterator<thrust::tuple<thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type> >, long, thrust::detail::device_unary_transform_functor<my_functor>, thrust::system::cuda::detail::detail::blocked_thread_array> >(thrust::system::cuda::detail::for_each_n_detail::for_each_n_closure<thrust::zip_iterator<thrust::tuple<thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::detail::normal_iterator<thrust::device_ptr<int> >, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type> >, long, thrust::detail::device_unary_transform_functor<my_functor>, thrust::system::cuda::detail::detail::blocked_thread_array>)
thrust::detail::device_function<thrust::detail::device_unary_transform_functor<my_functor>, void>::device_function(thrust::detail::device_unary_transform_functor<my_functor> const&)
Command:
cuobjdump file.cu.o --dump-elf-symbols | grep STB_GLOBAL | tr -s " " | cut -d" " -f4,4 | c++filt
If you are using Visual Studio, use Nvidia NSIGHT Visual Studio Edition which comes with the CUDA Toolkit.
Go to the "Nsight" menu, click on the "Start Performance Analysis..." entry.
Then, click on "Launch" and wait for your application to be fully executed. You will see additional output in the console from Nsight.
After the execution, the "CUDA Source View" window will open. - Select "Source and PTX" in the "View" listbox You will be able to find the correspondance between source code and generated PTX. When you click on a line in the source code, one or more lines are highlighted in green in the PTX code.