How to enable CUDA 7.0+ per-thread default stream in Visual Studio 2013?

Submitted by 微笑、不失礼 on 2020-06-28 07:42:11

Question


I followed the method described in the NVIDIA blog post "GPU Pro Tip: CUDA 7 Streams Simplify Concurrency" and tested it in VS2013 with CUDA 7.5. The multi-stream example worked, but the multi-threading example did not give the expected result. The code is below:

#include <pthread.h>
#include <cstdio>
#include <cmath>

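#define CUDA_API_PER_THREAD_DEFAULT_STREAM  // must precede every CUDA header; note that nvcc already includes cuda_runtime.h implicitly in .cu files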

#include "cuda.h"

const int N = 1 << 20;

__global__ void kernel(float *x, int n)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    for (int i = tid; i < n; i += blockDim.x * gridDim.x) {
        x[i] = sqrt(pow(3.14159, i));
    }
}

void *launch_kernel(void *dummy)
{
    float *data;
    cudaMalloc(&data, N * sizeof(float));

    kernel<<<1, 64>>>(data, N);

    cudaStreamSynchronize(0);

    return NULL;
}

int main()
{
    const int num_threads = 8;

    pthread_t threads[num_threads];

    for (int i = 0; i < num_threads; i++) {
        if (pthread_create(&threads[i], NULL, launch_kernel, 0)) {
            fprintf(stderr, "Error creating thread\n");
            return 1;
        }
    }

    for (int i = 0; i < num_threads; i++) {
        if (pthread_join(threads[i], NULL)) {
            fprintf(stderr, "Error joining thread\n");
            return 2;
        }
    }

    cudaDeviceReset();

    return 0;
}

I also tried adding the macro CUDA_API_PER_THREAD_DEFAULT_STREAM under CUDA C/C++ -> Host -> Preprocessor Definitions, but the result was the same. The timeline generated by the profiler is below:

[Figure: profiler timeline]

Do you have any idea what is happening here? Many thanks in advance.


Answer 1:


The code you have posted works for me as you would expect when compiled and run on a Linux system with CUDA 7.0 like so:

$ nvcc -arch=sm_30 --default-stream per-thread -o thread.out thread.cu

From that I can only assume that either you have a platform-specific issue or your build method is incorrect (note that --default-stream per-thread must be specified for every translation unit in the build).
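As a hedged alternative that sidesteps the per-translation-unit build flag (an assumption, not part of the answer above): CUDA 7 also provides the special stream handle cudaStreamPerThread, which names the calling host thread's default stream regardless of how the file was compiled. The sketch below passes it explicitly to the launch and to cudaStreamSynchronize. It also swaps the question's pthreads for C++11 std::thread, which VS2013 ships, to avoid the pthreads-win32 dependency; any threading library would do. Whether the kernels visibly overlap in the profiler still depends on the device and driver.

```cuda
// Sketch (assumption): request the per-thread default stream explicitly via
// cudaStreamPerThread instead of relying on --default-stream per-thread.
#include <cstdio>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

const int N = 1 << 20;

__global__ void kernel(float *x, int n)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    for (int i = tid; i < n; i += blockDim.x * gridDim.x) {
        x[i] = sqrt(pow(3.14159, i));
    }
}

// Runs one kernel in the calling host thread's default stream and
// returns the first error encountered (cudaSuccess otherwise).
cudaError_t launch_kernel()
{
    float *data = nullptr;
    cudaError_t err = cudaMalloc(&data, N * sizeof(float));
    if (err != cudaSuccess) return err;

    kernel<<<1, 64, 0, cudaStreamPerThread>>>(data, N);
    err = cudaGetLastError();                          // launch errors
    if (err == cudaSuccess)
        err = cudaStreamSynchronize(cudaStreamPerThread);

    cudaFree(data);                                    // kernel is done with the buffer
    return err;
}

int main()
{
    const int num_threads = 8;
    std::vector<cudaError_t> results(num_threads, cudaSuccess);
    std::vector<std::thread> threads;

    // Each host thread gets its own implicit per-thread stream.
    for (int i = 0; i < num_threads; ++i)
        threads.emplace_back([&results, i] { results[i] = launch_kernel(); });
    for (auto &t : threads)
        t.join();

    int failures = 0;
    for (int i = 0; i < num_threads; ++i) {
        if (results[i] != cudaSuccess) {
            fprintf(stderr, "thread %d: %s\n", i, cudaGetErrorString(results[i]));
            ++failures;
        }
    }

    cudaDeviceReset();
    return failures == 0 ? 0 : 1;
}
```

Because the stream is named at the call site, this version should behave the same whether or not the build flag reaches this file; it compiles with a plain nvcc -arch=sm_30 invocation.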




Answer 2:


Update: concurrency can be observed when I add a cudaFree call, as shown below. Is this because of missing synchronization?

void *launch_kernel(void *dummy)
{
    float *data;
    cudaMalloc(&data, N * sizeof(float));

    kernel<<<1, 64>>>(data, N);
    cudaFree(data); // Concurrency can be observed once this line is added
    cudaStreamSynchronize(0);

    return NULL;
}
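Kernel launches are asynchronous, so launch and execution errors only surface at a later API call, and none of the calls above check their return values. A minimal sketch, assuming the same --default-stream per-thread build (so stream 0 is each host thread's own default stream), that checks every call with a hypothetical CUDA_CHECK macro and frees the buffer only after the stream has been synchronized, so the kernel is known to be done with it:

```cuda
// Sketch: Answer 2's launch_kernel with error checking and the free moved
// after synchronization. Build with: nvcc -arch=sm_30 --default-stream per-thread
#include <pthread.h>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper: report the failing call and abort the process.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err_ = (call);                                  \
        if (err_ != cudaSuccess) {                                  \
            fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,     \
                    __LINE__, #call, cudaGetErrorString(err_));     \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

const int N = 1 << 20;

__global__ void kernel(float *x, int n)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    for (int i = tid; i < n; i += blockDim.x * gridDim.x) {
        x[i] = sqrt(pow(3.14159, i));
    }
}

void *launch_kernel(void *)
{
    float *data = NULL;
    CUDA_CHECK(cudaMalloc(&data, N * sizeof(float)));

    kernel<<<1, 64>>>(data, N);
    CUDA_CHECK(cudaGetLastError());        // errors in the launch itself
    CUDA_CHECK(cudaStreamSynchronize(0));  // errors raised while the kernel ran

    CUDA_CHECK(cudaFree(data));            // kernel has finished; safe to free
    return NULL;
}

int main()
{
    const int num_threads = 8;
    pthread_t threads[num_threads];

    for (int i = 0; i < num_threads; i++) {
        if (pthread_create(&threads[i], NULL, launch_kernel, NULL)) {
            fprintf(stderr, "Error creating thread\n");
            return 1;
        }
    }
    for (int i = 0; i < num_threads; i++) {
        if (pthread_join(threads[i], NULL)) {
            fprintf(stderr, "Error joining thread\n");
            return 2;
        }
    }

    CUDA_CHECK(cudaDeviceReset());
    return 0;
}
```

Calling exit() from a worker thread is a blunt choice suited to a quick diagnostic, not production code.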

compiled with:

nvcc -arch=sm_30  --default-stream per-thread -lpthreadVC2 kernel.cu -o kernel.exe



Source: https://stackoverflow.com/questions/34259948/how-to-enable-cuda-7-0-per-thread-default-stream-in-visual-studio-2013
