I have an OpenCL 1.1 search algorithm that works well with a small amount of data:
1.) build the inputData array and pass it to the GPU
2.) c
I had a similar problem with variable problem sizes. One option is to implement a divide-and-conquer approach and split up your data on the host: process the data blocks one after the other on the device.
BTW: are you sure about the `>` in the comparison `while (lastPosition > RESULT_BUFFER_SIZE)`?
Why not use `addr = atomic_add(&addr_counter, 1);` on a global variable, and use the returned index to write into another global buffer: `buffer[addr*2] = X; buffer[addr*2+1] = Y;`?

You can easily check when you run out of space: the returned index is greater than or equal to the buffer size.
EDIT: What you want is parallel kernel execution and host data access, and that is not possible with OpenCL 1.1. You should go for OpenCL 2.0, which has that feature (SVM or pipes).
Keeping the kernel in a while loop polling a variable, without any mechanism for the host to empty (access) that variable, will deadlock your kernel and crash your graphics driver.
If you want to stick to OpenCL 1.1, the only way is to run many small kernels and then check the results. You can launch further kernels in parallel while you process the previous results on the CPU.
You have not clearly indicated which OS you are using, but I assume Windows since your question has the VS2013 tag.
The Nvidia card does not crash. On Windows, Timeout Detection and Recovery (TDR) in the WDDM driver restarts the GPU driver if it becomes unresponsive. You can disable this "feature" easily with Nsight. However, be aware that this may cause problems with your desktop environment, so make sure to write kernels that finish in a tolerable amount of time. Then you can run even your very long kernels on Windows with Nvidia's OpenCL implementation.
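If you prefer not to use Nsight, the same TDR settings are exposed as registry values under `HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers` (documented by Microsoft; a reboot is required). For example, raising the timeout instead of disabling detection entirely:

```
Windows Registry Editor Version 5.00

; Raise the GPU timeout to 60 seconds (0x3c) instead of disabling TDR.
; Setting "TdrLevel"=dword:00000000 would disable detection altogether.
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrDelay"=dword:0000003c
```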
You should use OpenCL 2.0 and Pipes; they are perfect for this kind of problem.