I have an OpenCL 1.1 search algorithm that works well with a small amount of data:
1.) build the inputData array and pass it to the GPU
2.) c
I had a similar problem with variable problem sizes. One option is to implement a divide-and-conquer approach and split up your data on the host: process the data blocks one after the other on the device.
BTW: are you sure about the `>` in the comparison `while (lastPosition > RESULT_BUFFER_SIZE)`?
Why not use `addr = atomic_add(&addr_counter, 1);` on a global variable, and use the returned index to write into another global buffer: `buffer[addr*2] = X; buffer[addr*2+1] = Y;`?

You can easily check when you run out of space: the returned index is greater than or equal to the buffer size.
EDIT: What you want is parallel kernel execution and host data access, and that is not possible with OpenCL 1.1. You should go for OpenCL 2.0, which has that feature (SVM or pipes).
Keeping the kernel in a while loop polling a variable, without any mechanism for the host to empty (access) that variable, will deadlock your kernel and crash your graphics driver.
If you want to stick to OpenCL 1.1, the only way is to run many small kernels and then check the results. You can launch further kernels in parallel while you process the previous results on the CPU.
You have not clearly indicated which OS you are using, but I assume Windows since your question has the VS2013 tag.
The Nvidia card does not crash. On Windows, Timeout Detection and Recovery (TDR) in the WDDM driver restarts the GPU driver if it becomes unresponsive. You can disable this "feature" easily with Nsight. However, be aware that this may cause problems with your desktop environment, so make sure to write kernels that finish in a tolerable amount of time. Then you can run even your very long kernels on Windows with Nvidia's OpenCL implementation.
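If you prefer not to use Nsight, the same TDR settings are exposed as registry values under `HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers` (documented by Microsoft; a reboot is required). For example, raising the timeout instead of disabling detection entirely:

```
Windows Registry Editor Version 5.00

; Raise the GPU timeout to 60 seconds (0x3c) instead of disabling TDR.
; Setting "TdrLevel"=dword:00000000 would disable detection altogether.
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers]
"TdrDelay"=dword:0000003c
```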
You should use OpenCL 2.0 and Pipes; they are perfect for this kind of problem.