I have an OpenCL 1.1 search algorithm that works well with small amounts of data:
1.) build the inputData array and pass it to the GPU
2.) c
Why not call addr = atomic_add(&addr_counter, 1); on a global counter variable, and use the returned address to write into another global buffer: buffer[addr*2] = X; buffer[addr*2+1] = Y;
You can easily check when you run out of space: it happens as soon as the returned address is no longer smaller than the number of (X, Y) pairs the buffer can hold.
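To make the idea concrete, here is a minimal sketch of the append-with-counter pattern, written as plain C11 so it can be tried on the CPU without an OpenCL runtime. It is an illustration under assumptions, not your actual kernel: CAPACITY, append_result and stored_pairs are hypothetical names, and atomic_fetch_add stands in for OpenCL's atomic_add (both return the value the counter had before the increment).

```c
#include <stdatomic.h>
#include <stdbool.h>

#define CAPACITY 4  /* hypothetical: number of (X, Y) pairs the buffer can hold */

static atomic_int addr_counter;    /* plays the role of the global counter */
static int buffer[2 * CAPACITY];   /* plays the role of the global result buffer */

/* What one work-item would do when it finds a result.
   In an OpenCL 1.1 kernel the reservation line would read:
       int addr = atomic_add(&addr_counter, 1);
   Each caller gets a unique slot, so the two writes never collide. */
static bool append_result(int x, int y)
{
    int addr = atomic_fetch_add(&addr_counter, 1);
    if (addr >= CAPACITY)
        return false;              /* out of space: this result is dropped */
    buffer[addr * 2]     = x;
    buffer[addr * 2 + 1] = y;
    return true;
}

/* Host side, after the kernel has finished: the counter keeps counting
   past CAPACITY on overflow, so clamp it before reading the buffer. */
static int stored_pairs(void)
{
    int n = atomic_load(&addr_counter);
    return n < CAPACITY ? n : CAPACITY;
}
```

Since the counter keeps growing after the buffer is full, reading it back on the host tells you both that an overflow happened and how many results were lost, which helps when choosing a bigger buffer for the next run.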
EDIT: What you want is for the host to access the data while the kernel is still executing, and that is not possible with OpenCL 1.1. You should move to OpenCL 2.0, which has features for that (SVM or pipes).
Keeping the kernels spinning in a while loop on a variable, with no mechanism for the host side to empty the buffer (access that variable), will make your kernel deadlock and crash your graphics driver.
If you want to stick to OpenCL 1.1, the only way is to run many small kernels and check the results after each one. You can launch more kernels in parallel while the CPU processes that data.
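The control flow of that chunked approach could look roughly like the sketch below. It is plain C that only emulates the host loop: run_chunk is a hypothetical stand-in for one small launch (clEnqueueNDRangeKernel over a sub-range, then clEnqueueReadBuffer), the "keep even values" test is a placeholder for the real search, and CHUNK and CAPACITY are made-up sizes.

```c
#include <stddef.h>

#define CHUNK    4   /* hypothetical: work-items per small launch */
#define CAPACITY 4   /* hypothetical: hits the result buffer can hold */

/* Stand-in for one small kernel launch plus read-back of its results.
   Placeholder search: keep the even values of input[offset .. offset+count). */
static int run_chunk(const int *input, size_t offset, size_t count, int *hits)
{
    int n = 0;   /* per-launch counter, reset to 0 before every launch */
    for (size_t i = offset; i < offset + count; ++i)
        if (input[i] % 2 == 0 && n < CAPACITY)
            hits[n++] = input[i];
    return n;
}

/* Host loop: launch one chunk, drain its results on the CPU, repeat. */
static long search_in_chunks(const int *input, size_t total)
{
    long sum = 0;   /* placeholder for "process the results on the CPU" */
    int hits[CAPACITY];
    for (size_t offset = 0; offset < total; offset += CHUNK) {
        size_t count = total - offset < CHUNK ? total - offset : CHUNK;
        int n = run_chunk(input, offset, count, hits);
        for (int i = 0; i < n; ++i)
            sum += hits[i];
    }
    return sum;
}
```

Choosing CHUNK no larger than CAPACITY means a single launch can never overflow the result buffer, even if every work-item produces a hit. With two result buffers you could enqueue the next chunk before processing the previous one, which is where the overlap between GPU and CPU work would come from.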