I have a search algorithm written in OpenCL 1.1 that works well with small amounts of data:
1.) build the inputData array and pass it to the GPU
2.) c
You should use OpenCL 2.0 and its pipes (FIFO memory objects for passing data packets between kernels); they are well suited to this kind of problem.