Question
I'm trying to run a vector-addition application where I need to launch multiple kernels concurrently. For concurrent kernel launches, someone in my last question advised me to use multiple command queues, which I'm defining as an array:
context = clCreateContext(NULL, 1, &device_id, NULL, NULL, &err);
for (i = 0; i < num_ker; ++i)
{
    queue[i] = clCreateCommandQueue(context, device_id, 0, &err);
}
I'm getting the error "command terminated by signal 11" somewhere around the above code.
I'm using a for loop to launch the kernels and to enqueue the data as well:
for (i = 0; i < num_ker; ++i)
{
    err = clEnqueueNDRangeKernel(queue[i], kernel, 1, NULL, &globalSize,
                                 &localSize, 0, NULL, NULL);
}
The thing is, I'm not sure where I'm going wrong. I saw somewhere that we can make an array of command queues, so that's why I'm using one. One more piece of information: when I don't use a for loop and just define multiple command queues manually, it works fine.
Answer 1:
I read your last question as well, and I think you should first rethink what you really want to do and whether OpenCL is really the way to do it.
OpenCL is an API for massive parallel processing and data crunching, where each kernel (or queued task) operates on many data values in parallel at the same time, therefore outperforming any serial CPU processing by many orders of magnitude.
The typical use case for OpenCL is one kernel running millions of work items. More advanced applications may need multiple sequences of different kernels, and special synchronizations between CPU and GPU.
But concurrency is never a requirement. (Otherwise, a single-core CPU would not be able to perform the task, and that is never the case. It will be slower, sure, but the task will still be possible to run.)
Even if two tasks need to run at the same time, the time taken will be the same whether they run concurrently or not:
Not concurrent case:
Kernel 1: *
Kernel 2: -
GPU Core 1: *****-----
GPU Core 2: *****-----
GPU Core 3: *****-----
GPU Core 4: *****-----
Concurrent case:
Kernel 1: *
Kernel 2: -
GPU Core 1: **********
GPU Core 2: **********
GPU Core 3: ----------
GPU Core 4: ----------
In fact, the non-concurrent case is preferable, since at least the first task is already completed and further processing can continue.
What you want to do, as far as I understand, is run multiple kernels at the same time, so that the kernels run fully concurrently. For example, run 100 kernels (the same kernel or different ones) at the same time.
That does not fit the OpenCL model at all, and in fact it may be far slower than a single CPU thread.
If each kernel is independent of all the others, a core (SIMD unit or CPU) can only be allocated to one kernel at a time (because it has only one program counter), even though it could run 1k threads at the same time. In an ideal scenario, this turns your OpenCL device into a pool of a few cores (6-10) that consume the queued kernels serially. And that is assuming the API and the device support it, which is not always the case. In the worst case you will have a single device that runs a single kernel with 99% of it wasted.
Examples of stuff that can be done in OpenCL:
- Data crunching/processing. Multiply vectors, simulate particles, etc..
- Image processing, border detection, filtering, etc.
- Video compression, editing, generation
- Raytracing, complex light math, etc.
- Sorting
Examples of stuff that is not suitable for OpenCL:
- Attending to async requests (HTTP, traffic, interactive data)
- Processing low amounts of data
- Processing data that needs completely different processing for each item
From my point of view, the only real use case for multiple kernels is the last one, and no matter what you do, the performance will be horrible in that case. Better to use a multithreaded pool instead.
Source: https://stackoverflow.com/questions/26287851/opencl-multiple-command-queue-for-concurrent-ndkernal-launch