Question
I have just started working with OpenCL, and I have run into some odd behavior that I can't understand. The source I built and tested is from http://www.codeproject.com/Articles/110685/Part-1-OpenCL-Portable-Parallelism . I have an ATI Radeon HD 4770 and an AMD FX 6200 3.8 GHz 6-core CPU.
Speed
First, the speed does not scale linearly with the maximum number of work items per work group. I ran App Profiler to measure the time spent in kernel execution. The result was a bit shocking: my GPU, which can only handle 256 work items per group, took 2.23008 milliseconds to calculate the squares of 5079040 numbers. Note that this does not include the kernel loading time.
However, my CPU, which can handle 1024 work items per group, took 13.41895 milliseconds for the same calculation. I thought that the work items in a work group are run simultaneously, so the CPU should have been faster. What I want to know is: do work groups run simultaneously? For example, in my setup, would the GPU run more work groups simultaneously than the CPU?
Another factor may be that the GPU is faster at floating-point arithmetic, but my CPU has a clock speed about 4 times higher, so this still seems weird. I know that the GPU is normally expected to be faster when it comes to OpenCL, but I want a good explanation of why.
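For reference, the timings quoted above can be turned into throughput figures and work group counts. This is only a rough sketch in Python: it assumes each device was launched with its maximum work group size (256 and 1024), which the question does not actually state, and the function names are made up for illustration.

```python
# Figures quoted in the question.
N = 5_079_040          # numbers squared by the kernel
GPU_MS = 2.23008       # kernel time on the Radeon HD 4770, in milliseconds
CPU_MS = 13.41895      # kernel time on the FX 6200, in milliseconds

# ASSUMPTION: the local size equals the device maximum reported in the question.
GPU_LOCAL_SIZE = 256
CPU_LOCAL_SIZE = 1024


def throughput(n_items, elapsed_ms):
    """Work items completed per second for a given kernel time."""
    return n_items / (elapsed_ms / 1000.0)


def num_work_groups(global_size, local_size):
    """Number of work groups needed to cover global_size (ceiling division)."""
    return -(-global_size // local_size)


gpu_tp = throughput(N, GPU_MS)
cpu_tp = throughput(N, CPU_MS)
speedup = CPU_MS / GPU_MS

gpu_groups = num_work_groups(N, GPU_LOCAL_SIZE)
cpu_groups = num_work_groups(N, CPU_LOCAL_SIZE)

print(f"GPU: {gpu_tp:.3e} items/s over {gpu_groups} work groups")
print(f"CPU: {cpu_tp:.3e} items/s over {cpu_groups} work groups")
print(f"GPU is about {speedup:.2f}x faster on this run")
```

Under that assumption the GPU processes four times as many work groups as the CPU and still finishes roughly six times sooner, so the per-group limit alone does not predict the speed.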
Edit: I tried calculating 1024, 2048...5120 work items, and this time the CPU was faster than the GPU. So I have learned that the CPU works better with few work items, while the GPU is best with many work items.
What I also saw was that my CPU did the calculation much more slowly at every third step of the work group size (4096, 6144, 8192). So it looks like my CPU runs three work groups simultaneously.
Floating point precision
Question moved here: OpenCL Floating point precision
Thanks in advance for all answers.
Answer 1:
What I want to know is: do work groups run simultaneously? For example, in my setup, would the GPU run more work groups simultaneously than the CPU?
There is a great answer to that question here: Are OpenCL work items executed in parallel?
Source: https://stackoverflow.com/questions/11170012/opencl-speed-and-float-point-precision