OpenCL speed and floating-point precision

半城伤御伤魂 submitted on 2019-12-10 10:22:58

Question


I have just started working with OpenCL. However, I have found some weird behavior in OpenCL that I can't understand. The code I built and tested came from http://www.codeproject.com/Articles/110685/Part-1-OpenCL-Portable-Parallelism . I have an ATI Radeon HD 4770 and an AMD FX-6200 3.8 GHz 6-core CPU.

Speed

First, the speed does not scale linearly with the maximum number of work items per group. I ran APP Profiler to analyze the time spent during kernel execution. The result was a bit shocking: my GPU, which can only handle 256 work items per group, took 2.23008 milliseconds to calculate the squares of 5079040 numbers. Note that this excludes the kernel loading time.
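
To make the terminology concrete, here is a minimal Python sketch (not OpenCL itself, and not the CodeProject code) that serially emulates how a 1-D NDRange running a squaring kernel is split into work groups. The function name `square_ndrange` is hypothetical. It shows that the maximum work-group size only decides how work items are grouped; it says nothing about how many items or groups the hardware actually executes at the same time.

```python
def square_ndrange(data, local_size):
    """Serially emulate a 1-D NDRange running a 'square' kernel.

    Each (group_id, local_id) pair is one work item; its global id is
    group_id * local_size + local_id, which is what get_global_id(0)
    returns in OpenCL C when no global offset is used.
    Real hardware runs many of these iterations in parallel.
    """
    global_size = len(data)
    if global_size % local_size != 0:
        # OpenCL 1.x requires the global size to be a multiple of the local size
        raise ValueError("global size must be a multiple of local size")
    num_groups = global_size // local_size
    out = [0.0] * global_size
    for group_id in range(num_groups):          # work groups
        for local_id in range(local_size):      # work items inside one group
            gid = group_id * local_size + local_id
            out[gid] = data[gid] * data[gid]    # the kernel body
    return out, num_groups


# Small run to check the indexing; the real problem size is far larger.
result, groups = square_ndrange([float(i) for i in range(1024)], local_size=256)
print(groups, result[:4])

# Group counts for the question's 5079040 elements at 256 and 1024 items per group.
print(5079040 // 256, 5079040 // 1024)
```

With 256 items per group the GPU run is 19840 work groups; with 1024 the CPU run is 4960 groups.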

However, my CPU, which can handle 1024 work items per group, took 13.41895 milliseconds to calculate the same numbers. I thought that the work items in a work group run simultaneously, in which case the CPU should have been faster. What I want to know is: do work groups run simultaneously? For example, in my setup, would the GPU run more work groups simultaneously than the CPU?

Another factor may be that the GPU is faster at floating-point arithmetic, but my CPU has a 4 times higher clock speed, so it still seems weird. I know that the GPU should normally be faster when it comes to OpenCL, but I want a good explanation of why.
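
The two profiler timings can be turned into throughput with simple arithmetic. This Python sketch uses only figures stated in the question, including the claimed 4× clock ratio. It works out to roughly a 6× GPU speedup, which means the GPU does about 24× more work per clock cycle; that difference has to come from something other than clock speed, such as more parallel lanes or more memory bandwidth (a rough model, not a hardware analysis).

```python
ELEMENTS = 5079040      # numbers squared, from the question
GPU_MS = 2.23008        # HD 4770 kernel time reported by the profiler
CPU_MS = 13.41895       # FX-6200 kernel time reported by the profiler
CLOCK_RATIO = 4.0       # "4 times" higher CPU clock, as stated in the question


def throughput(elements, ms):
    """Elements processed per second."""
    return elements / (ms / 1000.0)


gpu_rate = throughput(ELEMENTS, GPU_MS)
cpu_rate = throughput(ELEMENTS, CPU_MS)
speedup = gpu_rate / cpu_rate                 # same as CPU_MS / GPU_MS
per_clock_advantage = speedup * CLOCK_RATIO   # GPU work per clock relative to CPU

print(f"GPU: {gpu_rate:.3e} elements/s")
print(f"CPU: {cpu_rate:.3e} elements/s")
print(f"speedup {speedup:.2f}x, per clock {per_clock_advantage:.1f}x")
```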

Edit: I tried calculating 1024, 2048, ..., 5120 work items, and now the CPU was faster than the GPU. So I have learned that the CPU works better with few work items, while the GPU is best with many work items.
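
One common way to reason about the small-size result is a fixed-overhead cost model: total time = fixed launch/scheduling cost + per-item cost × number of items. A device with a larger fixed cost but a smaller per-item cost only wins once there are enough items to amortize that fixed cost. The sketch below is an assumption, not something measured in the question; `kernel_time_ms` and `crossover` are hypothetical helpers, and the numbers in the usage line are made up purely for illustration.

```python
def kernel_time_ms(n, overhead_ms, ns_per_item):
    """Hypothetical cost model: fixed overhead plus a per-item cost (in ns)."""
    return overhead_ms + n * ns_per_item * 1e-6


def crossover(gpu_overhead_ms, gpu_ns, cpu_overhead_ms, cpu_ns):
    """Item count above which the GPU is faster under this model.

    Solves gpu_overhead + n * gpu_cost == cpu_overhead + n * cpu_cost for n.
    Returns None if the GPU's per-item cost is not lower (it never catches up).
    """
    if cpu_ns <= gpu_ns:
        return None
    n = (gpu_overhead_ms - cpu_overhead_ms) / ((cpu_ns - gpu_ns) * 1e-6)
    return max(0.0, n)  # negative means the GPU wins at every size


# Made-up parameters, only to show the shape of the trade-off.
print(f"{crossover(0.05, 0.5, 0.01, 2.5):.0f} items")
```

Below the crossover the CPU's smaller fixed cost dominates; above it the GPU's cheaper per-item cost does.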

I also saw that my CPU did the calculation much more slowly at certain multiples of the work-group size (4096, 6144, 8192). So it looks like my CPU processes three work groups simultaneously.
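
The "three work groups at a time" guess can be written as a simple waves model: if at most `concurrent_groups` groups execute at once, the groups run in ceil(groups / concurrent_groups) rounds, and the run time steps up whenever that round count increases. This is a hypothetical Python sketch; the local size of 1024 and the concurrency of 3 are assumptions taken from the question, not confirmed device behavior.

```python
def waves(global_size, local_size, concurrent_groups):
    """Rounds needed if at most `concurrent_groups` work groups run at once."""
    num_groups = -(-global_size // local_size)       # integer ceiling division
    return -(-num_groups // concurrent_groups)       # integer ceiling division


# Assumed: 1024 work items per group, 3 groups in flight at a time.
for n in range(1024, 9216 + 1, 1024):
    print(n, waves(n, local_size=1024, concurrent_groups=3))
```

Under these assumptions the round count rises between 3072 and 4096 work items and again between 6144 and 7168, so comparing such a table with measured timings is one way to test the guess.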

Floating point precision

Question moved here: OpenCL Floating point precision

Thanks in advance for all answers.


Answer 1:


What I want to know is: do work groups run simultaneously? For example, in my setup, would the GPU run more work groups simultaneously than the CPU?

There is a great answer to that question here: Are OpenCL work items executed in parallel?



Source: https://stackoverflow.com/questions/11170012/opencl-speed-and-float-point-precision
