Is global synchronization in OpenCL possible?

Submitted by 喜你入骨 on 2019-12-06 07:17:59

Question


As is well known, the OpenCL barrier() function works only within a single work-group, and there is no direct way to synchronize across work-groups. If it is possible at all, what is the best approach for global synchronization today? Using atomics, OpenCL 2.0 features, etc.?

GitHub links and examples are welcome!

Thanks!


Answer 1:


Global synchronization within a kernel is not possible. This is because work-groups are not guaranteed to run at the same time. You can achieve a sort of global sync from the host application if you break your kernel into pieces. This is not suitable for many kernels, especially if you use a lot of local memory (its contents do not carry over from one kernel launch to the next) or have a bit of initialization code before your kernel does any real work.

Break your kernel into two parts, kernelA and kernelB for example. Global synchronization is then simply a matter of enqueuing the NDRange for kernelA, calling finish() (clFinish in the C API), and enqueuing the NDRange for kernelB. The global data will remain in memory between the two calls.

Again, not pretty and not necessarily high performance, but if you really must have global sync, this is the only way to get it.
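
To make the kernelA / kernelB split concrete, here is a minimal CPU-side model in plain C. It is only a sketch: the array names, the sizes, and the bodies of kernelA and kernelB are made up for illustration, and launch() is a hypothetical stand-in for one NDRange launch, not an OpenCL call. It runs the work-groups one at a time and in reverse order, so nothing relies on groups being resident together. The only ordering guarantee is that the second launch starts after the first has fully finished.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_GROUPS 4
#define GROUP_SIZE 8
#define N (NUM_GROUPS * GROUP_SIZE)

/* Models global memory: contents persist between the two launches. */
static float data[N];
static float partial[NUM_GROUPS];

/* kernelA (hypothetical body): each work-group sums its own slice. */
static void kernelA(size_t group_id) {
    float sum = 0.0f;
    for (size_t i = 0; i < GROUP_SIZE; ++i)
        sum += data[group_id * GROUP_SIZE + i];
    partial[group_id] = sum;
}

/* kernelB (hypothetical body): every group needs ALL partial sums,
   so it must not start before every group of kernelA has finished. */
static void kernelB(size_t group_id) {
    float total = 0.0f;
    for (size_t g = 0; g < NUM_GROUPS; ++g)
        total += partial[g];
    assert(total != 0.0f);
    for (size_t i = 0; i < GROUP_SIZE; ++i)
        data[group_id * GROUP_SIZE + i] /= total;
}

/* Stand-in for one NDRange launch. Groups run sequentially and in
   reverse order: no concurrency or ordering between groups is assumed. */
static void launch(void (*kernel)(size_t)) {
    for (size_t g = NUM_GROUPS; g-- > 0;)
        kernel(g);
}

/* Host-side sequence: launch A, wait, launch B. Returns the sum of
   the normalized data as a simple check. */
float run_two_phase(void) {
    for (size_t i = 0; i < N; ++i)
        data[i] = 1.0f;

    launch(kernelA);  /* real host code: clEnqueueNDRangeKernel for kernelA */
                      /* real host code: clFinish(queue), the global sync point */
    launch(kernelB);  /* real host code: clEnqueueNDRangeKernel for kernelB */

    float check = 0.0f;
    for (size_t i = 0; i < N; ++i)
        check += data[i];
    return check;
}
```

Calling run_two_phase() normalizes the array using a total that every group contributed to. In real host code, an in-order command queue (the default) already runs commands in submission order, so the finish between the two launches mainly matters when the host itself has to wait for the result.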




Answer 2:


While global synchronization has no succinct in-kernel API call, if the compute device supports the OpenCL extension cl_khr_global_int32_base_atomics, it may be implemented using atomics.

Please see the paper by Xiao et al., which evaluates lock-based and lock-free approaches to global synchronization on GPUs: http://synergy.cs.vt.edu/pubs/papers/xiao-ipdps2010-gpusync.pdf

This is also mentioned in another Stack Overflow post: "OpenCL and GPU global synchronization".
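
As a rough illustration of the atomics idea, here is a small single-threaded model in plain C of a counter-based barrier. This is a sketch under stated assumptions, not the algorithm from the paper and not kernel code: simulate_barrier, the State enum, and the `resident` parameter (how many work-groups the device can run at once) are all hypothetical names. The plain int counter stands in for a global integer that a real kernel would bump with atomic_inc (atom_inc in the cl_khr_global_int32_base_atomics extension) and then spin on.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_GROUPS 16

typedef enum { NOT_STARTED, BEFORE_BARRIER, SPINNING, DONE } State;

/* Models num_groups work-groups meeting at a counter barrier on a device
   that can keep only `resident` groups running at once.
   Returns true if every group gets past the barrier, false on deadlock. */
bool simulate_barrier(int num_groups, int resident) {
    State state[MAX_GROUPS] = { NOT_STARTED };
    int counter = 0;  /* stands in for the global atomic counter */
    int running = 0;  /* groups currently occupying a device slot */
    int done = 0;

    assert(num_groups > 0 && num_groups <= MAX_GROUPS && resident > 0);

    for (;;) {
        bool progress = false;
        for (int g = 0; g < num_groups; ++g) {
            switch (state[g]) {
            case NOT_STARTED:
                /* A group can only start if a slot is free. */
                if (running < resident) {
                    state[g] = BEFORE_BARRIER;
                    ++running;
                    progress = true;
                }
                break;
            case BEFORE_BARRIER:
                ++counter;  /* in a kernel: atomic_inc on the counter */
                state[g] = SPINNING;
                progress = true;
                break;
            case SPINNING:
                /* In a kernel: spin until counter == num_groups.
                   The group keeps its slot the whole time it spins. */
                if (counter == num_groups) {
                    state[g] = DONE;
                    --running;
                    ++done;
                    progress = true;
                }
                break;
            case DONE:
                break;
            }
        }
        if (done == num_groups)
            return true;
        if (!progress)
            return false;  /* resident groups spin, the rest never start */
    }
}
```

In this model simulate_barrier(4, 4) completes, while simulate_barrier(4, 2) deadlocks: two groups spin on the counter and the other two never get a slot. That is the same caveat Answer 1 raises. An in-kernel barrier like this is only safe when the launch is small enough for all work-groups to be resident at the same time, which OpenCL itself does not guarantee.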



Source: https://stackoverflow.com/questions/30209996/is-global-synchronization-in-opencl-possible
