How many threads (or work-items) can run at the same time?

逝去的感伤 2021-02-12 21:36

I'm new to GPGPU programming and I'm working with NVIDIA's implementation of OpenCL.

My question is how to compute the limit of a GPU device (in number of threads).

1 answer

时光说笑 (original poster)

2021-02-12 21:52

The OpenCL standard does not specify how its abstract execution model is mapped to the hardware. You can enqueue any number T of threads (work items) and provide a workgroup size (WG), subject to at least the following constraints (see OpenCL spec 5.7.3 and 5.8 for details):

WG must divide T

WG must be at most CL_DEVICE_MAX_WORK_GROUP_SIZE

WG must be at most the CL_KERNEL_WORK_GROUP_SIZE value returned by clGetKernelWorkGroupInfo; it may be smaller than the device maximum workgroup size if the kernel consumes a lot of resources.
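The three constraints above can be sketched as a small check. This is only an illustration in plain Python, not OpenCL host code: the function names `check_launch` and `round_up_global_size` are hypothetical, and the two limits are assumed to have been queried beforehand (from the device and from clGetKernelWorkGroupInfo). Rounding T up to a multiple of WG is a common workaround, assuming the kernel itself skips the padded work items.

```python
def check_launch(total_items, wg_size, device_max_wg, kernel_max_wg):
    """Return the list of violated constraints (empty means the launch is acceptable).

    device_max_wg and kernel_max_wg are assumed to be values already queried
    from the OpenCL runtime; this sketch does not talk to a device.
    """
    problems = []
    if total_items <= 0 or wg_size <= 0:
        return ["sizes must be positive"]
    if total_items % wg_size != 0:
        problems.append("WG must divide T")
    if wg_size > device_max_wg:
        problems.append("WG exceeds CL_DEVICE_MAX_WORK_GROUP_SIZE")
    if wg_size > kernel_max_wg:
        problems.append("WG exceeds CL_KERNEL_WORK_GROUP_SIZE")
    return problems


def round_up_global_size(total_items, wg_size):
    """Smallest multiple of wg_size that is >= total_items."""
    return ((total_items + wg_size - 1) // wg_size) * wg_size


if __name__ == "__main__":
    # Example limits are made up for illustration.
    print(check_launch(1000, 256, 1024, 512))
    print(round_up_global_size(1000, 256))
```

With the made-up limits above, T=1000 and WG=256 fails only the divisibility constraint, and padding T to 1024 fixes it.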

The implementation manages the execution of the kernel on the hardware. All threads of a single workgroup must be scheduled on a single "multiprocessor", but a single multiprocessor can manage several workgroups at the same time.

Threads inside a workgroup are executed in groups of 32 (NVIDIA warp) or 64 (AMD wavefront). Each micro-architecture does this in a different way. You will find more details in the NVIDIA and AMD forums, and in the various docs provided by each vendor.
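As a rough illustration of that grouping, the sketch below counts how many warps or wavefronts a workgroup occupies. The helper names are hypothetical, and the model is simplified: it assumes the hardware rounds the workgroup up to whole groups of the given width, so a WG that is not a multiple of the width leaves some lanes unused.

```python
def groups_per_workgroup(wg_size, width):
    """Number of warps/wavefronts needed for one workgroup (ceiling division)."""
    return -(-wg_size // width)


def idle_lanes(wg_size, width):
    """Lanes left unused in the last warp/wavefront under this simplified model."""
    return groups_per_workgroup(wg_size, width) * width - wg_size


if __name__ == "__main__":
    for wg in (256, 100):
        for width in (32, 64):  # 32 = NVIDIA warp, 64 = AMD wavefront
            print(wg, width, groups_per_workgroup(wg, width), idle_lanes(wg, width))
```

This is one reason workgroup sizes are usually chosen as a multiple of 32 or 64.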

To answer your question: there is no practical limit to the number of threads you can enqueue. In the real world, your problem is limited by the size of its inputs and outputs, i.e. the size of the device memory. To process a 4 GB buffer of floats, you can enqueue 1G threads (one per 4-byte float), with WG=256 for example. The device will then have to schedule 4M workgroups on its small number (say between 2 and 40) of multiprocessors.
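The arithmetic behind that example can be written out as follows. This is a sketch under stated assumptions: one work item per 4-byte float, "4 GB" read as 4 * 2^30 bytes, and workgroups spread evenly across multiprocessors (real schedulers are not required to do that). The function names are hypothetical.

```python
def launch_shape(buffer_bytes, element_bytes, wg_size):
    """Return (work_items, workgroups), assuming one work item per element."""
    work_items = buffer_bytes // element_bytes
    workgroups = work_items // wg_size
    return work_items, workgroups


def workgroups_per_multiprocessor(workgroups, multiprocessors):
    """Upper bound per multiprocessor if workgroups were spread evenly."""
    return -(-workgroups // multiprocessors)


if __name__ == "__main__":
    items, groups = launch_shape(4 * 2**30, 4, 256)
    print(items)   # 2**30 work items ("1G")
    print(groups)  # 2**22 workgroups ("4M")
    for mp in (2, 40):
        print(mp, workgroups_per_multiprocessor(groups, mp))
```

Each multiprocessor therefore works through its workgroups over time; only a small fraction of the enqueued threads are resident on the device at any moment.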
