How many threads (or work-items) can run at the same time?

逝去的感伤 2021-02-12 21:36

I'm new to GPGPU programming and I'm working with NVIDIA's implementation of OpenCL.

My question is how to compute the limit of a GPU device (in number of threads).

1 answer
  • 2021-02-12 21:52

    The OpenCL standard does not specify how its abstract execution model is mapped to the hardware. You can enqueue any number T of threads (work-items) and provide a workgroup size (WG), subject to at least the following constraints (see sections 5.7.3 and 5.8 of the OpenCL spec for details):

    • WG must divide T
    • WG must be at most CL_DEVICE_MAX_WORK_GROUP_SIZE
    • WG must be at most CL_KERNEL_WORK_GROUP_SIZE, as returned by clGetKernelWorkGroupInfo; it may be smaller than the device's maximum workgroup size if the kernel consumes a lot of resources.
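
    The three constraints above can be checked before enqueueing. The following is a minimal sketch, not part of the OpenCL API: `launch_is_valid` and `round_up_to_multiple` are hypothetical helpers, and `device_max` and `kernel_max` are assumed to hold the values you obtained for CL_DEVICE_MAX_WORK_GROUP_SIZE and CL_KERNEL_WORK_GROUP_SIZE.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true if a 1-D launch of `total` work-items
 * with workgroups of `wg` items satisfies the three constraints.
 * `device_max` and `kernel_max` stand for the values reported for
 * CL_DEVICE_MAX_WORK_GROUP_SIZE and CL_KERNEL_WORK_GROUP_SIZE. */
bool launch_is_valid(size_t total, size_t wg,
                     size_t device_max, size_t kernel_max)
{
    if (wg == 0)         return false; /* an empty workgroup is meaningless */
    if (total % wg != 0) return false; /* WG must divide T */
    if (wg > device_max) return false; /* device-wide limit */
    if (wg > kernel_max) return false; /* per-kernel limit */
    return true;
}

/* Hypothetical helper: rounds `total` up to the next multiple of `wg`,
 * a common way to satisfy "WG must divide T". The kernel must then
 * ignore the padding work-items itself. */
size_t round_up_to_multiple(size_t total, size_t wg)
{
    assert(wg > 0);
    return ((total + wg - 1) / wg) * wg;
}
```

    For example, 1000 work-items with WG=256 fail the divisibility check, and padding the launch to round_up_to_multiple(1000, 256) work-items makes it pass, provided the kernel skips the extra items.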

    The implementation manages the execution of the kernel on the hardware. All threads of a single workgroup must be scheduled on a single "multiprocessor", but a single multiprocessor can manage several workgroups at the same time.
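
    This scheduling model gives a rough upper bound on how many threads are resident on the device at once. The sketch below is only an illustration under stated assumptions: `max_resident_work_items` is a hypothetical helper, and `groups_per_mp` (the number of workgroups one multiprocessor holds at a time) is an assumed input that core OpenCL does not report, since it depends on the kernel's resource usage and on the micro-architecture. The number of multiprocessors can be taken from CL_DEVICE_MAX_COMPUTE_UNITS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: upper bound on the work-items resident on the
 * device at once, assuming each of `multiprocessors` compute units
 * holds `groups_per_mp` workgroups of `wg` work-items.
 * `groups_per_mp` is an assumption supplied by the caller, not a value
 * that core OpenCL reports. */
size_t max_resident_work_items(size_t multiprocessors,
                               size_t groups_per_mp,
                               size_t wg)
{
    assert(multiprocessors > 0 && groups_per_mp > 0 && wg > 0);
    return multiprocessors * groups_per_mp * wg;
}
```

    With 40 multiprocessors, an assumed 4 workgroups per multiprocessor and WG=256, the bound is 40 × 4 × 256 = 40960 resident work-items. Resident does not mean executing in the same clock cycle.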

    Threads inside a workgroup are executed in groups of 32 (an NVIDIA warp) or 64 (an AMD wavefront). Each micro-architecture does this in a different way. You will find more details on the NVIDIA and AMD forums, and in the documentation provided by each vendor.
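
    The split of a workgroup into warps or wavefronts is simple ceiling arithmetic. A minimal sketch follows; the helper names are hypothetical, and the width (32 or 64) is passed in by the caller because it depends on the hardware.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of warps/wavefronts of `width` lanes
 * needed to cover a workgroup of `wg` work-items (ceiling division). */
size_t hw_groups_per_work_group(size_t wg, size_t width)
{
    assert(width > 0);
    return (wg + width - 1) / width;
}

/* Hypothetical helper: lanes left without a work-item in the last
 * warp/wavefront when `wg` is not a multiple of `width`. */
size_t unused_lanes(size_t wg, size_t width)
{
    return hw_groups_per_work_group(wg, width) * width - wg;
}
```

    WG=256 maps to 8 groups of 32 or 4 groups of 64 with no unused lanes, whereas WG=100 with a width of 32 needs 4 groups and leaves 28 lanes without work. This is why workgroup sizes are usually chosen as a multiple of the hardware width.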

    To answer your question: there is no limit to the number of threads you can enqueue. In the real world, your problem is limited by the size of its inputs and outputs, i.e. by the size of the device memory. To process a 4 GB buffer of floats with one thread per element, you can enqueue 1G threads, with WG=256 for example. The device then has to schedule 4M workgroups on its small number (say between 2 and 40) of multiprocessors.
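
    The figures in this example follow from two divisions. A small sketch of the arithmetic, assuming one work-item per float and 4-byte floats; both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of work-items when each one processes
 * a single element of `element_bytes` bytes. */
uint64_t work_items_for_buffer(uint64_t buffer_bytes, uint64_t element_bytes)
{
    assert(element_bytes > 0);
    return buffer_bytes / element_bytes;
}

/* Hypothetical helper: number of workgroups for a launch in which
 * `wg` divides `work_items` exactly. */
uint64_t work_group_count(uint64_t work_items, uint64_t wg)
{
    assert(wg > 0 && work_items % wg == 0);
    return work_items / wg;
}
```

    A 4 GB buffer (4 × 2^30 bytes) of 4-byte floats gives 2^30 work-items (1G), and with WG=256 that is 2^30 / 2^8 = 2^22 = 4194304 workgroups (4M).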
