What is the algorithm to determine the optimal work group size and number of work groups?

天命终不由人 2020-12-03 05:49

The OpenCL standard defines the following queries for information about a device and a compiled kernel:

  • CL_DEVICE_MAX_COMPUTE_UNITS

  • CL_DEVICE_MAX_WORK_G

2 Answers
  • 2020-12-03 06:09

    You discover these values experimentally for your algorithm. Use a profiler to get hard numbers.

    I like to use CL_DEVICE_MAX_COMPUTE_UNITS as the number of work groups, because I often rely on synchronizing work items. I usually run kernels with little branching, so they take the same time to execute in each compute unit.
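    As a rough illustration of that choice, the sketch below derives a launch shape from one work group per compute unit. The function name `launch_shape` and the device values (8 compute units, 64 work items per group) are hypothetical, and it assumes each work item loops over several elements to cover the whole input.

```python
def launch_shape(compute_units, local_size, num_items):
    """One work group per compute unit; each work item handles several elements."""
    if compute_units <= 0 or local_size <= 0:
        raise ValueError("compute_units and local_size must be positive")
    global_size = compute_units * local_size
    # Ceiling division: how many elements each work item must cover.
    items_per_work_item = -(-num_items // global_size)
    return global_size, items_per_work_item


# Hypothetical device: 8 compute units, 64 work items per group.
global_size, per_item = launch_shape(8, 64, 1_000_000)
print(global_size, per_item)
```

    With these made-up numbers the kernel would be launched with 512 work items, each striding over the input in steps of 512.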

    Some multiple of CL_KERNEL_PREFERRED_WORK_GROUP_SIZE_MULTIPLE will be optimal for your device. What that multiple actually is depends on your memory access pattern and type of work you are doing with each work item. Use 1 as the multiple when you are running a heavy, compute-bound (ALU) kernel. Try a larger multiple to hide memory latency if you are bottlenecked by memory access. Use a profiler to determine when your access time and your ALU time are optimal.
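    One way to turn this into a list of sizes to try is sketched below. It assumes the preferred multiple and the maximum work group size have already been queried from the device and kernel; the helper names and the values 64 and 256 are placeholders, not real query results.

```python
def candidate_local_sizes(preferred_multiple, max_work_group_size):
    """Every multiple of preferred_multiple that fits in max_work_group_size."""
    if preferred_multiple <= 0 or max_work_group_size <= 0:
        raise ValueError("both limits must be positive")
    return list(range(preferred_multiple, max_work_group_size + 1, preferred_multiple))


def padded_global_size(num_items, local_size):
    """Round num_items up to a whole number of work groups of local_size."""
    return ((num_items + local_size - 1) // local_size) * local_size


# Hypothetical values standing in for the queried limits.
sizes = candidate_local_sizes(64, 256)
print(sizes)                                # multiples 1x to 4x
print(padded_global_size(1000, sizes[0]))   # global size for 1000 items
```

    Padding the global size matters because OpenCL versions before 2.0 require it to be divisible by the local size; the kernel then needs a bounds check so the extra work items do nothing.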

    The optimal ALU:fetch ratio is 1:1 for any device. This is rarely achieved in practice, so aim to keep the ALU/SIMD banks saturated, which means an ALU:fetch ratio greater than 1 whenever possible. A ratio below 1 means you should try a larger work group size to better hide the memory latency.
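    The rule of thumb above can be written down as a small helper. This is only a sketch: `suggest_adjustment` is a hypothetical name, and the two inputs are assumed to be ALU and fetch figures read off whatever profiler you use, in the same units.

```python
def suggest_adjustment(alu_time, fetch_time):
    """Apply the ALU:fetch rule of thumb to two profiler readings."""
    if fetch_time <= 0:
        raise ValueError("fetch_time must be positive")
    ratio = alu_time / fetch_time
    if ratio < 1.0:
        # Fetch dominates: more work items in flight can hide memory latency.
        return ratio, "try a larger work group size"
    # ALUs are kept busy: the current size is a reasonable starting point.
    return ratio, "keep the current work group size"


# Hypothetical profiler readings (arbitrary but consistent units).
print(suggest_adjustment(30.0, 60.0))
print(suggest_adjustment(90.0, 60.0))
```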

  • 2020-12-03 06:09

    As mfa said, you have to discover these experimentally. I wanted to add that, depending on what you are computing (particularly the size of the jobs, i.e. smaller or larger for each work item), two configurations worth trying are:

    • Lots of work items with small work groups and each job item being small.
    • Fewer work items with larger work groups and each job item being larger.

    That is, check the base cases and figure out how they affect the processing pipeline.

    In essence, you have to tweak it. I often run the kernel several times with different parameters (profiling each run) and then generate a surface plot to see how performance behaves.
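    A minimal sketch of such a sweep follows. The names `sweep` and `fake_kernel` are made up for illustration, and `fake_kernel` is a CPU stand-in rather than a real OpenCL launch; in real use the `run` callable would enqueue the kernel and block until it has finished, otherwise only the enqueue cost is measured.

```python
import itertools
import time


def sweep(run, local_sizes, group_counts, repeats=3):
    """Time run(local_size, num_groups) over a grid; keep the best of `repeats`."""
    results = {}
    for local_size, num_groups in itertools.product(local_sizes, group_counts):
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run(local_size, num_groups)
            best = min(best, time.perf_counter() - start)
        results[(local_size, num_groups)] = best
    return results


def fake_kernel(local_size, num_groups):
    # Stand-in workload (assumption): more work items means less work per item.
    sum(range(100_000 // (local_size * num_groups) + 1))


timings = sweep(fake_kernel, [64, 128, 256], [4, 8, 16])
best = min(timings, key=timings.get)
print("fastest configuration (local_size, num_groups):", best)
```

    The `timings` dictionary maps each (local size, number of groups) pair to a time, which is the grid of points a surface plot needs.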
