Is it possible to query the number of processing elements (per compute unit) in OpenCL? If yes, how? I did not find a corresponding parameter on the clGetDeviceInfo doc page.
Processing element (PE) is the term the OpenCL standard uses, and no, you cannot query that number.
I see a few reasons why it's not possible:
The definition itself:
PE: A virtual scalar processor. A work-item may execute on one or more processing elements.
So, depending on the architecture, the returned number would be more or less meaningless. Think, for instance, of the previous generation of AMD GPUs, which used VLIW processors.
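To make the VLIW ambiguity concrete, here is a small illustrative sketch. It assumes the figures for one SIMD engine (compute unit) of AMD's Cypress GPU (Radeon HD 5870), a VLIW5 design: 16 stream cores per compute unit, each bundling 5 ALU slots. A work-item runs on one stream core, and the compiler packs its independent operations into those slots, so it may occupy several "processing elements" at once, exactly as the definition above allows. The function names are hypothetical.

```c
/* Assumed figures for one compute unit of a VLIW5 GPU (AMD Cypress):
 * 16 stream cores, each a 5-wide VLIW bundle of ALU slots. */
enum {
    STREAM_CORES_PER_CU = 16,
    VLIW_WIDTH = 5
};

/* Hypothetical answer A: treat each VLIW bundle (stream core) as one PE. */
int pe_count_as_bundles(void)
{
    return STREAM_CORES_PER_CU;
}

/* Hypothetical answer B: treat each ALU slot inside a bundle as one PE. */
int pe_count_as_slots(void)
{
    return STREAM_CORES_PER_CU * VLIW_WIDTH;
}
```

Both answers (16 and 80) are defensible readings of "processing elements per compute unit", which is why a single queryable value would not tell you much.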
PE is an abstraction that is mainly useful within the standard to illustrate and define other concepts; see, for instance, the definitions of SIMD and SPMD and, of course, the Platform Model. But the concept is not used directly in practice (though it is very useful for a developer to understand in order to achieve good performance). What you will care about instead is the maximum number of work-items in a work-group.
Even within a given architecture, processing elements come in different types. For example, in the GK110 Kepler architecture an SMX (the equivalent of a compute unit) has 192 single-precision CUDA cores, 64 double-precision units and 32 special function units (SFUs). So what number should a query for the number of PEs return?
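A tiny sketch of the Kepler dilemma, using only the per-SMX unit counts quoted above; the struct and both counting functions are hypothetical, each encoding one plausible interpretation of "number of PEs".

```c
/* Per-SMX execution unit counts for GK110 (Kepler), as quoted above. */
struct smx_units {
    int sp_cores; /* single-precision CUDA cores */
    int dp_units; /* double-precision units */
    int sfus;     /* special function units */
};

const struct smx_units GK110_SMX = { 192, 64, 32 };

/* Hypothetical answer A: only the single-precision cores count as PEs. */
int pe_count_sp_only(struct smx_units u)
{
    return u.sp_cores;
}

/* Hypothetical answer B: every arithmetic unit counts as a PE. */
int pe_count_all_units(struct smx_units u)
{
    return u.sp_cores + u.dp_units + u.sfus;
}
```

For GK110 these give 192 and 288 respectively, and neither is more "correct" than the other, which is the point: the standard leaves PE as an abstraction rather than a hardware count.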