For the NVIDIA GEFORCE 940mx GPU, Device Query shows it has 3 Multiprocessor and 128 cores for each MP.
Number of threads per multiprocessor=2048
It appears the main source of your confusion is mixing up two completely different sets of limits:
The numbers you quote (2048 threads per multiprocessor, three multiprocessors in total = 6144 threads represent the first set of limits. The numbers you show in your screenshot of the deviceQuery
output:
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
define the limits of a given kernel launch. While they overlap somewhat, you can treat them as more or less separate. For a more thorough discussion of the practicalities of kernel launch parameters and block dimensions, see here.