I am trying to multiply two square matrices (32x32) using OpenCL with a C++ host program. The goal is to reproduce results from a book (OpenCL Programming By Example - R Banger, K Bhattacha