Performing many small matrix operations in parallel in OpenCL
I have a problem that requires eigendecomposition and matrix multiplication of many (~4k) small (~3x3) square Hermitian matrices. In particular, I need each work item to compute the eigendecomposition of one such matrix and then perform two matrix multiplications. The work each thread has to do is therefore minimal, and the full job should be highly parallelizable. Unfortunately, all the OpenCL LAPACK implementations I have found seem to be designed for offloading operations on large matrices to the GPU.