问题
I have a problem that requires me to do eigendecomposition and matrix multiplication of many (~4k) small (~3x3) square Hermitian matrices. In particular, I need each work item to perform eigendecomposition of one such matrix, and then perform two matrix multiplications. Thus, the work that each thread has to do is rather minimal, and the full job should be highly parallelizable.
Unfortunately, it seems all the available OpenCL LAPACKs are for delegating operations on large matrices to the GPU rather than for doing smaller linear algebra operations inside an OpenCL kernel. As I'd rather not implement matrix multiplcation and eigendecomposition for arbitrarily sized matrices in OpenCL myself, I was hoping someone here might know of a suitable library for the job?
I'm aware that OpenCL might be getting built-in matrix operations at some point since the matrix type is reserved, but that is not really of much use right now. There is a similar question here from 2011, but it pretty much just says to roll your own, so I'm hoping the situation has improved since then.
回答1:
In general, my experience with libraries like LAPACK, fftw, cuFFT, etc. is that when you want to do many really small problems like this, you are better off writing your own for performance. Those libraries are usually written for generality, so you can often beat their performance for specific small problems, especially if you can use unique properties of your particular problem.
I realize you don't want to hear "roll your own" but for this type of problem it is really the best thing to do IMO. You might find a library to do this, but considering the code that you really want (for performance) will not generalize, I doubt it exists. You'll be looking specifically for code to find the eigenvalues of 3x3 matrices. That's less of a library and more of a random code snippet with a suitable license that you can manipulate to take advantage of your specific problem.
In this specific case, you can find the eigenvalues of a 3x3 matrix with the textbook method using the characteristic polynomial. Remember that there is a relatively simple closed form solution for cubic equations: http://en.wikipedia.org/wiki/Cubic_function#General_formula_for_roots.
While I think it is very likely that this approach would be much faster than iterative methods, it would be wise to verify that if performance is an issue.
来源:https://stackoverflow.com/questions/21049037/performing-many-small-matrix-operations-in-parallel-in-opencl