Sparse matrix-vector multiplication in CUDA
Question

I'm trying to implement matrix-vector multiplication on the GPU using CUDA. In my C++ (CPU) code, I load the matrix as a dense matrix and then perform the matrix-vector multiplication with CUDA, using shared memory to improve performance. Since my matrix is sparse, how can I load it efficiently? Below is my C++ function for loading the matrix:

int readMatrix( char* filename, float* &matrix, unsigned int *dim = NULL, int majority = ROW_MAJOR ) {
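
For context on what "loading a sparse matrix efficiently" can look like: one widely used layout is compressed sparse row (CSR), which stores only the nonzero values plus their column indices and a per-row offset array. The block below is a minimal host-side sketch under that assumption, not the asker's code. The names `CsrMatrix`, `denseToCsr`, and `csrMultiply` are hypothetical, and the input is assumed to be a row-major dense array like the one `readMatrix` fills in its row-major mode.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CSR container; names are illustrative, not from the question.
struct CsrMatrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<int> rowPtr;    // rows + 1 entries; row r spans [rowPtr[r], rowPtr[r + 1])
    std::vector<int> colIdx;    // column index of each stored value
    std::vector<float> values;  // nonzero values, stored row by row
};

// Convert a row-major dense matrix to CSR, keeping only nonzero entries.
CsrMatrix denseToCsr(const float* dense, std::size_t rows, std::size_t cols) {
    CsrMatrix m;
    m.rows = rows;
    m.cols = cols;
    m.rowPtr.reserve(rows + 1);
    m.rowPtr.push_back(0);
    for (std::size_t r = 0; r < rows; ++r) {
        for (std::size_t c = 0; c < cols; ++c) {
            const float v = dense[r * cols + c];
            if (v != 0.0f) {
                m.colIdx.push_back(static_cast<int>(c));
                m.values.push_back(v);
            }
        }
        // Close the row: record where the next row's entries begin.
        m.rowPtr.push_back(static_cast<int>(m.values.size()));
    }
    return m;
}

// Reference CPU product y = A * x over the CSR arrays.
std::vector<float> csrMultiply(const CsrMatrix& a, const std::vector<float>& x) {
    std::vector<float> y(a.rows, 0.0f);
    for (std::size_t r = 0; r < a.rows; ++r) {
        float sum = 0.0f;
        for (int k = a.rowPtr[r]; k < a.rowPtr[r + 1]; ++k) {
            sum += a.values[k] * x[a.colIdx[k]];
        }
        y[r] = sum;
    }
    return y;
}
```

The three arrays are flat and contiguous, so they could be copied to the device with `cudaMemcpy` as they are. The outer loop in `csrMultiply` is independent per row, which is why a simple GPU version often assigns one thread to each row; whether that beats the dense shared-memory kernel depends on how sparse the matrix actually is.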