I need to perform a very large matrix/vector multiplication in Matlab: "A" is a 655360 by 5 real-valued matrix that is not necessarily sparse and "B" is a 655360
Your #1 option, if this is your bottleneck, is to re-examine your algorithm. See the question "Optimizing MATLAB code" for a great example of how choosing a different algorithm reduced runtime by three orders of magnitude.
I have had good results with Matlab matrix multiplication on the GPU.
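A minimal sketch of what the GPU route could look like, assuming the Parallel Computing Toolbox and a supported GPU are available. The question text is cut off, so the shape of B (a column vector with as many rows as A) and the product B' * A are assumptions made for illustration only.

```matlab
% Assumption: B is 655360-by-1 and the wanted product is B' * A (1-by-5).
A = rand(655360, 5);
B = rand(655360, 1);

% Copy the operands to GPU memory (requires Parallel Computing Toolbox).
Ag = gpuArray(A);
Bg = gpuArray(B);

% The multiplication runs on the GPU because both operands are gpuArrays.
Cg = Bg' * Ag;

% Bring the small result back to host memory.
C = gather(Cg);
disp(size(C));
```

Whether this helps depends on how often the operands have to be transferred: copying A to the device for a single product may cost more than the product itself.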
Matlab is built using fairly optimized libraries (BLAS, etc.), so you can't easily improve upon it from within Matlab. Where you can improve is to get a better BLAS, such as one optimized for your processor - this will enable better use of the caches by getting appropriately sized blocks of data from main memory. Take a look into creating your own compiled versions of ATLAS, ACML, MKL, and Goto BLAS.
I wouldn't try to solve this one particular multiplication unless it's really killing you. Changing up the BLAS is likely to lead to a happier solution, especially if you're not currently making use of multicore processors.
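Before swapping libraries, it may be worth checking which BLAS and how many computational threads the current Matlab session uses. The following is only a quick inspection sketch; the strings it prints vary by release and platform.

```matlab
% Report the BLAS and LAPACK builds that this Matlab session is linked against.
blasInfo   = version('-blas');
lapackInfo = version('-lapack');

% Report the maximum number of computational threads Matlab will use.
nThreads = maxNumCompThreads;

fprintf('BLAS:    %s\n', blasInfo);
fprintf('LAPACK:  %s\n', lapackInfo);
fprintf('Threads: %d\n', nThreads);
```

If the thread count is 1 on a multicore machine, that alone may explain a slow product.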
In order to avoid the transpose operation, you could try:
sum(bsxfun(@times, A, B), 2)
But I would be astonished if it were faster than the direct version. See @thiton's answer.
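A small check of the bsxfun idea against the direct product, under a loudly stated assumption: B is a column vector with as many rows as A, and the wanted result is B' * A. The question text is truncated, so this layout is a guess. With this layout the reduction has to run along dimension 1; the dimension argument depends on how the operands are arranged.

```matlab
% Assumption: B is n-by-1 and the target is B' * A (a 1-by-5 row vector).
% Sizes are reduced here so the check runs instantly.
n = 1000;
A = rand(n, 5);
B = rand(n, 1);

% Direct version: transpose B and use the built-in product.
direct = B' * A;

% Transpose-free version: scale each row of A by the matching entry of B,
% then sum down the columns (dimension 1).
viaBsx = sum(bsxfun(@times, A, B), 1);

% The two results should agree up to floating-point rounding.
maxDiff = max(abs(direct - viaBsx));
fprintf('max abs difference: %g\n', maxDiff);
```

Note that the bsxfun version materialises a full n-by-5 temporary, which is one reason it is unlikely to beat the direct product.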
Also look at http://www.mathworks.co.uk/company/newsletters/news_notes/june07/patterns.html to see why the column-vector-based version is faster than the row-vector-based version.
Matlab's raison d'etre is doing matrix computations. I would be fairly surprised if you could significantly outperform its built-in matrix multiplication with hand-crafted tools. First of all, you should make sure your multiplication can actually be performed significantly faster. You could do this by implementing a similar multiplication in C++ with Eigen.
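Before writing any C++, a baseline measurement inside Matlab gives a number to compare against. The sketch below assumes that B is a 655360-by-1 column vector (the question text is truncated) and that timeit is available in the installed release. It times both the row-vector and column-vector forms of the product and converts the faster time into an approximate memory throughput.

```matlab
% Assumption: B is 655360-by-1; the product is either B' * A or A' * B.
A = rand(655360, 5);
B = rand(655360, 1);

% timeit runs each handle several times and returns a typical time in seconds.
tRow = timeit(@() B' * A);   % row-vector form, result is 1-by-5
tCol = timeit(@() A' * B);   % column-vector form, result is 5-by-1

% Every element of A and B must be read at least once (8 bytes per double).
bytesRead = 8 * (numel(A) + numel(B));
gbPerSec  = bytesRead / min(tRow, tCol) / 1e9;

fprintf('B''*A: %.3g s, A''*B: %.3g s\n', tRow, tCol);
fprintf('approx. throughput: %.2f GB/s\n', gbPerSec);
```

If the throughput is already close to the machine's memory bandwidth, there is little room left for a hand-written implementation to improve on it.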