Question
I have a very large multiply-and-sum operation that I need to implement as efficiently as possible. The best method I've found so far is bsxfun in MATLAB, where I formulate the problem as:
L = 10000;
x = rand(4,1,L+1);
A_k = rand(4,4,L);
tic
for k = 2:L
    i = 2:k;
    x(:,1,k+1) = x(:,1,k+1) + sum(sum(bsxfun(@times,A_k(:,:,2:k),x(:,1,k+1-i)),2),3);
end
toc
Note that L will be larger in practice. Is there a faster method? It's strange that I need to first add the singleton dimension to x and then sum over it, but I can't get it to work otherwise.
The bsxfun approach is still much faster than any other method I've tried, but not fast enough for our application. I've heard rumors that the Python function numpy.einsum may be more efficient, but I wanted to ask here first before I consider porting my code. I'm using MATLAB R2017b.
Answer 1:
I believe both of your summations can be removed, but I only removed the easier one for the time being. The summation over the second dimension is trivial, since it only affects the A_k array:
B_k = sum(A_k,2);
for k = 2:L
    i = 2:k;
    x(:,1,k+1) = x(:,1,k+1) + sum(bsxfun(@times,B_k(:,1,2:k),x(:,1,k+1-i)),3);
end
With this single change the runtime is reduced from ~8 seconds to ~2.5 seconds on my laptop.
The second summation could also be removed by transforming times+sum into a matrix-vector product. It needs some singleton fiddling to get the dimensions right, but if you define an auxiliary array C_k that is B_k with the second dimension reversed, you can generate the remaining sum as ~x*C_k, give or take a few calls to reshape.
So after a closer look I realized that my original assessment was overly optimistic: you have multiplications in both dimensions in your remaining term, so it's not a simple matrix product. Anyway, we can rewrite that term as the diagonal of a matrix product. This implies that we're computing a bunch of unnecessary matrix elements, but it still seems to be slightly faster than the bsxfun approach, and we can get rid of your pesky singleton dimension too:
L = 10000;
x = rand(4,L+1);
A_k = rand(4,4,L);
B_k = squeeze(sum(A_k,2)).';
tic
for k = 2:L
    ii = 1:k-1;
    x(:,k+1) = x(:,k+1) + diag(x(:,ii)*B_k(k+1-ii,:));
end
toc
This runs in ~2.2 seconds on my laptop, somewhat faster than the ~2.5 seconds obtained previously.
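If you want to convince yourself that the reformulation matches your original bsxfun loop, here is a minimal sanity-check sketch on a small problem size (L_test, x_ref and x_new are names introduced only for this comparison):

L_test = 50;
x0  = rand(4,1,L_test+1);
A_k = rand(4,4,L_test);

% Original bsxfun formulation from the question.
x_ref = x0;
for k = 2:L_test
    i = 2:k;
    x_ref(:,1,k+1) = x_ref(:,1,k+1) + sum(sum(bsxfun(@times,A_k(:,:,2:k),x_ref(:,1,k+1-i)),2),3);
end

% Reformulated diag(matrix product) version from above.
B_k = squeeze(sum(A_k,2)).';
x_new = squeeze(x0);
for k = 2:L_test
    ii = 1:k-1;
    x_new(:,k+1) = x_new(:,k+1) + diag(x_new(:,ii)*B_k(k+1-ii,:));
end

% Difference should be at round-off level (~1e-15).
max(max(abs(squeeze(x_ref) - x_new)))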
Answer 2:
Since you're using a new version of MATLAB, you might try broadcasting / implicit expansion instead of bsxfun:
x(:,1,k+1) = x(:,1,k+1)+sum(sum(A_k(:,:,2:k).*x(:,1,k-1:-1:1),3),2);
I also changed the order of summation and removed the i variable for further improvement. On my machine, with MATLAB R2017b, this was about 25% faster for L = 10000.
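If you want to reproduce the comparison, a minimal timing sketch is below. It just runs the two inner expressions on identical data (x1 and x2 are names introduced only for this comparison); the speedup you see will depend on your machine and MATLAB version:

L = 10000;
x1 = rand(4,1,L+1);
x2 = x1;
A_k = rand(4,4,L);

tic   % bsxfun version from the question
for k = 2:L
    i = 2:k;
    x1(:,1,k+1) = x1(:,1,k+1) + sum(sum(bsxfun(@times,A_k(:,:,2:k),x1(:,1,k+1-i)),2),3);
end
toc

tic   % implicit-expansion version (requires R2016b or newer)
for k = 2:L
    x2(:,1,k+1) = x2(:,1,k+1) + sum(sum(A_k(:,:,2:k).*x2(:,1,k-1:-1:1),3),2);
end
toc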
Source: https://stackoverflow.com/questions/46836480/is-matlabs-bsxfun-the-best-pythons-numpy-einsum