Is there a way to calculate c += a*b in numpy without an extra temporary memory allocation or copy? The way I understand it, to compute the expression above, numpy will us
c+= a*b
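To make the concern concrete, here is a minimal sketch of one way to avoid a fresh allocation on every update. It assumes a, b, and c are arrays of the same shape and dtype. The helper name `multiply_accumulate` and the `scratch` buffer are hypothetical, introduced only for illustration. It relies on the `out=` argument of the numpy ufuncs `np.multiply` and `np.add`. Note that this does not remove the intermediate buffer entirely; it only lets the caller allocate it once and reuse it.

```python
import numpy as np

def multiply_accumulate(c, a, b, scratch):
    """Compute c += a*b using a caller-supplied scratch buffer (hypothetical helper)."""
    # Write the elementwise product into the preallocated scratch array
    # instead of letting numpy create a new temporary for a*b.
    np.multiply(a, b, out=scratch)
    # Accumulate into c in place.
    np.add(c, scratch, out=c)
    return c

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
c = np.array([10.0, 20.0, 30.0])

# Allocated once; can be reused across many updates of the same shape.
scratch = np.empty_like(c)

result = multiply_accumulate(c, a, b, scratch)
```

If the update runs inside a loop, the same scratch array can be passed on every iteration, so the per-iteration cost is two ufunc calls and no new array.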