MMX Register Speed vs Stack for Unsigned Integer Storage
问题 I am contemplating an implementation of SHA3 in pure assembly. SHA3 has an internal state of 17 64 bit unsigned integers, but because of the transformations it uses, the best case could be achieved if I had 44 such integers available in the registers. Plus one scratch register possibly. In such a case, I would be able to do the entire transform in the registers. But this is unrealistic, and optimisation is possible all the way down to even just a few registers. Still, more is potentially