How could assembly code be optimised to decrease the cache miss rate? I am aware that changing the placement policy, block size, or block replacement policy affects the miss rate, but I am asking about changes to the code itself.
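To make the hardware-side effect I mentioned concrete, here is a minimal sketch of a direct-mapped cache simulator that shows how block size alone changes the miss rate for the same access sequence. Everything in it is an assumption for illustration: the function name `miss_rate`, the 256-byte capacity, and the sequential 4-byte access pattern are hypothetical and do not come from any particular processor.

```python
def miss_rate(addresses, capacity_bytes, block_size):
    """Simulate a direct-mapped cache; return the fraction of accesses that miss.

    Hypothetical model: no prefetching, a single cache level, and the
    cache starts empty (so the first touch of every block is a miss).
    """
    num_lines = capacity_bytes // block_size
    lines = [None] * num_lines            # block number currently held by each line
    misses = 0
    for addr in addresses:
        block = addr // block_size        # which memory block the address falls in
        index = block % num_lines         # direct-mapped placement: one possible line
        if lines[index] != block:         # line is empty or holds a different block
            misses += 1
            lines[index] = block          # fill the line, evicting the old block
    return misses / len(addresses)


# Assumed workload: sequential 4-byte loads over a 1 KiB array.
sequential = list(range(0, 1024, 4))

small_blocks = miss_rate(sequential, capacity_bytes=256, block_size=16)
large_blocks = miss_rate(sequential, capacity_bytes=256, block_size=64)

print(small_blocks)  # 0.25   (one miss per 4 loads)
print(large_blocks)  # 0.0625 (one miss per 16 loads)
```

With sequential access, larger blocks bring in more of the upcoming data per miss, so the miss rate drops. What I would like to know is what the equivalent levers are on the software side, when the cache parameters are fixed.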