I have a hardware-accelerated function that requires two instances of fixed-point multiplication. My current solution (example below) requires two clock cycles per multiplication.