I am wondering whether there are any useful tools out there that would let me exploit instruction-level parallelism in some algorithms. More specifically, I have a subset of
The previous answers are good. In addition, there is much to learn at Intel's site, and if you have a budget then Intel's tools would be worth looking at.
Intel's articles on Optimization