I'm looking to compute trig functions in a highly parallel way (in blocks of about 1024 values), and I'd like to take advantage of at least some of the parallelism that modern architectur