The primary idea behind HT/SMT was that when one thread stalls, another thread on the same core can co-opt the rest of that core's idle time and run with it, transparently.
As far as I know, and from my experience as a developer in the field of heavy throughput calculations, SMT/HT has exactly one useful application; in all others, at best, it doesn't make things worse:
In virtualization, SMT/HT helps reduce the cost of (thread) context switching and thus greatly reduces latency when multiple VMs share the same cores.
But regarding throughput, I have never encountered anything in practice where SMT/HT didn't make things slower. Theoretically, it could be neither slower nor faster if the OS scheduled the processes optimally, but in practice the scheduler ends up placing two demanding processes on the same physical core (as SMT siblings), which reduces throughput.
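For anyone who cannot disable SMT in the BIOS, the scheduling problem described above can be worked around by pinning the job to one logical CPU per physical core. Below is a rough sketch of that idea, not something from our production setup. It assumes Linux, where each logical CPU exposes its SMT siblings in `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`. The function names (`parse_cpu_list`, `one_cpu_per_core`, `read_sibling_lists`, `pin_to_physical_cores`) are made up for illustration.

```python
import glob
import os

# Assumed Linux sysfs location of the per-CPU SMT sibling lists.
SIBLINGS_GLOB = "/sys/devices/system/cpu/cpu[0-9]*/topology/thread_siblings_list"


def parse_cpu_list(text):
    """Parse a Linux CPU list such as '0,4' or '0-3,8' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(sibling_lists):
    """Keep the lowest-numbered logical CPU from each distinct sibling group."""
    groups = {tuple(parse_cpu_list(s)) for s in sibling_lists if s.strip()}
    return sorted(group[0] for group in groups if group)


def read_sibling_lists(pattern=SIBLINGS_GLOB):
    """Read the raw sibling-list strings from sysfs (returns [] if none are found)."""
    lists = []
    for path in glob.glob(pattern):
        try:
            with open(path) as f:
                lists.append(f.read())
        except OSError:
            pass  # e.g. offline CPUs without topology files
    return lists


def pin_to_physical_cores():
    """Restrict the current process to one logical CPU per physical core (Linux only)."""
    allowed = os.sched_getaffinity(0)
    cpus = [c for c in one_cpu_per_core(read_sibling_lists()) if c in allowed]
    if cpus:  # leave the affinity untouched if nothing usable was found
        os.sched_setaffinity(0, cpus)
    return cpus
```

Calling `pin_to_physical_cores()` at the start of a worker process (child processes inherit the affinity mask) keeps two heavy workers from landing on the same physical core. Whether that recovers the full slowdown compared to disabling SMT outright is something you would have to measure on your own workload.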
So on all machines that are used for high-performance calculations, we disable HT/SMT. In all our tests, it slowed calculations down by around 10-20%.
If somebody has a real-world example (throughput, not latency) where SMT/HT actually didn't slow things down, I would be very curious to hear about it.