I'm benchmarking software which executes 4x faster on an Intel 2670QM than my serial version, using all 8 of my 'logical' threads. I would like some community feedback on my
HT is called SMT (Simultaneous MultiThreading) or HTT (Hyper-Threading Technology) in most BIOSes. The efficiency of HT depends on the so-called compute-to-fetch ratio, that is, how many in-core (register or cache) operations your code performs before it fetches from or stores to slow main memory or I/O memory. For highly cache-efficient, CPU-bound code, HT gives almost no noticeable performance increase, because a single thread already keeps the core's execution units busy. For more memory-bound code, HT can really benefit execution through so-called "latency hiding": while one hardware thread is stalled waiting for data from memory, the other thread can use the core's execution units. That's why many non-x86 server CPUs provide 4 (e.g. IBM POWER7) to 8 (e.g. UltraSPARC T4) hardware threads per core. These CPUs are typically used in database and transaction processing systems, where many concurrent memory-bound requests are serviced at once.
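To make the compute-to-fetch argument concrete, here is a deliberately idealized toy model (a sketch, not how any real core is specified). It assumes each thread alternates between `compute_cycles` of useful work and `stall_cycles` waiting on memory, and that the stalls of one hardware thread can be perfectly overlapped with the compute of the others. The function name and parameters are hypothetical. Real cores share execution ports, caches and memory bandwidth between hardware threads, so measured gains will be below this upper bound.

```python
def smt_throughput_gain(compute_cycles: float, stall_cycles: float, threads_per_core: int) -> float:
    """Upper-bound gain in core utilization from running `threads_per_core`
    hardware threads instead of one (hypothetical toy model of latency hiding)."""
    if compute_cycles <= 0 or threads_per_core < 1:
        raise ValueError("compute_cycles must be positive and threads_per_core >= 1")
    # Fraction of time a single thread keeps the execution units busy.
    single = compute_cycles / (compute_cycles + stall_cycles)
    # With k threads, one thread's stalls can overlap the others' compute,
    # but utilization can never exceed 100% of the core.
    multi = min(1.0, threads_per_core * single)
    return multi / single


cases = [
    ("cache-resident, CPU-bound", 100, 0),
    ("balanced", 50, 50),
    ("memory-bound", 10, 190),
]
for label, c, m in cases:
    print(f"{label:28s} 2-way: {smt_throughput_gain(c, m, 2):.2f}x  "
          f"8-way: {smt_throughput_gain(c, m, 8):.2f}x")
```

In this model a CPU-bound loop gains nothing from extra hardware threads, while a loop that spends most of its time stalled can approach one thread's worth of extra throughput per hardware thread, which is the reasoning behind the 4- and 8-way designs mentioned above.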
By the way, Amdahl's law states that the upper limit of the parallel speedup is one over the serial fraction of the code: with serial fraction s and N processing elements the speedup is S(N) = 1 / (s + (1 - s)/N), which approaches 1/s as N grows. Usually the effective serial fraction increases with the number of processing elements if there is communication or other synchronisation between the threads (possibly hidden in the runtime), although sometimes cache effects can lead to superlinear speedup, and sometimes cache thrashing can reduce performance drastically.
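Amdahl's law can also be run backwards to estimate the effective serial fraction from a measured speedup; this inversion is known as the Karp-Flatt metric, e = (1/S - 1/N) / (1 - 1/N). The sketch below assumes a measured speedup of 4x and treats the thread count as N. Which N to use is a judgment call: the i7-2670QM has 4 physical cores with 2 hardware threads each, so the same 4x reads very differently if you count 8 logical threads or 4 cores. The function names are illustrative only.

```python
def amdahl_speedup(serial_fraction: float, n: int) -> float:
    """Amdahl's law: S(n) = 1 / (s + (1 - s) / n)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


def karp_flatt(measured_speedup: float, n: int) -> float:
    """Experimentally determined serial fraction: e = (1/S - 1/n) / (1 - 1/n)."""
    if n < 2:
        raise ValueError("need at least 2 processing elements")
    return (1.0 / measured_speedup - 1.0 / n) / (1.0 - 1.0 / n)


measured = 4.0  # assumed: the 4x speedup reported in the question
s8 = karp_flatt(measured, 8)  # counting all 8 logical threads
s4 = karp_flatt(measured, 4)  # counting only the 4 physical cores
print(f"N=8: serial fraction {s8:.3f}, Amdahl ceiling {1.0 / s8:.1f}x")
print(f"N=4: serial fraction {s4:.3f}")
print(f"check: Amdahl with s={s8:.3f}, N=8 gives {amdahl_speedup(s8, 8):.2f}x")
```

Counting 8 logical threads, 4x corresponds to s = 1/7 and a ceiling of 7x; counting 4 physical cores, it corresponds to s = 0, i.e. linear scaling across cores with no additional benefit from HT. Tracking e as N grows is a simple way to see the rising serial fraction (synchronisation, communication) described above.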