I have seen some of the newer CPUs, such as the Intel Xeon "Nehalem-EX", described in the literature as having 8 cores and 16 threads. What are they talking about here?
It isn't hyper-threading renamed - it is hyper-threading (it says so on the webpage you linked to).
Simply put, the processor tells the OS that it has 16 cores, so the OS can balance tasks across twice the number of physical cores. Hyper-threading gives some benefit because in some cases two different instructions from two different programs/threads can be executed on one core simultaneously. But it certainly will not give a 200% speed-up. I haven't worked with such a processor, but I think you can get about 10%-20% additional CPU time.
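To see what the OS is being told, you can ask it for the CPU count. This is only a sketch: `os.cpu_count()` reports logical CPUs, and the factor of 2 hardware threads per core is an assumption taken from the Nehalem-EX numbers in the question, not something Python can detect by itself.

```python
import os

# Number of logical CPUs the OS reports. With Hyper-threading enabled this
# counts hardware threads, not physical cores. os.cpu_count() may return None.
logical = os.cpu_count() or 1

# Assumption for illustration only: 2 hardware threads per physical core,
# as on the Nehalem-EX (8 cores, 16 threads).
THREADS_PER_CORE = 2
estimated_physical = max(1, logical // THREADS_PER_CORE)

print(f"logical CPUs reported by the OS: {logical}")
print(f"physical cores if every core is 2-way SMT: {estimated_physical}")
```

On a Nehalem-EX with Hyper-threading turned on, the first number would be 16 per socket.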
An extreme example of a multi-threaded processor is the barrel processor. This is a form of hardware multithreading (usually called fine-grained or interleaved multithreading rather than SMT proper) where the processor divides issue slots equally between the threads in a round-robin manner. To do this, it only needs a copy of the registers for each thread, while all threads share the same set of execution units. So, over 4 clock cycles it would put instructions from Threads 0-3 into the pipeline, one per cycle.
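The round-robin slot assignment can be sketched in a few lines. This is a toy model, not a description of any real barrel processor; the function name `barrel_schedule` is made up for the illustration.

```python
def barrel_schedule(num_threads, num_cycles):
    """Return the id of the thread that issues an instruction on each cycle."""
    # Strict round robin: cycle c always belongs to thread c mod num_threads,
    # whether or not that thread has useful work to do.
    return [cycle % num_threads for cycle in range(num_cycles)]


slots = barrel_schedule(num_threads=4, num_cycles=8)
print(slots)  # [0, 1, 2, 3, 0, 1, 2, 3]
```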
You can think of the rest of these processors as working in a similar fashion, to a greater or lesser degree. Instead of distributing slots equally, they may just fill slots that would otherwise be empty because of control or data hazards in the pipeline.
For example, when a branch is taken, instructions in the pipeline may need to be flushed. Instead of completely flushing everything, some of the slots can be used for other threads. The whole idea is to improve performance by not wasting CPU cycles.
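Here is a toy model of that slot-filling idea, assuming a single-issue core and a second thread that always has instructions ready. The names `run` and `utilization` and the stall pattern are invented for the example; real pipelines are far more complicated.

```python
def run(primary, secondary_ready):
    """Simulate a single-issue core, one list entry per clock cycle.

    primary: list of bools, True if the primary thread can issue that cycle,
             False if it is stalled (for example after a flushed branch).
    secondary_ready: how many instructions a second hardware thread has
             available to fill free slots.
    Returns 'A', 'B' or '-' (wasted slot) for each cycle.
    """
    timeline = []
    for can_issue in primary:
        if can_issue:
            timeline.append("A")
        elif secondary_ready > 0:
            timeline.append("B")  # the stalled slot goes to the other thread
            secondary_ready -= 1
        else:
            timeline.append("-")  # nobody can use the slot
    return timeline


def utilization(timeline):
    """Fraction of cycles in which some instruction was issued."""
    return sum(slot != "-" for slot in timeline) / len(timeline)


trace = [True, True, False, False, True, False, True, True]
alone = run(trace, secondary_ready=0)
shared = run(trace, secondary_ready=8)

print("".join(alone), utilization(alone))    # AA--A-AA 0.625
print("".join(shared), utilization(shared))  # AABBABAA 1.0
```

Thread A finishes no sooner in the second run; the gain is that thread B gets work done in cycles that were wasted before.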
That's how multiple threads work in hardware.
Yes, Nehalem-based processors implement Hyper-threading.
The new Nehalem-EX which you refer to has 8 physical cores where each core can be seen as 2 logical cores for a total of 16 logical cores, allowing for the execution of 16 application threads on a single processor.
This is the same technology used in the Hyper-threading-enabled Pentium 4 processors, and more recently, on the Atom processors. My Eee PC has a single-core Atom processor which has two logical cores -- the Windows Task Manager will show two CPU graphs; one for each logical core.
Sun's UltraSPARC T2 (and the T1) also support hardware multithreading of this kind. Intel's implementation of simultaneous multithreading is called Hyper-Threading (a trademark of Intel). In both cases a single core appears as multiple logical cores, so it can execute multiple threads.
The rough idea behind simultaneous multithreading is to have multiple sets of registers to store the processor state. Because each core has multiple full sets of hardware registers, it appears to the software as if there were multiple cores inside a single core.
While the number of physical facilities such as the ALU and FPU may not increase, having more sets of registers to run more threads on a physical core can lead to better utilization of the available processor resources. A core may not be saturated when executing a single thread, but executing several threads at once can keep more of its units busy.
So what does it mean for programmers?
It means that we will still need to write multi-threaded software -- a program that has only a single thread can utilize only a single logical core. Only with well-written multi-threaded code can we take advantage of the massive number of logical cores these processors offer.
Even with simultaneous multithreading, the code is executed at one thread per logical core.
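As a rough sketch of what "one unit of work per logical core" can look like, the snippet below splits a sum into as many chunks as the OS reports logical CPUs and hands them to a thread pool. The helpers `split` and `sum_of_squares` are made up for the example. One caveat I should be loud about: in standard CPython the global interpreter lock keeps pure-Python CPU-bound threads from really running in parallel, so treat this as an illustration of the structure, not a benchmark.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(bounds):
    """Sum n*n for n in the half-open range [lo, hi)."""
    lo, hi = bounds
    return sum(n * n for n in range(lo, hi))


def split(total, parts):
    """Cut range(total) into at most `parts` contiguous (lo, hi) chunks."""
    step = -(-total // parts)  # ceiling division
    return [(lo, min(lo + step, total)) for lo in range(0, total, step)]


workers = os.cpu_count() or 1       # one worker per logical core
chunks = split(100_000, workers)

with ThreadPoolExecutor(max_workers=workers) as pool:
    result = sum(pool.map(sum_of_squares, chunks))

print(workers, "workers, result =", result)
```

In a language without such a lock (C, Java, Go and so on) the same structure lets each logical core work on its own chunk.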
It is up to each operating system's threading model to map OS-level threads to hardware-level threads such as those described in the question.
The logical threads spawned by the high-level programming languages that application programmers use are still one OS level removed from the hardware, unless of course you're talking about the OS code that does the mapping.
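You can peek at one side of that mapping from user code by asking which logical CPUs the OS scheduler is allowed to place the current process on. This is a sketch: `os.sched_getaffinity` exists only on some platforms (Linux, for instance), so the fallback below simply assumes every logical CPU is allowed.

```python
import os

if hasattr(os, "sched_getaffinity"):
    # Set of logical CPU ids the scheduler may run this process on (pid 0 = self).
    allowed = os.sched_getaffinity(0)
else:
    # Assumption for platforms without the call: no restriction is in place.
    allowed = set(range(os.cpu_count() or 1))

print("OS may schedule this process on logical CPUs:", sorted(allowed))
```

Which of those logical CPUs share a physical core is something the OS knows and the program normally does not need to.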
Hyper-threading (Intel's trademark, by the way) allows each thread to actually run simultaneously. So in this case you could run 8 x 2 = 16 application threads at the same time.
From the brochure ...
Intel Nehalem Architecture built on Intel's unique 45nm high-k metal gate technology process
Up to **8 cores** per processor
Up to **16 threads per processor** with Intel® Hyper-threading
2.3 billion transistors
Compare this to single-CPU, single-core systems, where each thread must be scheduled in turn and at most one thread is active at any moment - typically one thread running a CPU-bound task while the others wait on an I/O transfer.
Originally threading was used either to model a set of concurrent activities (to model them, not actually run them in parallel) or to produce the appearance of a system that stayed responsive even while doing I/O. For example, without threading your word processor would appear to stall while saving a document.
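The word-processor case can be sketched like this. The names are hypothetical, `time.sleep` stands in for the slow disk write, and the loop counter stands in for the UI handling keystrokes.

```python
import threading
import time


def save_document(text, saved):
    """Pretend to write the document to disk."""
    time.sleep(0.2)     # stand-in for a slow I/O transfer
    saved.append(text)  # record that the save completed


saved = []
worker = threading.Thread(target=save_document, args=("draft", saved))
worker.start()

keystrokes = 0
while worker.is_alive():
    keystrokes += 1     # the "UI" keeps responding while the save runs
    time.sleep(0.01)

worker.join()
print(f"handled {keystrokes} keystrokes during the save; saved = {saved}")
```

This works even on a single core, because the saving thread spends its time waiting rather than computing.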
For many years I resisted the idea of having multiple threads in my desktop applications - it complicated the code and potentially reduced performance - think of all those mutex operations which require the OS kernel to get involved. With the advent of truly parallel execution of threads, my objections are reduced, but I still believe that multiple processes, rather than multiple threads in a single process, are a better approach.
Chris