Multithreading: What is the point of more threads than cores?

爱一瞬间的悲伤 2020-11-29 16:24

I thought the point of a multi-core computer is that it can run multiple threads simultaneously. In that case, if you have a quad-core machine, what's the point of having more threads than cores?

17 Answers
  • 2020-11-29 17:10

    The ideal usage of threads is, indeed, one per core.

    However, unless you exclusively use asynchronous/non-blocking IO, there's a good chance that you will have threads blocked on IO at some point, which will not use your CPU.

    Also, typical programming languages make it somewhat difficult to use 1 thread per CPU. Languages designed around concurrency (such as Erlang) can make it easier to not use extra threads.
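
    A minimal sketch of that effect in Java, using Thread.sleep as a stand-in for any blocking IO call: on, say, a 4-core machine, twice as many threads still finish in about one second, because each thread spends nearly all of its time blocked rather than computing. The class name is made up for illustration.

        import java.util.ArrayList;
        import java.util.List;

        // More threads than cores still finish together when the work is
        // IO-bound: every thread is blocked, so no thread needs a core.
        public class BlockedOnIo {
            public static void main(String[] args) throws InterruptedException {
                int cores = Runtime.getRuntime().availableProcessors();
                System.out.println("Cores: " + cores);

                long start = System.nanoTime();
                List<Thread> threads = new ArrayList<>();
                for (int i = 0; i < cores * 2; i++) {       // more threads than cores
                    Thread t = new Thread(() -> {
                        try {
                            Thread.sleep(1000);             // "blocked on IO"
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                    t.start();
                    threads.add(t);
                }
                for (Thread t : threads) t.join();
                System.out.printf("Elapsed: %.1f s%n", (System.nanoTime() - start) / 1e9);
            }
        }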

  • 2020-11-29 17:12

    The point is that, despite not getting any real speedup when thread count exceeds core count, you can use threads to disentangle pieces of logic that should not have to be interdependent.

    In even a moderately complex application, using a single thread to try to do everything quickly makes a hash of the 'flow' of your code. The single thread spends most of its time polling this, checking on that, conditionally calling routines as needed, and it becomes hard to see anything but a morass of minutiae.

    Contrast this with the case where you can dedicate threads to tasks so that, looking at any individual thread, you can see what that thread is doing. For instance, one thread might block waiting on input from a socket, parse the stream into messages, filter messages, and when a valid message comes along, pass it off to some other worker thread. The worker thread can work on inputs from a number of other sources. The code for each of these will exhibit a clean, purposeful flow, without having to make explicit checks that there isn't something else to do.

    Partitioning the work this way allows your application to rely on the operating system to schedule what to do next with the cpu, so you don't have to make explicit conditional checks everywhere in your application about what might block and what's ready to process.
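
    Here is a rough Java sketch of that structure, assuming a BlockingQueue as the hand-off point and a Scanner on standard input as a stand-in for the socket; the class and message names are made up for illustration. Each thread simply blocks on its own input source, so neither one ever has to poll.

        import java.util.concurrent.BlockingQueue;
        import java.util.concurrent.LinkedBlockingQueue;

        // One thread owns the input stream and blocks on it; the worker
        // blocks on the queue. Typing "quit" (or EOF) ends both threads.
        public class PipelineSketch {
            public static void main(String[] args) throws InterruptedException {
                BlockingQueue<String> messages = new LinkedBlockingQueue<>();

                Thread reader = new Thread(() -> {
                    java.util.Scanner in = new java.util.Scanner(System.in);
                    while (in.hasNextLine()) {                   // blocks waiting for input
                        String line = in.nextLine().trim();
                        if (line.equals("quit")) break;
                        if (!line.isEmpty()) messages.add(line); // filter, then hand off
                    }
                    messages.add("quit");                        // poison pill for the worker
                });

                Thread worker = new Thread(() -> {
                    try {
                        while (true) {
                            String msg = messages.take();        // blocks until work arrives
                            if (msg.equals("quit")) return;
                            System.out.println("processing: " + msg);
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });

                reader.start();
                worker.start();
                reader.join();
                worker.join();
            }
        }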

  • 2020-11-29 17:14

    A processor, or CPU, is the physical chip that is plugged into the system. A processor can have multiple cores (a core is the part of the chip that is capable of executing instructions). A core can appear to the operating system as multiple virtual processors if it is capable of simultaneously executing multiple threads (a thread is a single sequence of instructions).

    A process is a running instance of an application. Generally, processes are independent of each other. If one process dies, it does not cause another process to also die. It is possible for processes to communicate, or share resources such as memory or I/O.

    Each process has a separate address space and stack. A process can contain multiple threads, each able to execute instructions simultaneously. All the threads in a process share the same address space, but each thread will have its own stack.
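
    A small Java illustration of that last point, with hypothetical names: both threads update the same shared field (one address space), while each loop counter lives on its own thread's stack.

        import java.util.concurrent.atomic.AtomicInteger;

        // 'shared' lives in the common heap; each thread's 'i' lives on
        // that thread's own stack.
        public class SharedHeapOwnStack {
            static AtomicInteger shared = new AtomicInteger();

            public static void main(String[] args) throws InterruptedException {
                Runnable task = () -> {
                    for (int i = 0; i < 1000; i++) {   // per-thread stack variable
                        shared.incrementAndGet();      // shared across both threads
                    }
                };
                Thread a = new Thread(task);
                Thread b = new Thread(task);
                a.start(); b.start();
                a.join();  b.join();
                System.out.println(shared.get());      // 2000: both threads saw it
            }
        }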

    Hopefully these definitions, along with further research into these fundamentals, will help your understanding.

  • 2020-11-29 17:16

    In response to your first conjecture: multi-core machines can simultaneously run multiple processes, not just the multiple threads of a single process.

    In response to your first question: the point of multiple threads is usually to simultaneously perform multiple tasks within one application. The classic examples on the net are an email program sending and receiving mail, and a web server receiving and sending page requests. (Note that it's essentially impossible to reduce a system like Windows to running only one thread or even only one process. Run the Windows Task Manager and you'll typically see a long list of active processes, many of which will be running multiple threads.)

    In response to your second question: most processes/threads are not CPU-bound (i.e., not running continuously and uninterrupted); instead they stop and wait frequently for I/O to finish. During that wait, other processes/threads can run without "stealing" from the waiting code (even on a single-core machine).
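
    A rough way to see this, sketched in Java with illustrative task bodies: a pool of 16 threads finishes 16 sleeping ("IO-bound") tasks in about one second regardless of core count, while 16 spinning ("CPU-bound") tasks are limited by how many cores can actually compute at once.

        import java.util.concurrent.ExecutorService;
        import java.util.concurrent.Executors;
        import java.util.concurrent.TimeUnit;

        // IO-bound tasks overlap freely because they spend their time waiting;
        // CPU-bound tasks compete for physical cores.
        public class BoundKinds {
            static void cpuBound() {                       // never waits; needs a core
                long x = 0;
                for (long i = 0; i < 2_000_000_000L; i++) x += i;
                if (x == 42) System.out.println(x);        // defeat dead-code elimination
            }

            static void ioBound() {                        // mostly waits; frees its core
                try { Thread.sleep(1000); } catch (InterruptedException ignored) {}
            }

            static double time(Runnable task) throws InterruptedException {
                ExecutorService pool = Executors.newFixedThreadPool(16);
                long start = System.nanoTime();
                for (int i = 0; i < 16; i++) pool.execute(task);
                pool.shutdown();
                pool.awaitTermination(10, TimeUnit.MINUTES);
                return (System.nanoTime() - start) / 1e9;
            }

            public static void main(String[] args) throws InterruptedException {
                System.out.printf("IO-bound:  %.1f s%n", time(BoundKinds::ioBound));
                System.out.printf("CPU-bound: %.1f s%n", time(BoundKinds::cpuBound));
            }
        }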

  • 2020-11-29 17:18

    If a thread is waiting for a resource (such as loading a value from RAM into a register, waiting for disk I/O or network access, launching a new process, querying a database, or waiting for user input), the processor can work on a different thread and return to the first thread once the resource is available. This keeps the CPU busy: it can perform millions of useful operations instead of sitting idle.

    Consider a thread that needs to read data off a hard drive. In 2014, a typical processor core operates at 2.5 GHz and may be able to execute 4 instructions per cycle. With a cycle time of 0.4 ns, the processor can execute 10 instructions per nanosecond. Typical mechanical hard drive seek times are around 10 milliseconds, so the processor can execute 100 million instructions in the time it takes to read one value from the hard drive. Hard drives with a small on-board cache (a 4 MB buffer) and hybrid drives with a few GB of storage can do significantly better, since data latency for sequential reads or reads from the hybrid section may be several orders of magnitude lower.
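
    That arithmetic as a back-of-the-envelope snippet (the numbers are the 2014-era figures quoted above):

        // Instructions a core could retire during one 10 ms disk seek,
        // using the figures quoted above.
        public class SeekCost {
            public static void main(String[] args) {
                double ghz = 2.5;                  // cycles per nanosecond
                double instrPerCycle = 4.0;
                double seekNs = 10e6;              // 10 ms seek, in nanoseconds
                double instrPerNs = ghz * instrPerCycle;   // 10 instructions/ns
                System.out.printf("~%.0f million instructions per seek%n",
                        instrPerNs * seekNs / 1e6);        // ~100 million
            }
        }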

    A processor core can switch between threads (the cost of pausing one thread and resuming another is around 100 clock cycles) while the first thread waits for a high-latency input, meaning anything more expensive than registers (1 clock cycle) or RAM (5 nanoseconds). Examples include disk I/O, network access (latency around 250 ms), reading data off a CD or a slow bus, or a database call. Having more threads than cores means useful work can be done while high-latency tasks are resolved.

    The operating system has a thread scheduler that assigns a priority to each thread and allows a thread to sleep, then resume after a predetermined time. It is the thread scheduler's job to reduce thrashing, which would occur if each thread executed just 100 instructions before being put back to sleep. The overhead of switching threads would reduce the total useful throughput of the processor core.

    For this reason, you may want to break up your problem into a reasonable number of threads. If you were writing code to perform matrix multiplication, creating one thread per cell in the output matrix might be excessive, whereas one thread per row, or per n rows, of the output matrix might reduce the overhead cost of creating, pausing, and resuming threads.
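
    A sketch of that partitioning in Java, with illustrative sizes and hypothetical names: each thread owns a band of rows of the output matrix, so the thread count stays close to the core count instead of exploding to one thread per cell.

        // Banded matrix multiply: one thread per band of output rows.
        public class BandedMatMul {
            public static void main(String[] args) throws InterruptedException {
                int n = 512;
                double[][] a = new double[n][n], b = new double[n][n], c = new double[n][n];
                // ... fill a and b ...

                int bands = Runtime.getRuntime().availableProcessors();
                int rowsPerBand = (n + bands - 1) / bands;
                Thread[] workers = new Thread[bands];
                for (int t = 0; t < bands; t++) {
                    final int lo = t * rowsPerBand;
                    final int hi = Math.min(n, lo + rowsPerBand);
                    workers[t] = new Thread(() -> {
                        for (int i = lo; i < hi; i++)      // this thread owns rows lo..hi-1
                            for (int k = 0; k < n; k++)
                                for (int j = 0; j < n; j++)
                                    c[i][j] += a[i][k] * b[k][j];
                    });
                    workers[t].start();
                }
                for (Thread w : workers) w.join();
            }
        }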

    This is also why branch prediction is important. If you have an if statement that requires loading a value from RAM, but the bodies of the if and else branches use values already loaded into registers, the processor may speculatively execute one or both branches before the condition has been evaluated. Once the condition is known, the processor keeps the result of the correct branch and discards the other. Performing potentially useless work here is probably better than switching to a different thread, which could lead to thrashing.

    As we have moved away from high clock-speed single-core processors to multi-core processors, chip design has focused on cramming more cores per die, improving on-chip resource sharing between cores, better branch prediction algorithms, lower thread-switching overhead, and better thread scheduling.
