How many threads does it take to make them a bad choice?

猫巷女王i 2021-02-07 21:46

I have to write a not-so-large program in C++, using boost::thread.

The problem at hand is to process a large (maybe thousands or tens of thousands; hundreds of thousands or even millions are not out of the question) number of files. They all reside in a single directory, each file can be processed independently of the others, and the results are inserted into a shared global structure.

15 answers
  • 2021-02-07 22:03

    You said the files are all in one directory. Does that mean they are all on one physical drive?

    If that is so, and assuming they are not already cached, then your job will be to keep the single read head busy, and no amount of threading will help it. In fact, if it has to hop between tracks due to parallelism, you could slow it down.

    On the other hand, if the computation part takes significant time, causing the read head to have to wait, then it might make sense to have >1 thread.

    Often, using threads for performance is missing the point unless it lets you get parallel pieces of hardware working at the same time.

    More often, the value of threads is in, for example, keeping track of multiple simultaneous conversations, like if you have multiple users, where each thread can wait for its own Johny or Suzy and not get confused.

  • 2021-02-07 22:04

    How expensive the simplest thread is depends on the OS (you may also need to tune some OS parameters to get past a certain number of threads). At minimum each has its own CPU state (registers/flags incl. floating point) and stack as well as any thread-specific heap storage.

    If each individual thread doesn't need too much distinct state, then you can probably get them pretty cheap by using a small stack size.
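
    For example, here is a minimal sketch of requesting a smaller per-thread stack via boost::thread::attributes (the 64 KiB figure is an assumption for illustration; the OS may round the request up to its own minimum):

        #include <boost/thread.hpp>

        void worker() {
            // per-file work that needs only a little distinct state
        }

        int main() {
            // Ask for a small stack instead of the multi-MiB platform default.
            boost::thread::attributes attrs;
            attrs.set_stack_size(64 * 1024);  // 64 KiB, assumed enough here

            boost::thread t(attrs, worker);
            t.join();
        }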

    In the limit, you may end up needing to use a non-OS cooperative threading mechanism, or even multiplex events yourself using tiny "execution context" objects.

    Just start with threads and worry about it later :)

  • 2021-02-07 22:05

    To elaborate, it really depends on:

    the I/O-boundedness of the problem
        how big the files are
        how contiguous the files are
        in what order they must be processed
        whether you can determine the disk placement
    how much concurrency you can get in the "global structure insert"
        whether you can "silo" the data structure with a consolidation wrapper
    the actual CPU cost of the "global structure insert"

    For example, if your files reside on a 3-terabyte flash memory array, the solution is different than if they reside on a single disk (where, if the "global structure insert" takes less time than the read, the problem is I/O-bound and you might just as well have a 2-stage pipe with 2 threads: the read stage feeding the insert stage).

    But in both cases the architecture would probably be a vertical pipeline of 2 stages: n reading threads and m writing threads, with n and m determined by the "natural concurrency" of each stage.
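
    Here is a minimal sketch of the simplest version of that pipe: one reading thread feeding one inserting thread through an in-memory queue. It is an illustration under assumptions, not the definitive design: a real pipe should bound the queue so the reader cannot outrun the inserter, and global_structure_insert is a hypothetical stand-in for the OP's insert.

        #include <boost/thread.hpp>
        #include <boost/thread/condition_variable.hpp>
        #include <fstream>
        #include <queue>
        #include <sstream>
        #include <string>
        #include <vector>

        std::queue<std::string> pipe_q;   // file contents waiting for the insert stage
        bool reading_done = false;
        boost::mutex m;
        boost::condition_variable cv;

        void read_stage(const std::vector<std::string>& paths) {
            for (std::size_t i = 0; i < paths.size(); ++i) {
                std::ifstream in(paths[i].c_str(), std::ios::binary);
                std::ostringstream buf;
                buf << in.rdbuf();        // one sequential read keeps the head busy
                boost::unique_lock<boost::mutex> lk(m);
                pipe_q.push(buf.str());
                cv.notify_one();
            }
            boost::unique_lock<boost::mutex> lk(m);
            reading_done = true;
            cv.notify_one();
        }

        void insert_stage() {
            for (;;) {
                std::string data;
                {
                    boost::unique_lock<boost::mutex> lk(m);
                    cv.wait(lk, []{ return !pipe_q.empty() || reading_done; });
                    if (pipe_q.empty()) return;   // drained and the reader is done
                    data = pipe_q.front();
                    pipe_q.pop();
                }
                // global_structure_insert(data);  // hypothetical: the expensive insert
            }
        }

        int main() {
            std::vector<std::string> paths;       // filled from the directory listing
            boost::thread reader(read_stage, paths);
            boost::thread inserter(insert_stage);
            reader.join();
            inserter.join();
        }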

    Creating a thread per file will probably lead to disk thrashing. Just as you tailor the number of threads in a CPU-bound process to the naturally achievable CPU concurrency (going above that creates context-switching overhead, a.k.a. thrashing), the same is true on the I/O side; in a sense you can think of disk thrashing as "context switching on the disk".

  • 2021-02-07 22:05

    Use a thread pool instead of creating a thread for each file. You can easily adjust the number of threads once you have written your solution. If the jobs are independent of each other, I'd say the number of threads should equal the number of cores/CPUs.
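
    A sketch of what that can look like with boost::asio::thread_pool (added in Boost 1.66; with only boost::thread available you would hand-roll the pool). process_file is a hypothetical stand-in for the per-file job:

        #include <boost/asio/post.hpp>
        #include <boost/asio/thread_pool.hpp>
        #include <boost/thread.hpp>
        #include <string>
        #include <vector>

        void process_file(const std::string& path) {
            // read, process, insert into the shared structure (locking as needed)
        }

        int main() {
            std::vector<std::string> files;   // filled from the directory listing

            // Size the pool to the hardware, not to the number of files.
            unsigned n = boost::thread::hardware_concurrency();
            if (n == 0) n = 2;                // hardware_concurrency() may report 0
            boost::asio::thread_pool pool(n);

            for (std::size_t i = 0; i < files.size(); ++i)
                boost::asio::post(pool, [&files, i] { process_file(files[i]); });

            pool.join();                      // blocks until all queued jobs finish
        }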

  • 2021-02-07 22:10

    I'm not too sure about HP-UX, but in the Windows world, we use thread pools to solve this sort of problem. Raymond Chen wrote about this a while back, in fact...

    The skinny of it is that I would generally not expect anything to scale well on a CPU-bound load if the number of threads is more than about 2x the number of CPU cores you have in the system. For I/O bound loads, you might be able to get away with more, depending on how fast your disk subsystem is, but once you reach about 100 or so, I would seriously consider changing the model...

  • 2021-02-07 22:14

    There are two problems here. The first is your question about the ideal number of threads to use for processing this large number of files; the second is how to achieve the best performance.

    Let's start with the second problem: to begin with, I would not parallelize per file, but rather parallelize the processing done on one file at a time. This helps significantly in multiple parts of your environment:

    - The hard drive does not have to seek from one file out to the n - 1 others.
    - The operating system file cache will stay warm with the data you need on all your threads, and you will not experience as much cache thrashing.

    I admit that the code to parallelize your application gets slightly more complex, but the benefits you'll obtain are significant.

    From this, the answer to your question is easy: you should use at most one thread per core present in your system. This allows you to be respectful of your caches and ultimately to achieve the best performance possible on your system.
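
    As a sketch of that approach, assuming the per-file work can be split by byte ranges (process_chunk is hypothetical and stands in for the real processing):

        #include <boost/thread.hpp>
        #include <algorithm>
        #include <string>

        void process_chunk(const char* data, std::size_t len) {
            // hypothetical: the real processing, applied to one slice of the file
        }

        void process_one_file(const std::string& contents) {
            unsigned cores = boost::thread::hardware_concurrency();
            if (cores == 0) cores = 2;
            const std::size_t chunk = (contents.size() + cores - 1) / cores;

            boost::thread_group threads;      // at most one thread per core
            for (unsigned i = 0; i < cores; ++i) {
                const std::size_t off = i * chunk;
                if (off >= contents.size()) break;
                const std::size_t len = std::min(chunk, contents.size() - off);
                threads.create_thread([&contents, off, len] {
                    process_chunk(contents.data() + off, len);
                });
            }
            threads.join_all();
        }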

    The ultimate point, of course, is that with this type of processing your application will be more respectful of your system, since accessing n files simultaneously may make your OS unresponsive.
