Will multi threading provide any performance boost?

说谎 2021-02-05 13:49

I am new to programming in general so please keep that in mind when you answer my question.

I have a program that takes a large 3D array (1 billion elements) and sums up elements along the various axes to produce a 2D array of a projection of each side of the data.

19 Answers
  • 2021-02-05 13:55

    I guess if you are just dealing with bits you might not have to page or use a swap file, and in that case, yes, multi-threading will help.

    If you can't load everything into memory at once, you need to be more specific about your solution--it needs to be tailored to threading.

    For example: Suppose you load your array in smaller blocks (the size may not matter much). If you were to load in a 1000x1000x1000 cube, you could sum on that. The results could be stored temporarily in their own three planes, then added to your 3 "final result" planes, and then the 1000^3 block could be thrown away, never to be read again.

    If you do something like this, you won't run out of memory, you won't stress the swapfile and you won't have to worry about any thread synchronization except in a few very small, specific areas (if at all).

    The only problem then is to ensure your data is in such a format that you can access a single 1000^3 cube directly--without seeking the hard disk head all over the place.

    Edit: The comment was correct and I'm wrong--his point makes total sense.

    Thinking about it since yesterday, I realized that the entire problem could be solved as the data is read in--each piece of data could immediately be summed into the results and discarded. When I think about it that way, you're right: threading isn't going to be much help unless it can read two streams at the same time without colliding.
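
    For what it's worth, here is a minimal sketch of that read-as-you-go idea. The flat binary file, the one-byte elements and all of the names (projectStreaming, the path argument, and so on) are my assumptions for illustration, not code from the question:

        #include <cstddef>
        #include <cstdint>
        #include <fstream>
        #include <vector>

        // Sketch: stream the data one slice at a time, add each slice straight into
        // a result plane, then reuse the buffer. Nothing stays in RAM beyond one
        // slice and the (comparatively tiny) result plane.
        std::vector<std::uint64_t> projectStreaming(const char* path, int dim)
        {
            std::vector<std::uint64_t> plane(std::size_t(dim) * dim, 0);   // j-k result plane
            std::vector<std::uint8_t>  slice(std::size_t(dim) * dim);      // one i-slice
            std::ifstream in(path, std::ios::binary);
            for (int i = 0; i < dim; ++i)
            {
                if (!in.read(reinterpret_cast<char*>(slice.data()),
                             static_cast<std::streamsize>(slice.size())))
                    break;                                                 // short file: stop early
                for (std::size_t n = 0; n < slice.size(); ++n)
                    plane[n] += slice[n];                                  // summing over i
            }
            return plane;
        }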

  • 2021-02-05 13:57

    If you're partitioning your data correctly then yes, you will see a boost in performance. If you check your CPU usage right now, one core will be at 100% and the other 3 should be close to 0%.

    It all depends on how well you structure your threads and memory usage.

    Also, do not expect a 4x improvement. 4x is the maximum achievable; the real gain will always be lower than that, depending on a lot of factors.
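
    As a rough sketch of what that kind of partitioning can look like: each thread gets its own contiguous range of slices and its own private result plane, and the planes are merged at the end. The names (partMap, sumSlices, projectKJ), the one-byte element type and the slice-per-thread split are assumptions for illustration, not the asker's actual code:

        #include <algorithm>
        #include <cstddef>
        #include <cstdint>
        #include <functional>
        #include <thread>
        #include <vector>

        // Each thread sums its own contiguous range of i-slices into a private
        // k-j plane; the planes are merged once the threads finish, so the
        // shared result never needs a lock.
        static void sumSlices(const std::uint8_t* partMap, int dim, int iBegin, int iEnd,
                              std::vector<std::uint64_t>& local)   // dim*dim, indexed [k*dim + j]
        {
            for (int i = iBegin; i < iEnd; ++i)
                for (int k = 0; k < dim; ++k)
                    for (int j = 0; j < dim; ++j)
                        local[std::size_t(k) * dim + j] +=
                            partMap[(std::size_t(i) * dim + k) * dim + j];
        }

        std::vector<std::uint64_t> projectKJ(const std::uint8_t* partMap, int dim)
        {
            unsigned n = std::max(1u, std::thread::hardware_concurrency());
            std::vector<std::vector<std::uint64_t>> partial(
                n, std::vector<std::uint64_t>(std::size_t(dim) * dim, 0));
            std::vector<std::thread> pool;
            for (unsigned t = 0; t < n; ++t)
                pool.emplace_back(sumSlices, partMap, dim,
                                  int(t * dim / n), int((t + 1) * dim / n),
                                  std::ref(partial[t]));
            for (auto& th : pool) th.join();

            std::vector<std::uint64_t> result(std::size_t(dim) * dim, 0);
            for (auto& p : partial)                                // merge the per-thread planes
                for (std::size_t m = 0; m < p.size(); ++m) result[m] += p[m];
            return result;
        }

    This is also where the "always lower than 4x" comes from: the merge step is serial, and all four cores end up competing for the same memory bandwidth.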

  • 2021-02-05 13:59

    Try this code:

        int dim = 1000;
        int steps = 7;      // ranges from 1 to 255

        for (int stage = 1; stage < steps; stage++)
            for (int k = 0; k < dim; k++)
                for (int i = 0; i < dim; i++)
                {
                    for (int j = 0; j < dim; j++)           // j is now the innermost loop
                        if (partMap[((i * dim) + k) * dim + j] >= stage)
                            projection[i * dim + j]++;      // changed order of i and j
                }

        transpose(projection);
    

    I changed the order of the loops to make the code cache friendly... You would gain an order-of-magnitude performance boost with it, for sure.

    This is the step you should take before you move on to multithreading.
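
    To see why the loop order matters, here is a small illustration of the strides involved, assuming the same flat indexing as above and one-byte elements (flatIndex is just a name for this illustration):

        #include <cstddef>

        // Offset of element (i, k, j) in the flattened array, as used above.
        std::size_t flatIndex(int i, int k, int j, int dim)
        {
            return (std::size_t(i) * dim + k) * dim + j;
        }

        // With j innermost, consecutive iterations touch consecutive bytes
        // (stride 1), so each cache line is fully used before it is evicted:
        //     flatIndex(i, k, j + 1, dim) - flatIndex(i, k, j, dim) == 1
        // With i innermost the stride would be dim * dim elements, roughly
        // 1 MB apart for dim = 1000, so almost every access misses the cache:
        //     flatIndex(i + 1, k, j, dim) - flatIndex(i, k, j, dim) == std::size_t(dim) * dim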

  • 2021-02-05 14:01

    If, and this is a big IF, it is coded appropriately, you will most definitely see a speed-up. Now, as one of my professors always noted, people often try to take an algorithm, thread it, and in the end it is slower. This is often because of inefficient synchronization. So basically, if you feel like delving into threading (I honestly wouldn't suggest it if you are new to programming), have a go.

    In your particular case the synchronization could be quite straightforward. This is to say, you could assign each thread to a quadrant of the large 3-d matrix, where each thread is guaranteed to have sole access to a specific area of the input and output matrices, so there is no real need to 'protect' the data from multiple accesses/writes (see the sketch below).

    In summary, in this specific simple case threading may be quite easy, but in general synchronization when done poorly can cause the program to take longer. It really all depends.
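
    As a small sketch of the quadrant idea (the names and the rows-per-thread split are illustrative assumptions, not code from the question): if each thread is handed its own range of i, it only ever writes its own rows of the output, so no mutex is needed:

        #include <cstddef>
        #include <cstdint>

        // Sketch: the i-j projection with the i-range split across threads.
        // A thread given [begin, end) only writes rows begin..end-1 of
        // 'projection', so the output regions are disjoint and no locking is
        // required; the threads just have to be joined at the end.
        void projectRows(const std::uint8_t* partMap, std::uint64_t* projection,
                         int dim, int begin, int end)
        {
            for (int i = begin; i < end; ++i)
                for (int k = 0; k < dim; ++k)
                    for (int j = 0; j < dim; ++j)
                        projection[std::size_t(i) * dim + j] +=
                            partMap[(std::size_t(i) * dim + k) * dim + j];
        }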

  • 2021-02-05 14:05

    Though this would probably be very challenging for you if you're new to programming, a very powerful way to speed things up would be to use the power of the GPU. Not only is VRAM much faster than ordinary RAM, the GPU can also run your code in parallel on some 128 or more cores. Of course, for this amount of data you will need a pretty large amount of VRAM.

    If you decide to check this possibility out, you should look up nVidia CUDA. I haven't checked it out myself, but it's meant for problems like this.

  • 2021-02-05 14:07

    My gut says you'll see modest improvements. However, predicting the results of optimizations is a notoriously error-prone affair.

    Try it and benchmark the results.
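
    For example, a bare-bones way to time a run with std::chrono; the projection functions named in the usage comment are placeholders, not real functions from the question:

        #include <chrono>
        #include <cstdio>

        // Sketch: time one full run of whichever variant is being tested.
        template <typename F>
        double secondsTaken(F run)
        {
            auto start = std::chrono::steady_clock::now();
            run();
            std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
            return elapsed.count();
        }

        // Usage (placeholder function names):
        //     double t1 = secondsTaken([&] { projectSingleThreaded(data, dim); });
        //     double tN = secondsTaken([&] { projectMultiThreaded(data, dim); });
        //     std::printf("speedup: %.2fx\n", t1 / tN);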
