Multithreaded image processing in C++

I am working on a program which manipulates images of different sizes. Many of these manipulations read pixel data from an input and write to a separate output (e.g. blur).

16 answers

抹茶落季

2020-12-29 12:53

I don't think you want one thread per row. An image can have a lot of rows, and you would spend a lot of memory and CPU time launching and destroying threads and context-switching between them. Moreover, if you have P processors with C cores each, you probably won't gain much from more than C*P threads.

I would advise you to use a fixed number of worker threads, say N, and either have the main thread of your application distribute rows to them or let the workers pull their work from a "job queue". When a thread has finished a row, it checks the queue for another row to process.

As for libraries, you can use boost::thread, which is quite portable and not too heavyweight.
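The fixed-pool idea can be sketched roughly as follows. This is only an illustration under a few assumptions: it uses C++11 `std::thread` rather than boost::thread (the two interfaces are very similar), the "job queue" is reduced to an atomic row counter, and `process_row` and `process_image` are hypothetical names with a trivial invert operation standing in for the real per-row work.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical per-row operation: inverts one row of an 8-bit grayscale image.
void process_row(const std::uint8_t* in, std::uint8_t* out, int width) {
    for (int x = 0; x < width; ++x)
        out[x] = static_cast<std::uint8_t>(255 - in[x]);
}

// Processes every row with a fixed number of worker threads.
// Each worker repeatedly claims the next unprocessed row until none are left.
void process_image(const std::vector<std::uint8_t>& in,
                   std::vector<std::uint8_t>& out,
                   int width, int height, unsigned num_threads) {
    assert(in.size() == out.size());
    assert(in.size() == static_cast<std::size_t>(width) * height);

    std::atomic<int> next_row(0);  // simplest possible "job queue": next row index

    auto worker = [&]() {
        for (;;) {
            const int y = next_row.fetch_add(1);  // claim one row
            if (y >= height) break;               // no work left
            const std::size_t offset = static_cast<std::size_t>(y) * width;
            process_row(&in[offset], &out[offset], width);
        }
    };

    // hardware_concurrency() may return 0, so callers can pass 0 safely.
    num_threads = std::max(1u, num_threads);

    std::vector<std::thread> pool;
    for (unsigned i = 0; i < num_threads; ++i) pool.emplace_back(worker);
    for (auto& t : pool) t.join();
}
```

A caller might pass `std::thread::hardware_concurrency()` as `num_threads`. Because input and output are separate buffers and each row is claimed by exactly one worker, no locking is needed beyond the atomic counter.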

忘掉有多难

2020-12-29 12:55
If your compiler doesn't support OpenMP, another option is a library approach: both Intel's Threading Building Blocks and the Microsoft Concurrency Runtime are available (VS 2010).

There is also a set of interfaces called the Parallel Patterns Library, which is supported by both libraries and includes a templated parallel_for library call. So instead of:
```
#pragma omp parallel for 
for (i=0; i < numPixels; i++) 
{ ...} 
```
you would write:
```
parallel_for(0,numPixels,1,ToGrayScale());
```
where ToGrayScale is a functor or a pointer to a function. (Note: if your compiler supports lambda expressions, which it likely doesn't, you can inline the functor as a lambda expression.)
```
parallel_for(0, numPixels, 1, [&](int i)
{
    // Weighted sum of the colour channels; BYTE is already an unsigned type.
    pGrayScaleBitmap[i] = (BYTE)
        (pRGBBitmap[i].red * 0.299 +
         pRGBBitmap[i].green * 0.587 +
         pRGBBitmap[i].blue * 0.114);
});
```
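For compilers without lambdas, the functor form might look roughly like the sketch below. Several things here are assumptions made for illustration: the `RGBPixel` layout, the use of `std::uint8_t` in place of `BYTE`, a functor that carries its buffer pointers (the one-line call above constructs `ToGrayScale()` with no arguments), and `for_each_index`, a hypothetical stand-in that only mimics the (first, last, step, function) argument shape so the example builds without PPL or TBB. It uses OpenMP when available and runs serially otherwise.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pixel layout; field names follow the lambda example above.
struct RGBPixel { std::uint8_t red, green, blue; };

// Functor that converts pixel i of the input buffer to grayscale.
class ToGrayScale {
public:
    ToGrayScale(const RGBPixel* rgb, std::uint8_t* gray) : rgb_(rgb), gray_(gray) {}

    void operator()(int i) const {
        gray_[i] = static_cast<std::uint8_t>(rgb_[i].red * 0.299 +
                                             rgb_[i].green * 0.587 +
                                             rgb_[i].blue * 0.114);
    }

private:
    const RGBPixel* rgb_;
    std::uint8_t* gray_;
};

// Stand-in for a parallel_for-style call (not the real library function).
// Iterations must be independent, which holds here: each one writes only gray_[i].
template <typename Func>
void for_each_index(int first, int last, int step, const Func& f) {
    assert(step > 0);
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (int i = first; i < last; i += step) f(i);
}
```

With those pieces, the call has the same shape as the one-liner above: `for_each_index(0, numPixels, 1, ToGrayScale(rgb.data(), gray.data()));`.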
-Rick
天命终不由人

2020-12-29 12:56

It is quite possible that the bottleneck is not the CPU but memory bandwidth, in which case multithreading won't help much. Try to minimize memory accesses and work on small blocks of memory, so that more of the data stays in cache. I had a similar problem a while ago and decided to optimize my code with SSE instructions. The speedup was almost 4x on a single thread!
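As a rough illustration of "working on limited memory blocks", here is a sketch of a tiled image transpose. The operation, the function name `transpose_blocked`, and the default tile size of 64 are all assumptions chosen for the example, not something from my original code; the point is only that each tile's reads and writes stay within a small region that has a chance of fitting in cache. The best tile size depends on the machine and should be measured.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Transposes a width x height 8-bit image into a height x width output.
// The image is visited tile by tile, so both the row-wise reads and the
// column-wise writes touch only a small block of memory at a time.
void transpose_blocked(const std::vector<std::uint8_t>& in,
                       std::vector<std::uint8_t>& out,
                       std::size_t width, std::size_t height,
                       std::size_t tile = 64) {
    assert(tile > 0);
    assert(in.size() == width * height);
    assert(out.size() == in.size());

    for (std::size_t by = 0; by < height; by += tile) {
        for (std::size_t bx = 0; bx < width; bx += tile) {
            // Clamp the tile at the right and bottom edges of the image.
            const std::size_t y_end = std::min(by + tile, height);
            const std::size_t x_end = std::min(bx + tile, width);
            for (std::size_t y = by; y < y_end; ++y)
                for (std::size_t x = bx; x < x_end; ++x)
                    out[x * height + y] = in[y * width + x];
        }
    }
}
```

The same loop structure applies to other filters that read one buffer and write another: split the image into tiles, finish one tile completely, then move on.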

北海茫月

2020-12-29 12:57

As a bit of a left-field idea...

What systems are you running this on? Have you thought of using the GPU in your PCs?

Nvidia has the CUDA APIs for this sort of thing.
