Multithreaded image processing in C++

既然无缘 2020-12-29 12:16

I am working on a program which manipulates images of different sizes. Many of these manipulations read pixel data from an input and write to a separate output (e.g. blur).

16 answers
  • 2020-12-29 12:53

    I don't think you want one thread per row. An image can have a lot of rows, and you would spend a lot of memory and CPU time just launching and destroying threads and switching between them. Moreover, if you have P processors with C cores each, you probably won't gain much from more than P*C threads.

    I would advise using a fixed number of worker threads, say N, and either having the main thread of your application distribute rows to them or letting the workers pull work from a "job queue". When a thread has finished a row, it checks the queue for another row to process.

    As for libraries, you can use boost::thread, which is quite portable and not too heavyweight.
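
    As a rough sketch of the fixed-pool idea (not a definitive implementation): the code below uses C++11 std::thread rather than boost::thread, and the "job queue" is reduced to an atomic counter holding the next unclaimed row. The names processRows and invertRow, and the 8-bit grayscale layout, are assumptions made purely for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical example operation: invert one row of an 8-bit grayscale image.
// It only reads from `src` and only writes to `dst`, like the filters in the question.
void invertRow(const std::vector<std::uint8_t>& src, std::vector<std::uint8_t>& dst,
               std::size_t width, std::size_t row)
{
    for (std::size_t x = 0; x < width; ++x)
        dst[row * width + x] = static_cast<std::uint8_t>(255 - src[row * width + x]);
}

// Process every row with a fixed number of worker threads.
// Each worker repeatedly claims the next unprocessed row until none are left.
void processRows(const std::vector<std::uint8_t>& src, std::vector<std::uint8_t>& dst,
                 std::size_t width, std::size_t height, unsigned numThreads)
{
    assert(src.size() == width * height && dst.size() == src.size());
    if (numThreads == 0) numThreads = 1;  // fall back to a single worker

    std::atomic<std::size_t> nextRow{0};  // stands in for the job queue
    auto worker = [&]() {
        for (;;) {
            const std::size_t row = nextRow.fetch_add(1);
            if (row >= height) break;      // no rows left to claim
            invertRow(src, dst, width, row);
        }
    };

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < numThreads; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();       // wait for all rows to finish
}
```

    Calling processRows(src, dst, width, height, std::thread::hardware_concurrency()) would use one worker per hardware thread; hardware_concurrency() may return 0, which this sketch treats as 1. Workers never write to the same row, so the output needs no locking.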

  • 2020-12-29 12:55

    Your compiler doesn't support OpenMP. Another option is a library approach: both Intel's Threading Building Blocks and the Microsoft Concurrency Runtime are available (VS 2010).

    There is also a set of interfaces called the Parallel Patterns Library, supported by both libraries, which includes a templated parallel_for library call. So instead of:

    #pragma omp parallel for
    for (int i = 0; i < numPixels; i++)
    { ... }
    

    you would write:

    parallel_for(0,numPixels,1,ToGrayScale());
    

    where ToGrayScale is a functor or a pointer to a function. (Note: if your compiler supports lambda expressions, which it likely doesn't, you can inline the functor as a lambda expression.)

    parallel_for(0, numPixels, 1, [&](int i)
    {
        // BYTE is already an unsigned 8-bit type, so "unsigned BYTE" would not compile.
        pGrayScaleBitmap[i] = (BYTE)
            (pRGBBitmap[i].red * 0.299 +
             pRGBBitmap[i].green * 0.587 +
             pRGBBitmap[i].blue * 0.114);
    });
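
    For a compiler without lambdas, the functor form might look roughly like the sketch below. The RGBPixel layout and the way the functor holds its two bitmap pointers are assumptions, since the types are not shown here. parallelForSketch is a hypothetical, portable stand-in built on std::thread so the example runs anywhere; it is not the PPL or TBB parallel_for.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical pixel layout (an assumption for this example).
struct RGBPixel {
    std::uint8_t red, green, blue;
};

// Functor equivalent of the lambda: converts pixel i to grayscale.
class ToGrayScale {
public:
    ToGrayScale(const RGBPixel* rgb, std::uint8_t* gray) : rgb_(rgb), gray_(gray) {
        assert(rgb != nullptr && gray != nullptr);
    }

    void operator()(int i) const {
        gray_[i] = static_cast<std::uint8_t>(rgb_[i].red * 0.299 +
                                             rgb_[i].green * 0.587 +
                                             rgb_[i].blue * 0.114);
    }

private:
    const RGBPixel* rgb_;
    std::uint8_t* gray_;
};

// Stand-in for a library parallel_for: splits [first, last) into one
// contiguous chunk per thread and calls f(i) for every index.
template <typename Func>
void parallelForSketch(int first, int last, const Func& f) {
    if (last <= first) return;
    const int count = last - first;
    const unsigned hw = std::thread::hardware_concurrency();  // may be 0
    const int numThreads = std::min<int>(count, hw == 0 ? 1 : static_cast<int>(hw));
    const int chunk = (count + numThreads - 1) / numThreads;   // ceiling division

    std::vector<std::thread> threads;
    for (int begin = first; begin < last;) {
        const int end = (last - begin > chunk) ? begin + chunk : last;
        threads.emplace_back([begin, end, &f] {
            for (int i = begin; i < end; ++i) f(i);
        });
        begin = end;
    }
    for (auto& t : threads) t.join();
}
```

    With a real library, the same functor object would be passed as the last argument of parallel_for in place of the lambda. Each index writes only its own output pixel, so the chunks can run without synchronization.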
    

    -Rick

  • 2020-12-29 12:56

    It is quite possible that the bottleneck is not the CPU but memory bandwidth, in which case multithreading WON'T help much. Try to minimize memory accesses and work on limited blocks of memory, so that more of the data stays in cache. I had a similar problem a while ago and decided to optimize my code with SSE instructions. The speedup was almost 4x on a single thread!
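
    To illustrate what "working on limited memory blocks" can look like, here is a minimal sketch of a tiled image transpose. The operation, the function name transposeTiled, and the default tile size of 32 are all assumptions chosen for the example; the right tile size depends on the cache and should be measured. It does not show the SSE part.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Transpose a width x height 8-bit image into a height x width image.
// The image is visited in tile x tile blocks so that the source rows being
// read and the destination rows being written stay in cache per block.
void transposeTiled(const std::vector<std::uint8_t>& src, std::vector<std::uint8_t>& dst,
                    std::size_t width, std::size_t height, std::size_t tile = 32)
{
    assert(tile > 0);
    assert(src.size() == width * height && dst.size() == src.size());

    for (std::size_t ty = 0; ty < height; ty += tile) {
        const std::size_t yEnd = std::min(ty + tile, height);
        for (std::size_t tx = 0; tx < width; tx += tile) {
            const std::size_t xEnd = std::min(tx + tile, width);
            // Inner loops touch only one small block of src and dst.
            for (std::size_t y = ty; y < yEnd; ++y)
                for (std::size_t x = tx; x < xEnd; ++x)
                    dst[x * height + y] = src[y * width + x];
        }
    }
}
```

    A naive transpose walks one of the two buffers column by column, so on large images nearly every access can miss the cache; the tiled loop order does the same work while reusing each cache line several times.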

  • 2020-12-29 12:57

    As a bit of a left-field idea...

    What systems are you running this on? Have you thought of using the GPUs in your PCs?

    Nvidia has the CUDA APIs for this sort of thing.
