Will multithreading provide any performance boost?

说谎 2021-02-05 13:49

I am new to programming in general, so please keep that in mind when you answer my question.

I have a program that takes a large 3D array (1 billion elements) and sums up

  •  失恋的感觉
    2021-02-05 14:18

    Multithreading across multiple cores could reduce the time required to sum across the axes, but special care is required. You might actually get larger performance boosts from some changes you could make to your single-threaded code:

    1. You only need as many threads as there are cores available to you. This is a CPU-intensive operation, so threads are unlikely to be waiting for I/O.

    2. The above assumption might not hold if the entire array does not fit in RAM. If portions of the array are paged in and out, some threads will be waiting for paging operations to complete. In that case, the program might benefit from having more threads than cores. Too many, however, and performance will drop due to the cost of context switching. You might have to experiment with the thread count. The general rule is to minimize the number of context switches between ready threads.

    3. If the entire array does not fit in RAM, you want to minimize paging! The order in which each thread accesses memory matters, as does the memory access pattern of all the running threads. To the extent possible, you would want to finish up with one part of the array before moving to the next, never to return to a covered area.

    4. Each core would benefit from accessing a completely separate region of memory. You want to avoid memory access delays caused by locks and bus contention. At least for one dimension of the cube, that should be straightforward: give each thread its own portion of the cube.

    5. Each core would also benefit from reading more of its data from its cache(s) instead of fetching it from RAM. That means ordering the loops so that the inner loops access adjacent words rather than skipping across rows (for a row-major C/C++ array, the innermost loop should run over the last index).

    6. Finally, depending on the data types in the array, the SIMD instructions of Intel/AMD processors (the various generations of SSE) can accelerate single-core performance by summing multiple cells at once. VC++ has some built-in support for them through compiler intrinsics.

    7. If you have to prioritize your work, you might want to first minimize disk paging, then concentrate on optimizing memory access to make use of the CPU caches, and only then deal with multithreading.
