I am new to programming in general so please keep that in mind when you answer my question.
I have a program that takes a large 3D array (1 billion elements) and sums up
Multithreading will only make your code faster if the computations can be broken down into chunks that can be worked on independently and concurrently.
EDIT
I said the above (it's almost an automatic response) because I see many developers spend a lot of time on multithreading code for no performance increase at all. They end up with the same (or even slower) performance, plus the extra complication of managing multiple threads.
Yes, after reading your question again and taking your specific case into account, it does appear that you would benefit from multithreading.
RAM is very fast, so I think it would be very hard to saturate the memory bandwidth unless you have many, many threads.
Before you go multithreaded, you should run a profiler against your code. Where to find a good (possibly free) C++ profiler is probably a separate question.
This will help you identify any bits of your code that are taking up significant portions of computation time. A tweak here and there after some profiling can sometimes make massive differences to performance.
The questions you need to answer for your particular application are well-known.
First, is the work parallelisable? Amdahl's Law will give you an upper bound on how much you can speed things up with multithreading.
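To make the Amdahl's Law bound concrete, here is a minimal sketch. The function name and parameters are my own, and the parallel fraction is something you would have to estimate (for example from a profiler run), not something the formula gives you.

```cpp
#include <cassert>

// Upper bound on speedup predicted by Amdahl's Law.
// parallel_fraction: share of the single-threaded run time that can be
//                    parallelised (0.0 to 1.0); an estimate you supply.
// threads:           number of threads working on the parallel part.
double amdahl_speedup(double parallel_fraction, int threads) {
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(threads >= 1);
    const double serial_fraction = 1.0 - parallel_fraction;
    return 1.0 / (serial_fraction + parallel_fraction / threads);
}
```

For example, if 90% of the run time is parallelisable, 4 threads give at most 1 / (0.1 + 0.9 / 4), which is roughly 3.08x, and no number of threads can ever beat 10x.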
Second, would a multithreaded solution introduce a lot of overhead? You say the program is "RAM intensive as the program is constantly fetching information from the RAM, both reading and writing." So you need to determine if the reading/writing is going to cause significant coordination overhead. This isn't easy. Although each CPU can access the computer's entire RAM (both read and write) at any time, doing so can slow down memory accesses -- even without locks -- because the various CPUs keep their own caches and need to coordinate what's in their caches with each other (CPU 1 has a value in cache, CPU 2 updates that value in RAM, CPU 2 has to tell CPU 1 to invalidate its cache). And if you do need locks (which is almost a guarantee as you're both "reading and writing" memory) then you'll need to avoid contention as much as possible.
Third, are you memory bound? "RAM intensive" is not the same thing as "memory bound". If you are currently CPU bound then multithreading will speed things up. If you are currently memory bound then multithreading may even slow things down (if one thread is already too fast for memory, what will happen with multiple threads?).
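One rough way to get a feel for this is to time a single-threaded pass and work out how many bytes per second it actually reads. The sketch below assumes the elements are doubles in one contiguous vector; `SumTiming` and `time_sum` are names I made up for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

struct SumTiming {
    double sum;                   // returned so the compiler cannot discard the loop
    double seconds;               // wall-clock time of the pass
    double gigabytes_per_second;  // bytes read / seconds / 1e9
};

// Time one single-threaded summing pass over the data.
SumTiming time_sum(const std::vector<double>& data) {
    assert(!data.empty());  // timing an empty array tells you nothing
    const auto start = std::chrono::steady_clock::now();
    double sum = 0.0;
    for (double v : data) sum += v;
    const auto stop = std::chrono::steady_clock::now();
    const double seconds = std::chrono::duration<double>(stop - start).count();
    const double bytes = static_cast<double>(data.size()) * sizeof(double);
    return {sum, seconds, seconds > 0.0 ? bytes / seconds / 1e9 : 0.0};
}
```

If the figure for one thread is already close to what your machine's memory can deliver, more threads are unlikely to help much. If it is far below, you are probably CPU bound. The array needs to be much larger than the CPU caches, otherwise you are measuring cache speed instead of RAM speed.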
Fourth, are you slow for some other reason? If you're newing or mallocing a lot of memory in your algorithm you may be seeing overheads from that alone. And on many platforms both new and malloc don't handle multithreading well, so if you're slow right now because malloc is bad, a multithreaded program will be even slower because malloc will be worse.
Overall, however, without seeing your code, I would expect it to be CPU bound and I would expect multithreading to speed things up -- almost as much as Amdahl's law would suggest, in fact. You may want to look at OpenMP or Intel's Threading Building Blocks library, or some sort of thread queue to do it, though.
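As a sketch of what the OpenMP route could look like, assuming the job is summing every element of a 3D array stored as one flat vector of doubles (I haven't seen your code, so the function name, element type and layout here are my assumptions):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum every element of an nx * ny * nz array stored in one contiguous vector.
double parallel_sum(const std::vector<double>& data,
                    std::size_t nx, std::size_t ny, std::size_t nz) {
    assert(data.size() == nx * ny * nz);
    double total = 0.0;
    // Signed loop index: older OpenMP versions require it.
    const long long n = static_cast<long long>(data.size());
    // Each thread accumulates a private partial sum; OpenMP adds the
    // partial sums together when the loop finishes.
    #pragma omp parallel for reduction(+ : total)
    for (long long i = 0; i < n; ++i) {
        total += data[static_cast<std::size_t>(i)];
    }
    return total;
}
```

With GCC or Clang you build this with -fopenmp. Without the flag the pragma is ignored and the loop simply runs on one thread, which makes it easy to compare the two. Note that the order of floating-point additions differs between runs, so the last digits of the result can vary slightly.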
Eliminate False Sharing
This is where multiple cores block each other while trying to read or update different memory addresses that share the same cache block (cache line). Processor cache locking is per block, and only one thread can write to that block at once.
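A minimal sketch of one common fix: give each thread its own accumulator and pad it out to a full cache line. The 64-byte line size is an assumption (common on current x86-64 CPUs, but check your hardware), and `PaddedSum` and `sum_with_padded_slots` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines.
constexpr std::size_t kCacheLineBytes = 64;

// One accumulator per thread, padded so that no two share a cache line.
struct alignas(kCacheLineBytes) PaddedSum {
    double value = 0.0;
};
static_assert(sizeof(PaddedSum) == kCacheLineBytes, "one slot per cache line");

double sum_with_padded_slots(const std::vector<double>& data, unsigned num_threads) {
    assert(num_threads >= 1);
    std::vector<PaddedSum> slots(num_threads);
    std::vector<std::thread> workers;
    // Round up so that every element lands in some chunk.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end = std::min(data.size(), begin + chunk);
        workers.emplace_back([&data, &slots, t, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                slots[t].value += data[i];  // writes stay in this thread's own slot
            }
        });
    }
    for (auto& w : workers) w.join();
    double total = 0.0;
    for (const auto& s : slots) total += s.value;
    return total;
}
```

If `PaddedSum` were a plain double, eight neighbouring slots would sit in one 64-byte line and the threads would keep invalidating each other's copy of it on every addition. An even simpler fix is to accumulate into a local variable inside each thread and write to the shared slot only once at the end.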
Herb Sutter has a very good article on False Sharing, how to discover it and how to avoid it in your parallel algorithms.
Obviously he has loads of other excellent articles on concurrent programming too, see his blog.
It's impossible to tell, in general, because you did not specify how fast your CPU and RAM are. Chances are good that it will improve things, because I can't imagine how even 4 threads summing in parallel would saturate RAM enough that it would become the bottleneck (rather than the CPU).
If you can divide the array so that the threads don't read from or write to the same positions in the array, it should increase your speed.
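One way to do that division, sketched under the assumption that the 3D array is stored row-major in a single block (flat index = (x * ny + y) * nz + z), is to hand each thread a contiguous slab along the outermost dimension. The helper name is made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Split the outermost dimension [0, nx) into `parts` contiguous,
// non-overlapping half-open ranges whose sizes differ by at most one.
std::vector<std::pair<std::size_t, std::size_t>>
split_outer_dimension(std::size_t nx, std::size_t parts) {
    assert(parts >= 1);
    std::vector<std::pair<std::size_t, std::size_t>> ranges;
    const std::size_t base = nx / parts;
    const std::size_t extra = nx % parts;  // the first `extra` ranges get one more index
    std::size_t begin = 0;
    for (std::size_t p = 0; p < parts; ++p) {
        const std::size_t end = begin + base + (p < extra ? 1 : 0);
        ranges.emplace_back(begin, end);
        begin = end;
    }
    return ranges;
}
```

With nx = 10 and 4 threads this gives [0, 3), [3, 6), [6, 8) and [8, 10). Thread p then works on flat indices from begin * ny * nz up to (but not including) end * ny * nz, so no two threads ever touch the same element and each one walks through memory sequentially.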