I am new to programming in general so please keep that in mind when you answer my question.
I have a program that takes a large 3D array (1 billion elements) and sums up
The questions you need to answer for your particular application are well-known.
First, is the work parallelisable? Amdahl's Law will give you an upper bound on how much you can speed things up with multithreading.
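To make that bound concrete, here is a minimal sketch of the Amdahl's Law formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the runtime that can run in parallel and n is the number of threads. The function name `amdahl_speedup` and its parameter names are my own, purely for illustration.

```cpp
#include <cassert>

// Amdahl's Law: upper bound on the speedup when a fraction
// `parallel_fraction` of the runtime is spread across `workers` threads.
// (Hypothetical helper; the names are illustrative only.)
double amdahl_speedup(double parallel_fraction, int workers) {
    assert(parallel_fraction >= 0.0 && parallel_fraction <= 1.0);
    assert(workers >= 1);
    const double serial_fraction = 1.0 - parallel_fraction;
    return 1.0 / (serial_fraction + parallel_fraction / workers);
}
```

For example, if 95% of the runtime is parallelisable, 8 threads give at most about a 5.9x speedup, and no number of threads can ever exceed 20x, because the remaining 5% still runs serially.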
Second, would a multithreaded solution introduce a lot of overhead? You say the program is "RAM intensive as the program is constantly fetching information from the RAM, both reading and writing." So you need to determine if the reading/writing is going to cause significant coordination overhead. This isn't easy. Although each CPU can access the computer's entire RAM (both read and write) at any time, doing so can slow down memory accesses -- even without locks -- because the various CPUs keep their own caches and need to coordinate what's in their caches with each other (CPU 1 has a value in cache, CPU 2 updates that value in RAM, CPU 2 has to tell CPU 1 to invalidate its cache). And if you do need locks (which is almost a guarantee as you're both "reading and writing" memory) then you'll need to avoid contention as much as possible.
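One common way to sidestep both locks and most of the cache coordination is to give each thread its own slice of the data and its own local accumulator, and only combine the results at the end. The sketch below assumes the array is a flat `std::vector<int>` and that you only need a sum; I can't see your code, so the element type and the function name `parallel_sum` are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Sum `data` with `num_threads` threads and no locks.
// Each thread reads only its own slice and accumulates into a local
// variable, so nothing shared is written inside the hot loop.
// (Assumes int elements; adjust to your real element type.)
long long parallel_sum(const std::vector<int>& data, unsigned num_threads) {
    assert(num_threads > 0);
    std::vector<long long> partial(num_threads, 0);
    std::vector<std::thread> threads;
    // Round up so that every element belongs to exactly one slice.
    const std::size_t chunk = (data.size() + num_threads - 1) / num_threads;

    for (unsigned t = 0; t < num_threads; ++t) {
        threads.emplace_back([&data, &partial, chunk, t] {
            const std::size_t begin = std::min(data.size(), t * chunk);
            const std::size_t end = std::min(data.size(), begin + chunk);
            long long local = 0;
            for (std::size_t i = begin; i < end; ++i) {
                local += data[i];
            }
            partial[t] = local;  // one shared write per thread, at the end
        });
    }
    for (auto& th : threads) {
        th.join();
    }

    // Combine the per-thread results on the calling thread.
    long long total = 0;
    for (long long p : partial) {
        total += p;
    }
    return total;
}
```

The point of the `local` variable is that writing straight into `partial[t]` inside the loop would have neighbouring threads repeatedly writing to the same cache line, which is exactly the cache coordination cost described above.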
Third, are you memory bound? "RAM intensive" is not the same thing as "memory bound." If you are currently CPU bound then multithreading will speed things up. If you are currently memory bound then multithreading may even slow things down (if one thread is too fast for memory, then what will happen with multiple threads?).
Fourth, are you slow for some other reason? If you're new-ing or malloc-ing a lot of memory in your algorithm you may be seeing overheads from that alone. And on many platforms both new and malloc don't handle multithreading well, so if you're slow right now because malloc is bad, a multithreaded program will be even slower because malloc will be worse.
Overall, however, without seeing your code, I would expect it to be CPU bound and I would expect multithreading to speed things up -- almost as much as Amdahl's Law would suggest, in fact. You may want to look at OpenMP, Intel's Threading Building Blocks library, or some sort of thread queue to do it.
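As a rough idea of what the OpenMP route could look like, here is a sketch that sums a 3D array stored as one flat, row-major buffer. The `int` element type, the flat layout, and the names `flat_index` and `sum_flat_array` are all assumptions on my part, since I haven't seen your code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: map (x, y, z) to an index in a row-major flat buffer
// whose second and third dimensions are ny and nz.
std::size_t flat_index(std::size_t x, std::size_t y, std::size_t z,
                       std::size_t ny, std::size_t nz) {
    assert(y < ny && z < nz);
    return (x * ny + y) * nz + z;
}

// Sum every element of the flattened array.
// With OpenMP enabled (for example -fopenmp on GCC or Clang) the loop is
// split across threads and the reduction clause combines the per-thread sums.
// Without OpenMP the pragma is ignored and the loop simply runs serially.
long long sum_flat_array(const std::vector<int>& flat) {
    long long total = 0;
    const long long n = static_cast<long long>(flat.size());
    #pragma omp parallel for reduction(+ : total)
    for (long long i = 0; i < n; ++i) {
        total += flat[static_cast<std::size_t>(i)];
    }
    return total;
}
```

The nice thing about this approach is that the serial and parallel versions are the same source code, so you can time both builds and see whether you are actually CPU bound before investing in anything more elaborate.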