When I was learning Java coming from a background of some 20 years of procedural programming with basic, Pascal, COBOL and C, I thought at the time that the hardest thing about
There are a number of techniques which are coming into the public consciousness just now (as in: the last few years). A big one would be actors. This is something that Erlang first brought to the grid iron but which has been carried forward by newer languages like Scala (actors on the JVM). While it is true that actors don't solve every problem, they do make it much easier to reason about your code and identify trouble spots. They also make it much simpler to design parallel algorithms because of the way they force you to use continuation passing over shared mutable state.
Fork/Join is something you should look at, especially if you're on the JVM. Doug Lea wrote the seminal paper on the topic, but many researchers have discussed it over the years. As I understand it, Doug Lea's reference framework is scheduled for inclusion into Java 7.
On a slightly less-invasive level, often the only steps necessary to simplify a multi-threaded application are just to reduce the complexity of the locking. Fine-grained locking (in the Java 5 style) is great for throughput, but very very difficult to get right. One alternative approach to locking which is gaining some traction through Clojure would be software-transactional memory (STM). This is essentially the opposite of conventional locking in that it is optimistic rather than pessimistic. You start out by assuming that you won't have any collisions, and then allow the framework to fix the problems if and when they occur. Databases often work this way. It's great for throughput on systems with low collision rates, but the big win is in the logical componentization of your algorithms. Rather than arbitrarily associating a lock (or a series of locks) with some data, you just wrap the dangerous code in a transaction and let the framework figure out the rest. You can even get a fair bit of compile-time checking out of decent STM implementations like GHC's STM monad or my experimental Scala STM.
There are a lot of new options for building concurrent applications, which one you pick depends greatly on your expertise, your language and what sort of problem you're trying to model. As a general rule, I think actors coupled with persistent, immutable data structures are a solid bet, but as I said, STM is a little less invasive and can sometimes yield more immediate improvements.