Are there any good papers discussing how to take a dynamic program and parallelize it?
We recently published a paper showing how to parallelize any d.p. on a shared memory multicore computer by means of a shared lock-free hash table:
Stivala, A. and Stuckey, P. J. and Garcia de la Banda, M. and Hermenegildo, M. and Wirth, A. 2010 "Lock-free parallel dynamic programming" J. Parallel Distrib. Comput. 70:839-848 doi:10.1016/j.jpdc.2010.01.004
http://dx.doi.org/10.1016/j.jpdc.2010.01.004
Essentially, you start multiple threads, all running the same code starting at the value of the d.p. you want to compute, computing it top-down (recursively), and memoizing in a shared lock-free hash table, but randomizing the order in which subproblems are computed so that the threads diverge in which subproblems they compute.
In terms of implementation, we just used C and pthreads on UNIX type systems, all you need is to be able to have shared memory, and CompareAndSwap (CAS) for lock-free synchronization between threads.
Because this paper was published in an Elsevier journal, you'll need to access the above through a University library or similar with a subscription to it. You might be able to get a pre-print copy via Prof. Stuckey's webpage though.