Are there any good papers discussing how to take a dynamic program and parallelize it?
There has been some work on implementing dynamic programming algorithms on GPUs. For example:
- Dynamic Programming with CUDA
- GPU optimized dynamic programming
- A GPU Implementation of Dynamic Programming for the Optimal Polygon Triangulation
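To give a rough idea of what GPU approaches to dynamic programming commonly exploit, here is a minimal sketch of the anti-diagonal ("wavefront") ordering, using longest common subsequence as the example problem. This is an illustration under my own assumptions, not the method of any of the works listed: the function name `lcs_length_wavefront` is hypothetical, and the code runs sequentially in plain Python. The point is only that every cell on one anti-diagonal depends on earlier diagonals alone, so the inner loop is the part a GPU could hand out to separate threads.

```python
def lcs_length_wavefront(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    The table is filled one anti-diagonal at a time instead of row by row.
    """
    n, m = len(a), len(b)
    # table[i][j] = LCS length of a[:i] and b[:j]; row 0 and column 0 stay 0.
    table = [[0] * (m + 1) for _ in range(n + 1)]

    # Cells with i + j == d read only cells on diagonals d - 1 and d - 2.
    for d in range(2, n + m + 1):
        lo = max(1, d - m)   # smallest i with j = d - i <= m
        hi = min(n, d - 1)   # largest i with j = d - i >= 1
        # Iterations of this loop are independent of each other; on a GPU
        # each (i, j) pair could be computed by its own thread.
        for i in range(lo, hi + 1):
            j = d - i
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]
```

Whether this ordering pays off depends on the recurrence: it fits tables where each cell reads a fixed set of neighbours, while problems such as polygon triangulation have a different dependency pattern and need their own scheme.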