CUDA has a fairly substantial code base of examples and an SDK, including a BLAS back-end (cuBLAS). Try to find some examples similar to what you are doing, perhaps also looking at the GPU Gems series of books, to gauge how well CUDA will fit your applications. From a logistical point of view, I'd say CUDA is easier to work with and much, much cheaper than any professional FPGA development toolkit.
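To give a feel for what calling the BLAS back-end looks like, here is a minimal sketch of a single-precision `y = alpha*x + y` (SAXPY) on the GPU. It assumes the modern cuBLAS v2 API (`cublas_v2.h`) rather than whatever SDK version you end up with, and the vector contents are made up purely for illustration.

```cuda
// Minimal cuBLAS sketch: y = alpha * x + y on the GPU.
// Assumes the cuBLAS v2 API; build with: nvcc saxpy.cu -lcublas
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <cublas_v2.h>

int main() {
    const int n = 4;
    std::vector<float> x = {1.0f, 2.0f, 3.0f, 4.0f};      // illustrative data
    std::vector<float> y = {10.0f, 20.0f, 30.0f, 40.0f};  // illustrative data
    const float alpha = 2.0f;

    // Allocate device copies of both vectors.
    float *dx = nullptr, *dy = nullptr;
    if (cudaMalloc(&dx, n * sizeof(float)) != cudaSuccess ||
        cudaMalloc(&dy, n * sizeof(float)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    cublasHandle_t handle;
    if (cublasCreate(&handle) != CUBLAS_STATUS_SUCCESS) {
        std::fprintf(stderr, "cublasCreate failed\n");
        return 1;
    }

    // Copy host vectors to the device, run SAXPY, copy the result back.
    cublasSetVector(n, sizeof(float), x.data(), 1, dx, 1);
    cublasSetVector(n, sizeof(float), y.data(), 1, dy, 1);
    cublasStatus_t st = cublasSaxpy(handle, n, &alpha, dx, 1, dy, 1);
    cublasGetVector(n, sizeof(float), dy, 1, y.data(), 1);

    if (st != CUBLAS_STATUS_SUCCESS) {
        std::fprintf(stderr, "cublasSaxpy failed\n");
    } else {
        for (int i = 0; i < n; ++i) std::printf("%g ", y[i]);
        std::printf("\n");
    }

    cublasDestroy(handle);
    cudaFree(dx);
    cudaFree(dy);
    return st == CUBLAS_STATUS_SUCCESS ? 0 : 1;
}
```

If your workload is mostly dense linear algebra of this kind, you may be able to get a long way on library calls alone before writing any kernels of your own.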
At one point I did look into CUDA for claim reserve simulation modelling. There is quite a good series of lectures linked from the website for learning. On Windows, you need to make sure CUDA is running on a card with no displays attached, as the graphics subsystem has a watchdog timer that will kill any kernel that runs for more than about 5 seconds. On Linux this does not happen, provided the card is not driving an X display.
Any machine with two PCI-e x16 slots should support this. I used an HP XW9300, which you can pick up off eBay quite cheaply. If you do, make sure it has two CPUs (not one dual-core CPU), as the PCI-e slots live on separate HyperTransport buses and you need two CPUs in the machine to have both buses active.
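Once both cards are in, you can check which one is subject to the watchdog before committing a long-running job to it. The sketch below is one way to do that with the CUDA runtime API: it lists every device and prints the `kernelExecTimeoutEnabled` field of `cudaDeviceProp`, which reports whether a run-time limit applies to kernels on that device. Treat it as a starting point rather than a definitive tool; the output format is my own.

```cuda
// List CUDA devices and report whether each has a kernel run-time limit.
// Build with: nvcc list_devices.cu
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            std::fprintf(stderr, "device %d: %s\n", dev,
                         cudaGetErrorString(err));
            continue;
        }
        // Non-zero means the watchdog can kill long-running kernels here.
        std::printf("device %d: %s, kernel run-time limit: %s\n",
                    dev, prop.name,
                    prop.kernelExecTimeoutEnabled ? "yes" : "no");
    }
    return 0;
}
```

In your own program, call `cudaSetDevice` with the index of the card that reports "no" before launching anything long-running.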