PyCUDA... nothing like 25,000 active threads :) [warp-scheduled with scoreboarding]. CUDA 2 has stream support, so I'm not sure what StreamIt would bring. The CUDA MATLAB extensions look neat, as do PLUTO and the upcoming PetaBricks from MIT.
As for the others: Python's threading is lacking (CPython's GIL keeps threads from executing Python bytecode in parallel); MPI and the like are complicated, and I don't have a cluster, but I suppose they achieve what they were built for. I stopped C# programming before I got to thread apartments (probably a good thing).