I have done it in C# by leveraging NVIDIA's CUDA libraries and .NET's P/Invoke. This requires careful memory management and a detailed understanding of the CUDA libraries. This technique can be used in conjunction with any custom GPU/CUDA kernels you would like to create in C, so it's a very powerful, flexible approach.
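To illustrate the P/Invoke side, here is a minimal sketch of declaring and calling the CUDA runtime (`cudaMalloc`/`cudaMemcpy`/`cudaFree`) from C#. The DLL name (`cudart64_110`) is an assumption that depends on your installed CUDA toolkit version, and this only runs on a machine with an NVIDIA GPU and driver; it shows the round-trip memory management the answer refers to, not a complete solution.

```csharp
using System;
using System.Runtime.InteropServices;

static class CudaInterop
{
    // DLL name varies by CUDA toolkit version (e.g. cudart64_110.dll for CUDA 11.x) -- adjust to your install.
    const string CudaRt = "cudart64_110";

    // int return values are cudaError_t; 0 == cudaSuccess.
    [DllImport(CudaRt)]
    public static extern int cudaMalloc(out IntPtr devPtr, UIntPtr size);

    [DllImport(CudaRt)]
    public static extern int cudaMemcpy(IntPtr dst, IntPtr src, UIntPtr count, int kind);

    [DllImport(CudaRt)]
    public static extern int cudaFree(IntPtr devPtr);

    // cudaMemcpyKind values from driver_types.h
    public const int HostToDevice = 1;
    public const int DeviceToHost = 2;
}

class Program
{
    static void Main()
    {
        double[] host = { 1.0, 2.0, 3.0, 4.0 };
        UIntPtr bytes = (UIntPtr)(host.Length * sizeof(double));

        // Pin the managed array so the GC cannot move it during the copy.
        GCHandle pin = GCHandle.Alloc(host, GCHandleType.Pinned);
        IntPtr devPtr = IntPtr.Zero;
        try
        {
            Check(CudaInterop.cudaMalloc(out devPtr, bytes), "cudaMalloc");
            Check(CudaInterop.cudaMemcpy(devPtr, pin.AddrOfPinnedObject(), bytes,
                                         CudaInterop.HostToDevice), "copy to GPU");

            // ... launch your custom kernel or a cuBLAS/cuFFT call on devPtr here ...

            Check(CudaInterop.cudaMemcpy(pin.AddrOfPinnedObject(), devPtr, bytes,
                                         CudaInterop.DeviceToHost), "copy back");
        }
        finally
        {
            // Free GPU memory and unpin the managed buffer even on failure.
            if (devPtr != IntPtr.Zero) CudaInterop.cudaFree(devPtr);
            pin.Free();
        }
    }

    static void Check(int err, string what)
    {
        if (err != 0) throw new InvalidOperationException($"{what} failed: cudaError_t {err}");
    }
}
```

The `GCHandle` pinning and the `finally` block are the "careful memory management" part: device allocations are invisible to the .NET garbage collector, so you must pair every `cudaMalloc` with a `cudaFree` yourself.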
If you would like to save yourself a lot of effort, you could buy NMath Premium from CenterSpace Software (who I work for) and be running large problems on your NVIDIA GPU in minutes from C#. NMath Premium is a large C#/.NET math library that can run much of LAPACK and FFTs on the GPU, but falls back to the CPU if the hardware isn't available or the problem size doesn't justify a round trip to the GPU.