Another easy way to get into GPU programming, without getting into CUDA or OpenCL, is to do it via OpenACC.
OpenACC works like OpenMP, with compiler directives (like #pragma acc kernels
) to send work to the GPU. For example, if you have a big loop (only larger ones really benefit):
int i;
float a = 2.0;
float b[10000];
#pragma acc kernels
for (i = 0; i < 10000; ++i) b[i] = 1.0f;
#pragma acc kernels
for (i = 0; i < 10000; ++i) {
b[i] = b[i] * a;
}
Edit: unfortunately, only the PGI compiler really supports OpenACC right now, for NVIDIA GPU cards.