I have some CUDA 8.0 code that looks basically like this:
cudaMemcpy(devInputData, ..., cudaMemcpyHostToDevice);
kernelThings<<