I have allocated memory on the device using cudaMalloc and have passed it to a kernel function. Is it possible to access that memory from the host before the kernel finishes its execution?
It is possible, but there's no guarantee about the contents of the memory you read that way, since you don't know how far the kernel has progressed.
What you're trying to achieve is to overlap data transfer with kernel execution. That is possible through the use of streams. You create multiple CUDA streams and, in each one, queue a kernel launch followed by a device-to-host cudaMemcpyAsync (the asynchronous variant takes a stream argument, and the host buffer should be pinned, e.g. allocated with cudaMallocHost, for the copy to actually overlap). For example, put the kernel that fills location "0" and the copy of that location back to the host into the first stream, the kernel that fills location "1" and the copy of "1" into the second stream, and so on. Operations within one stream run in order, so each copy sees its kernel's finished output, while the GPU is free to overlap copying "0" with executing "1". This is covered in the CUDA C++ Best Practices Guide, in the section on asynchronous and overlapping transfers with computation.
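Here is a minimal sketch of that pattern, not a definitive implementation. The kernel fillChunk, the helper runChunkedPipeline, and the chunk count and size are all hypothetical names and values chosen for illustration. It assumes one cudaMalloc'd buffer split into equal chunks, with one stream per chunk. Whether the copies really overlap with the kernels depends on the device (it needs at least one copy engine) and on the host buffer being pinned.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,   \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Hypothetical kernel: writes (offset + i) into element i of one chunk.
__global__ void fillChunk(int *chunk, int n, int offset)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) chunk[i] = offset + i;
}

// Fills each chunk in its own stream and copies it back asynchronously.
// Returns the number of host elements that do not hold the expected value.
int runChunkedPipeline()
{
    const int kNumChunks = 4;         // assumption: 4 chunks / 4 streams
    const int kChunkElems = 1 << 20;  // assumption: 1M ints per chunk
    const size_t chunkBytes = kChunkElems * sizeof(int);

    int *dBuf = nullptr;
    int *hBuf = nullptr;
    CUDA_CHECK(cudaMalloc(&dBuf, kNumChunks * chunkBytes));
    // Pinned (page-locked) host memory, needed for truly asynchronous copies.
    CUDA_CHECK(cudaMallocHost(&hBuf, kNumChunks * chunkBytes));

    cudaStream_t streams[kNumChunks];
    for (int s = 0; s < kNumChunks; ++s)
        CUDA_CHECK(cudaStreamCreate(&streams[s]));

    const int threads = 256;
    const int blocks = (kChunkElems + threads - 1) / threads;

    // Queue kernel + device-to-host copy for each chunk in its own stream.
    // Within a stream the copy waits for the kernel; across streams the
    // copy of chunk s may overlap with the kernel of chunk s + 1.
    for (int s = 0; s < kNumChunks; ++s) {
        int *dChunk = dBuf + (size_t)s * kChunkElems;
        int *hChunk = hBuf + (size_t)s * kChunkElems;
        fillChunk<<<blocks, threads, 0, streams[s]>>>(dChunk, kChunkElems,
                                                      s * kChunkElems);
        CUDA_CHECK(cudaGetLastError());
        CUDA_CHECK(cudaMemcpyAsync(hChunk, dChunk, chunkBytes,
                                   cudaMemcpyDeviceToHost, streams[s]));
    }

    // A chunk is safe to read on the host once its own stream has finished,
    // even if kernels for later chunks are still running.
    int mismatches = 0;
    for (int s = 0; s < kNumChunks; ++s) {
        CUDA_CHECK(cudaStreamSynchronize(streams[s]));
        const int *hChunk = hBuf + (size_t)s * kChunkElems;
        for (int i = 0; i < kChunkElems; ++i)
            if (hChunk[i] != s * kChunkElems + i) ++mismatches;
    }

    for (int s = 0; s < kNumChunks; ++s)
        CUDA_CHECK(cudaStreamDestroy(streams[s]));
    CUDA_CHECK(cudaFreeHost(hBuf));
    CUDA_CHECK(cudaFree(dBuf));
    return mismatches;
}

int main()
{
    std::printf("mismatches: %d\n", runChunkedPipeline());
    return 0;
}
```

The point to notice is the per-stream cudaStreamSynchronize: the host reads chunk "0" as soon as the first stream is done, without waiting for the whole device to go idle. That is the safe version of accessing the memory before all the kernel work has finished.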