I want to reuse shared memory between successive calls of the same kernel. Can shared memory be used across kernel calls?
Previously you could do it in a non-standard way: give each shared memory block a unique ID, and have the next kernel check the ID and carry out the required processing on that shared memory block. This was hard to implement, as you needed to ensure full occupancy for each kernel and deal with various corner cases. In addition, without formal support you could not rely on compatibility across compute devices and CUDA versions.
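For comparison, here is a minimal sketch of the portable route, which is not the ID trick described above: since the contents of `__shared__` memory are only defined for the lifetime of a thread block, the first launch spills its tile to global memory and the second launch reloads it. The kernel names `produce` and `consume`, the buffer `staged`, and the sizes are all made up for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kThreads = 256;  // threads per block; also the tile size (assumed)

// First launch: build a per-block tile in shared memory, then spill it
// to global memory so that it outlives the block.
__global__ void produce(float* staged) {
    __shared__ float tile[kThreads];
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = static_cast<float>(gid);  // stand-in for real work
    __syncthreads();
    staged[gid] = tile[threadIdx.x];
}

// Second launch: reload the tile from global memory into shared memory,
// then use it as if it had been kept.
__global__ void consume(const float* staged, float* out) {
    __shared__ float tile[kThreads];
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = staged[gid];
    __syncthreads();
    // Each thread reads its own value and its neighbour's from the tile.
    out[gid] = tile[threadIdx.x] + tile[(threadIdx.x + 1) % kThreads];
}

int main() {
    const int blocks = 4;
    const int n = blocks * kThreads;
    float* staged = nullptr;
    float* out = nullptr;
    cudaMalloc(&staged, n * sizeof(float));
    cudaMalloc(&out, n * sizeof(float));

    // Both launches go to the default stream, so consume starts only
    // after produce has finished.
    produce<<<blocks, kThreads>>>(staged);
    consume<<<blocks, kThreads>>>(staged, out);

    float host[2];
    cudaMemcpy(host, out, sizeof(host), cudaMemcpyDeviceToHost);
    printf("%f %f\n", host[0], host[1]);

    cudaFree(staged);
    cudaFree(out);
    return 0;
}
```

The extra global memory round trip costs bandwidth, but it does not depend on occupancy or on which block lands on which multiprocessor, so it should behave the same across devices and CUDA versions.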