For illustrative purposes, let __device__ void distance(char *s1, char *s2) be the device function, which is called from a kernel running across several blocks and threads:
__device__ void distance(char *s1, char *s2)