This is part of my CUDA code, but the last part of it produces an error message.
unsigned int *mat_count; off_t *mat_position; unsigned int *matches_count; off_
An unspecified launch failure is almost always the GPU equivalent of a segfault: the kernel read or wrote memory it doesn't own. You've got an indexing mistake somewhere in your kernel, most likely in a global memory access.
I'd look through your code, but it's mildly incomprehensible...
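Since I can't point at the exact line, here is a minimal sketch of the usual pattern for ruling this kind of bug out: guard every global memory access with a bounds check, and check for errors right after the launch. The kernel `fill_indices` and the helpers `blocks_for` and `launch_fill` are hypothetical names made up for illustration, not taken from your code, and the block size of 256 is an arbitrary assumption.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel (not from the question): writes i into out[i].
__global__ void fill_indices(unsigned int *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    // The grid is rounded up to whole blocks, so some threads get i >= n.
    // Without this guard they would write past the end of `out`.
    if (i < n) {
        out[i] = (unsigned int)i;
    }
}

// Number of blocks needed to cover n elements (ceiling division).
int blocks_for(int n, int threads_per_block)
{
    assert(threads_per_block > 0);
    return (n + threads_per_block - 1) / threads_per_block;
}

// Allocates a buffer, launches the kernel, and returns the first CUDA error seen.
cudaError_t launch_fill(int n)
{
    unsigned int *d_out = NULL;
    cudaError_t err = cudaMalloc((void **)&d_out, n * sizeof(unsigned int));
    if (err != cudaSuccess) return err;

    const int threads = 256;  // assumed block size
    fill_indices<<<blocks_for(n, threads), threads>>>(d_out, n);

    err = cudaGetLastError();            // errors in the launch itself
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();   // errors raised while the kernel ran
    if (err != cudaSuccess)
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));

    cudaFree(d_out);
    return err;
}
```

Calling something like `launch_fill(1000)` from your host code and checking the return value after every launch should at least tell you which kernel is failing. Running the program under `cuda-memcheck` (or `compute-sanitizer` on newer toolkits) will usually report the offending out-of-bounds access as well.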