I was trying to write a GEMM program using __device__ variables instead of allocating the matrices dynamically with cudaMalloc, but it keeps telling me that GPUassert
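
For reference, here is a minimal sketch of the setup described above: a naive GEMM whose matrices are statically sized __device__ arrays rather than cudaMalloc buffers. The matrix size N, the array names dA/dB/dC, the run_gemm helper, and the gpuErrchk/gpuAssert wrapper are illustrative assumptions, not the original program. One point that may be relevant: __device__ symbols are normally copied with cudaMemcpyToSymbol and cudaMemcpyFromSymbol rather than by passing the symbol to plain cudaMemcpy.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

#define N 64  // hypothetical matrix dimension (assumption, not from the question)

// Statically allocated device-side matrices; no cudaMalloc involved.
__device__ float dA[N * N];
__device__ float dB[N * N];
__device__ float dC[N * N];

// Error-checking wrapper in the common "GPUassert" style (assumed shape).
#define gpuErrchk(ans) { gpuAssert((ans), __FILE__, __LINE__); }
inline void gpuAssert(cudaError_t code, const char *file, int line) {
    if (code != cudaSuccess) {
        fprintf(stderr, "GPUassert: %s %s %d\n",
                cudaGetErrorString(code), file, line);
        exit(code);
    }
}

// Naive GEMM, C = A * B, one thread per output element.
// The kernel refers to the __device__ arrays directly by name.
__global__ void gemm() {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < N; ++k)
            acc += dA[row * N + k] * dB[k * N + col];
        dC[row * N + col] = acc;
    }
}

// Hypothetical helper: fills A with 1s and B with 2s, runs the kernel,
// and returns C[0], which should then equal 2 * N.
float run_gemm() {
    static float hA[N * N], hB[N * N], hC[N * N];
    for (int i = 0; i < N * N; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    // Host -> device copies go through the *Symbol API.
    gpuErrchk(cudaMemcpyToSymbol(dA, hA, sizeof(hA)));
    gpuErrchk(cudaMemcpyToSymbol(dB, hB, sizeof(hB)));

    dim3 block(16, 16);
    dim3 grid((N + block.x - 1) / block.x, (N + block.y - 1) / block.y);
    gemm<<<grid, block>>>();
    gpuErrchk(cudaGetLastError());       // catches launch errors
    gpuErrchk(cudaDeviceSynchronize());  // catches errors during execution

    // Device -> host copy of the result, again via the *Symbol API.
    gpuErrchk(cudaMemcpyFromSymbol(hC, dC, sizeof(hC)));
    return hC[0];
}

int main() {
    printf("C[0] = %f\n", run_gemm());
    return 0;
}
```

Built with nvcc, this runs the whole multiply without any cudaMalloc or cudaFree calls. The trade-off is that the array sizes must be compile-time constants.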