So here is a very naive matrix multiplication implementation, where I trid to use a lambda expression to define the kernel, so that I can save the work of passing arguments: