I have parameterized a filtering problem so that I essentially only need to compute a product and a sum between two arrays. I want to do this on the GPU, using PyTorch to accele
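For reference, here is a minimal sketch of what I mean by "product and sum", assuming the two arrays broadcast against each other and the sum runs over the last axis. The function name `product_and_sum`, the example arrays, and the choice of `float32` are placeholders for illustration, not my actual data or code. It falls back to the CPU when no CUDA device is available.

```python
import torch


def product_and_sum(a, b, device=None):
    """Multiply a and b element-wise (with broadcasting), then sum over the last axis.

    Hypothetical helper: the reduction axis and dtype are assumptions.
    """
    if device is None:
        # Use the GPU when one is visible, otherwise stay on the CPU.
        device = "cuda" if torch.cuda.is_available() else "cpu"
    a_t = torch.as_tensor(a, dtype=torch.float32, device=device)
    b_t = torch.as_tensor(b, dtype=torch.float32, device=device)
    # Both the product and the reduction run on the chosen device;
    # the result is copied back to host memory only at the end.
    return (a_t * b_t).sum(dim=-1).cpu()


# Made-up example: two rows of samples and one set of filter weights.
signal_windows = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
kernel = [0.5, 1.0, 0.5]
result = product_and_sum(signal_windows, kernel)
print(result)
```

Keeping both arrays on the device and copying back only the reduced result avoids repeated host/device transfers, which tend to dominate the runtime for small arrays.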