I am implementing a simple center of mass/centroid algorithm for 2D rasters. This is rather trivial on the CPU but has proven difficult to port to the GPU. My CPU version is som