I wrote a lattice dynamics simulation using Metal and Swift on macOS. The workload consists only of highly parallel multiply-and-adds, but I still can't get the Metal (GPU) version to beat the CPU.