Since the earliest CPUs, it has been common knowledge that the integer division instruction is expensive. I wanted to see how bad it is today, on CPUs which have the luxury of
As user pvg explained in the comments, the hypothesized algorithm does exist and is the best one currently known. Its preparatory step itself requires a division by the same divisor, so the cost of division cannot be eliminated from the algorithm as a whole. It is covered in Chapter 10 of the classic book Hacker's Delight.
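To make the shape of such an algorithm concrete, here is a minimal sketch in C of one simple reciprocal-multiplication variant for unsigned 32-bit values. It is not necessarily the exact formulation given in the book, and the helper names `magic_for` and `div_by_magic` are made up for illustration. It assumes a divisor greater than 1. Note that the preparatory step performs one real division by the divisor, which is the point made above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: precompute M = ceil(2^64 / d) for a divisor d > 1.
   This preparatory step itself performs one real division by d. */
static uint64_t magic_for(uint32_t d) {
    assert(d > 1);              /* d == 1 would wrap M around to 0 */
    return UINT64_MAX / d + 1;
}

/* Hypothetical helper: floor(n / d) using only multiplies, adds and shifts.
   It computes the top 64 bits of the 96-bit product M * n without
   relying on a 128-bit integer type. */
static uint32_t div_by_magic(uint32_t n, uint64_t M) {
    uint64_t lo = (M & 0xFFFFFFFFu) * n;   /* low 32 bits of M times n  */
    uint64_t hi = (M >> 32) * n;           /* high 32 bits of M times n */
    return (uint32_t)((hi + (lo >> 32)) >> 32);
}
```

For example, `div_by_magic(100, magic_for(7))` yields 14. The sketch only pays for itself when `magic_for` is called once and its result is reused across many divisions by the same divisor.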