Is there some cool algorithm with bitwise operations?
If the divisor is known in advance (e.g. in code produced by a C compiler, where it is a constant known at compile time), then integer division can sometimes be implemented with a multiplication and a shift. The modulus is then easily obtained from the quotient as x - (x / d) * d. See this article for details (warning: this is not light reading).
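To make the idea concrete, here is a minimal sketch for one specific case: unsigned 32-bit division by the constant 10. The function names `div10` and `mod10` are made up for this example, and the magic constant is the well-known one for this divisor and width (it is 2^35 / 10 rounded up); other divisors need their own constant and shift, and some need an extra correction step.

```c
#include <stdint.h>

/* Quotient x / 10 for any 32-bit unsigned x, without a divide instruction.
   0xCCCCCCCD is ceil(2^35 / 10), so the 64-bit product shifted right by 35
   equals floor(x / 10) over the whole 32-bit range. */
static inline uint32_t div10(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* Remainder x % 10, derived from the quotient: x - (x / 10) * 10. */
static inline uint32_t mod10(uint32_t x)
{
    return x - div10(x) * 10u;
}
```

An optimizing C compiler will typically do this transformation itself when you write `x / 10` or `x % 10`, so hand-coding it is mostly useful for understanding the generated code.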
In many processors, integer multiplication is vastly faster than integer division, and some processors do not even have an integer division opcode. Multiplication on n-bit values can be implemented as a circuit of depth O(log n), whereas practical hardware dividers are typically iterative, producing only a few quotient bits per step, so their latency grows roughly linearly with n.