Consider the following problem. You have a bit-string that represents the currently scheduled slave in one-hot encoding. For example, "00000100" (with the leftmost bit being #7
Subtracting 1 is the essential idea here. The borrow cascades through the lower bits, and that is what lets us find the next task.
bits_before_current = ~(current-1) & ~current   # every bit above the current one
bits_after_current = current-1                  # every bit below the current one
todo = mask & bits_before_current
if todo == 0: todo = mask & bits_after_current  # nothing above, so wrap around
next = todo & -todo                             # lowest set bit of todo
This will use a loop internally though...
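The steps above can be sketched as runnable Python to check the bit manipulation by hand. This is only an illustration: the function name `next_one_hot`, the constant `WIDTH`, and the 8-bit width are assumptions taken from the example bit-string, not part of the original problem statement.

```python
WIDTH = 8                      # assumed width, matching the 8-bit example
ALL_ONES = (1 << WIDTH) - 1    # 0b11111111, used to keep results in WIDTH bits


def next_one_hot(current: int, mask: int) -> int:
    """Return the lowest set bit of mask above current, wrapping around.

    current is a one-hot value; mask holds the candidates.
    Returns 0 if mask has no set bit other than current.
    """
    # Subtracting 1 from a one-hot value sets every bit below it.
    bits_after_current = current - 1
    # Everything that is neither current nor below it lies above it.
    bits_before_current = ~(current - 1) & ~current & ALL_ONES

    todo = mask & bits_before_current
    if todo == 0:
        # Nothing pending above current, so wrap around to the low bits.
        todo = mask & bits_after_current

    # Two's-complement trick: isolate the lowest set bit of todo.
    return todo & -todo


if __name__ == "__main__":
    nxt = next_one_hot(0b00000100, 0b10010001)
    print(format(nxt, "08b"))
```

Note that in this sketch the current bit itself is never a candidate, so a mask containing only the current bit yields 0. A caller that wants to re-select the current slave in that case would need to handle it separately.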