Consider the following problem. You have a bit-string that represents the current scheduled slave in one-hot encoding. For example, "00000100" (with the leftmost bit being #7
A loop does not have to be bad.
I would simply do
current[i] = current[i-1] & mask[i] |                // normal shift logic
             mask[i] & current[i-2] & !mask[i-1] |   // skip slave i-1 when it is masked out
             ...                                     // expressions for the remaining offsets
Then put it into a generate loop (i.e. it gets unrolled into hardware), which produces parallel hardware for all of the expressions.
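To sanity-check the expression before writing the HDL, a small software model can help. The following is only a sketch in Python, not synthesizable code. The names `next_grant`, `bit` and `width` are made up for illustration, and it assumes that the grant moves towards higher bit indices, that indices wrap around at the top, and that the grant stays where it is when no other slave is enabled. None of these assumptions is stated in the answer itself.

```python
def bit(value: int, index: int) -> int:
    """Return bit `index` of `value` (0 or 1)."""
    return (value >> index) & 1


def next_grant(current: int, mask: int, width: int) -> int:
    """Behavioral model of the per-bit expression (hypothetical helper).

    Bit i of the result is set when mask[i] is set and the current grant
    sits d positions below i, with every position in between masked out.
    """
    result = 0
    for i in range(width):              # one iteration per output bit, like the generate loop
        if not bit(mask, i):
            continue                    # every term contains mask[i]
        all_skipped = True              # running AND of !mask[...] over skipped positions
        for d in range(1, width + 1):   # d = 1 is the "normal shift" term
            j = (i - d) % width         # assumed wrap-around indexing
            if bit(current, j) and all_skipped:
                result |= 1 << i
            all_skipped = all_skipped and not bit(mask, j)
    return result


# Grant is at bit 2, slaves 0, 5 and 6 are enabled: bits 3 and 4 are skipped.
print(format(next_grant(0b00000100, 0b01100001, 8), "08b"))
```

The inner loop corresponds to the OR terms of a single bit, so in hardware each output bit becomes one wide AND-OR expression with no carry or borrow chain between bits.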
Other solutions mentioned here use multiple subtractions ("-"). I can only discourage them, as this gets you a really expensive operation. Especially with one-hot encoding you can easily end up with more than 32 bits, which is not easy to implement in hardware, because the borrow has to propagate through all bits (the dedicated carry logic on certain FPGAs makes it feasible for a small number of bits).