Consider the following problem. You have a bit-string that represents the currently scheduled slave in one-hot encoding. For example, "00000100" (with the leftmost bit being #7
The following solution works for any number of slaves (K), and its size is O(n) in your FPGA: each bit in the field requires three logic gates and two inverters. I tested the concept with a basic logic simulator, and it works.
The chain of logic gates between current and mask essentially creates a priority system that favors bits "lower down" in the chain. This chain is looped at the ends, but the current bits are used to break the chain.
To visualize the operation, imagine that bit 3 is set in the current field, and follow the signal downwards in the diagram. The logical one at bit 3 places a logical zero at the input to the first AND gate, which guarantees that the output of that AND gate will also be zero (this is where the OR-gate chain is broken). The zero at the output of the first AND gate places a one at the input to the second AND gate. This makes bit 2 of next directly dependent on bit 2 of mask.
Now, the chain of OR gates comes into play.
If bit 2 of mask was set, the logical output of the OR gate directly to the left of it will also be a one, which will place a logical one at the input to the AND gate below bit 2 of current (which will be zero, since only one bit in current can be set at a time). The logical one at the output of the top AND gate places a logical zero at the input of the bottom AND gate, thus setting bit 1 of next equal to zero.
If bit 2 of mask was not set, both inputs to the OR gate would be zero, so the output of the AND gate below bit 2 of current would be a zero, placing a one at the input to the bottom AND gate, and therefore making bit 1 of next dependent on bit 1 of mask.
This logic follows the chain of OR gates "up" the bits, looping around from the left side back over to the right, ensuring that only one bit in next can be set to a one. The loop stops once it makes its way back to bit 3 of current, as a result of that bit being set. This prevents the circuit from staying in a perpetual loop.
I have no experience with Verilog or VHDL, so I'll leave the actual code up to you and the rest of Stack Overflow.
[Image: logic diagram, http://img145.imageshack.us/img145/5125/bitshifterlogicdiagramkn7.jpg]
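Since the diagram may not be available, here is a rough behavioural model of the chain in C. It is only a sketch, not the gate netlist: it assumes the search runs in the direction described above (from bit 3 to bit 2 to bit 1, wrapping from bit 0 back to bit 7), that the current slave itself is never re-granted, and that K is 8. The name next_by_chain is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define K 8 /* number of slaves; assumed <= 32 in this sketch */

/* Walks away from the set bit of `current`, one position at a time,
 * and grants the first position whose `mask` bit is set.
 * Returns 0 when no other slave is requesting. */
uint32_t next_by_chain(uint32_t mask, uint32_t current)
{
    /* `current` must be one-hot */
    assert(current != 0 && (current & (current - 1)) == 0);

    int start = 0;
    while (!((current >> start) & 1u))
        start++;

    /* Descending index with wrap-around, as in the walkthrough above.
     * Use (start + step) % K instead to search in the other direction. */
    for (int step = 1; step < K; step++) {
        int i = (start - step + K) % K;
        if ((mask >> i) & 1u)
            return (uint32_t)1 << i;
    }
    return 0; /* came back around to `current` without finding a request */
}
```

With current = 00001000 and mask = 00000110 this picks bit 2; with mask = 00000010 it falls through to bit 1, which matches the two cases traced above.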
notes:
Subtracting 1 is the essential idea here. It's used to cascade borrows through the bits to find the next task.
bits_before_current = ~(current-1) & ~current
bits_after_current = current-1
todo = (mask & bits_before_current)
if todo==0: todo = (mask & bits_after_current) // second part is if we have to wrap around
next = last_bit_of_todo = todo & -todo
This will use a loop internally though...
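For reference, the notes above could be written out in C roughly like this. It is a sketch for an 8-bit field (the width and the name next_by_borrow are my assumptions), and like the notes it never re-grants the current task.

```c
#include <assert.h>
#include <stdint.h>

/* One-hot `current`, request bits in `mask`; returns the one-hot next
 * task, or 0 if nobody other than `current` is requesting. */
uint8_t next_by_borrow(uint8_t mask, uint8_t current)
{
    /* `current` must be one-hot */
    assert(current != 0 && (current & (current - 1)) == 0);

    /* current-1 turns the set bit into a run of ones below it */
    uint8_t below = (uint8_t)(current - 1);       /* bits_after_current  */
    uint8_t above = (uint8_t)(~below & ~current); /* bits_before_current */

    uint8_t todo = (uint8_t)(mask & above);
    if (todo == 0)
        todo = (uint8_t)(mask & below);           /* wrap around */

    return (uint8_t)(todo & -todo);               /* lowest set bit of todo */
}
```

For example, with mask = 00001010 and current = 01000000, nothing is set above bit 6, so the search wraps and the result is 00000010.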
Assuming two's complement representation, call your two words mask and current. In C:
unsigned next_task(unsigned mask, unsigned current)
{
    unsigned mask_lo = (current << 1) - 1; // the bits to the right of and including current
    unsigned mask_hi = ~mask_lo;           // the bits to the left of current
    // prefer the left bits, otherwise wrap around to the right:
    unsigned next = (mask & mask_hi) ? (mask & mask_hi) : (mask & mask_lo);
    return (next & -next);                 // the least significant bit set
}
Interesting problem! I can't help but wonder whether you could simplify your scheduler so that this sort of operation would be unnecessary.
Given that you know VHDL, I won't go into detail, but my suggestion would be the following:
Use a 3-bit encoder to turn the currently scheduled task into a number:
01000000 --> 6
Then use a barrel shifter to rotate the mask by that number + 1 (to skip the current task):
00001010 --> 00010100
Then use a priority encoder to find the first available "next" task:
00010100 --> 00000100 --> 2
Then reverse the barrel shift by addition:
(2+7) % 8 = 1
Which when re-encoded will give the next scheduled task:
00000010
This should be very fast and straightforward. The barrel shifter is 'expensive' in terms of real estate, but I don't see an easy way around that at the moment.
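To make the steps above concrete, here is a small software model in C for the 8-bit case. It is only a sketch of the data flow (encoder, rotate, priority encoder, un-rotate), not hardware; the helper names rotr8, encode8 and next_by_rotation are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate an 8-bit value right by n positions (n is taken modulo 8). */
uint8_t rotr8(uint8_t v, unsigned n)
{
    n &= 7u;
    return (uint8_t)((v >> n) | (v << ((8u - n) & 7u)));
}

/* Index of the lowest set bit of a non-zero byte (the encoder step). */
unsigned encode8(uint8_t onehot)
{
    unsigned i = 0;
    while (!((onehot >> i) & 1u))
        i++;
    return i;
}

/* Returns the one-hot next task, or 0 if `mask` is empty.
 * Note: the current task ends up at the lowest priority after the
 * rotation, so it is granted again if it is the only one in `mask`. */
uint8_t next_by_rotation(uint8_t mask, uint8_t current)
{
    /* `current` must be one-hot */
    assert(current != 0 && (current & (current - 1)) == 0);

    unsigned shift = (encode8(current) + 1u) % 8u; /* +1 skips the current task */
    uint8_t rotated = rotr8(mask, shift);          /* barrel shifter */
    if (rotated == 0)
        return 0;

    unsigned first = encode8((uint8_t)(rotated & -rotated)); /* priority encoder */
    unsigned index = (first + shift) % 8u;                   /* undo the rotation */
    return (uint8_t)(1u << index);
}
```

Running the example from the steps through it: current = 01000000 gives a shift of 7, the mask 00001010 rotates to 00010100, the first set bit is 2, and (2+7) % 8 = 1, i.e. 00000010.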
Edit: Doug's solution is significantly more elegant...
-Adam
A loop does not have to be bad.
I would simply do
current[i] = current[i-1] & mask[i] | // normal shift logic
mask[i] & current[i-2] & !mask[i-1] | // here build logic
... // expression for
// remaining
And then put it into a generate loop (i.e., it will get unrolled into hardware), which will produce parallel hardware for the expressions.
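As a sanity check of what the unrolled expressions compute, here is a C model of the same sum of products. It is a sketch under my own assumptions: an 8-bit field, product terms for distances 1 to 7 only (so the current task is never re-granted, since the elided terms above don't say), and made-up names bit_at and next_unrolled. The outer loop stands in for the generate loop, one output bit per iteration.

```c
#include <assert.h>
#include <stdint.h>

#define N 8 /* field width; assumed for this sketch */

/* Bit number (i - d) mod N of v. */
unsigned bit_at(uint32_t v, int i, int d)
{
    return (v >> ((i - d + N) % N)) & 1u;
}

uint32_t next_unrolled(uint32_t mask, uint32_t current)
{
    /* `current` must be one-hot */
    assert(current != 0 && (current & (current - 1)) == 0);

    uint32_t next = 0;
    for (int i = 0; i < N; i++) {       /* one output bit each */
        unsigned skipped_clear = 1;     /* mask bits between i-d and i are all 0 */
        unsigned any_term = 0;
        for (int d = 1; d < N; d++) {   /* one product term per distance d */
            /* d=1: current[i-1]; d=2: current[i-2] & !mask[i-1]; ... */
            any_term |= bit_at(current, i, d) & skipped_clear;
            skipped_clear &= !bit_at(mask, i, d);
        }
        next |= (uint32_t)(bit_at(mask, i, 0) & any_term) << i;
    }
    return next;
}
```

In hardware every term is evaluated in parallel; the inner loop here only exists because C is sequential.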
The other solutions mentioned here use multiple "-" operations. I can only discourage that, as it gets you a really expensive operation. Especially with one-hot encoding you can easily end up with more than 32 bits, which is not easy to implement in hardware, because the borrow has to propagate through all the bits (the dedicated carry logic on certain FPGAs makes it feasible for a small number of bits).
I found the following Verilog code for implementing the task in the Altera Advanced Synthesis Cookbook.
// 'base' is a one hot signal indicating the first request
// that should be considered for a grant. Followed by higher
// indexed requests, then wrapping around.
//
module arbiter (
req, grant, base
);
parameter WIDTH = 16;
input [WIDTH-1:0] req;
output [WIDTH-1:0] grant;
input [WIDTH-1:0] base;
wire [2*WIDTH-1:0] double_req = {req,req};
wire [2*WIDTH-1:0] double_grant = double_req & ~(double_req-base);
assign grant = double_grant[WIDTH-1:0] | double_grant[2*WIDTH-1:WIDTH];
endmodule
It uses subtraction (only once, though), so conceptually it's quite similar to Doug's solution.
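If you want to play with the trick outside a simulator, the same arithmetic can be modelled in C. This is just my sketch of the module above for WIDTH = 16; arbiter_grant and base_after are names I made up. Note that base marks the first request to be considered (inclusive), so to get "the next one after current" you would pass current rotated left by one.

```c
#include <assert.h>
#include <stdint.h>

#define WIDTH 16

/* C model of the arbiter: req and base as in the Verilog module. */
uint16_t arbiter_grant(uint16_t req, uint16_t base)
{
    /* `base` must be one-hot */
    assert(base != 0 && (base & (base - 1)) == 0);

    /* {req,req}: the upper copy catches the wrap-around case */
    uint32_t double_req = ((uint32_t)req << WIDTH) | req;
    /* subtracting base borrows through the zeros above it and clears
     * the first set request bit at or above base; ~ and & isolate it */
    uint32_t double_grant = double_req & ~(double_req - base);
    return (uint16_t)(double_grant | (double_grant >> WIDTH));
}

/* One-hot `current` rotated left by one: the base that skips current. */
uint16_t base_after(uint16_t current)
{
    return (uint16_t)((current << 1) | (current >> (WIDTH - 1)));
}
```

For example, with req = 0x000A (bits 1 and 3) and base at bit 2 the grant is bit 3; with base at bit 4 nothing is set at or above it, so the grant wraps around to bit 1.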