I\'ve read the article on Wikipedia on the Duff\'s device, and I don\'t get it. I am really interested, but I\'ve read the explanation there a couple of times and I still do
Though I'm not 100% sure what you're asking for, here goes...
The issue that Duff's device addresses is one of loop unwinding (as you'll no doubt have seen on the Wiki link you posted). What this basically equates to is an optimisation of run-time efficiency, over memory footprint. Duff's device deals with serial copying, rather than just any old problem, but is a classic example of how optimisations can be made by reducing the number of times that a comparison needs to be done in a loop.
As an alternative example, which may make it easier to understand, imagine you have an array of items you wish to loop over, and add 1 to them each time... ordinarily, you might use a for loop, and loop around 100 times. This seems fairly logical and, it is... however, an optimisation can be made by unwinding the loop (obviously not too far... or you may as well just not use the loop).
So a regular for loop:
for(int i = 0; i < 100; i++)
{
myArray[i] += 1;
}
becomes
for(int i = 0; i < 100; i+10)
{
myArray[i] += 1;
myArray[i+1] += 1;
myArray[i+2] += 1;
myArray[i+3] += 1;
myArray[i+4] += 1;
myArray[i+5] += 1;
myArray[i+6] += 1;
myArray[i+7] += 1;
myArray[i+8] += 1;
myArray[i+9] += 1;
}
What Duff's device does is implement this idea, in C, but (as you saw on the Wiki) with serial copies. What you're seeing above, with the unwound example, is 10 comparisons compared to 100 in the original - this amounts to a minor, but possibly significant, optimisation.
Just experimenting, found another variant getting along without interleaving switch
statement and do
-while
-loop:
int n = (count + 1) / 8;
switch (count % 8)
{
LOOP:
case 0:
if(n-- == 0)
break;
putchar('.');
case 7:
putchar('.');
case 6:
putchar('.');
case 5:
putchar('.');
case 4:
putchar('.');
case 3:
putchar('.');
case 2:
putchar('.');
case 1:
putchar('.');
default:
goto LOOP;
}
Technically, the goto
still implements a loop, but this variant might be slightly more readable.
When I read it for the first time, I autoformatted it to this
void dsend(char* to, char* from, count) {
int n = (count + 7) / 8;
switch (count % 8) {
case 0: do {
*to = *from++;
case 7: *to = *from++;
case 6: *to = *from++;
case 5: *to = *from++;
case 4: *to = *from++;
case 3: *to = *from++;
case 2: *to = *from++;
case 1: *to = *from++;
} while (--n > 0);
}
}
and I had no idea what was happening.
Maybe not when this question was asked, but now Wikipedia has a very good explanation
The device is valid, legal C by virtue of two attributes in C:
- Relaxed specification of the switch statement in the language's definition. At the time of the device's invention this was the first edition of The C Programming Language which requires only that the controlled statement of the switch be a syntactically valid (compound) statement within which case labels can appear prefixing any sub-statement. In conjunction with the fact that, in the absence of a break statement, the flow of control will fall-through from a statement controlled by one case label to that controlled by the next, this means that the code specifies a succession of count copies from sequential source addresses to the memory-mapped output port.
- The ability to legally jump into the middle of a loop in C.
The explanation in Dr. Dobb's Journal is the best that I found on the topic.
This being my AHA moment:
for (i = 0; i < len; ++i) {
HAL_IO_PORT = *pSource++;
}
becomes:
int n = len / 8;
for (i = 0; i < n; ++i) {
HAL_IO_PORT = *pSource++;
HAL_IO_PORT = *pSource++;
HAL_IO_PORT = *pSource++;
HAL_IO_PORT = *pSource++;
HAL_IO_PORT = *pSource++;
HAL_IO_PORT = *pSource++;
HAL_IO_PORT = *pSource++;
HAL_IO_PORT = *pSource++;
}
n = len % 8;
for (i = 0; i < n; ++i) {
HAL_IO_PORT = *pSource++;
}
becomes:
int n = (len + 8 - 1) / 8;
switch (len % 8) {
case 0: do { HAL_IO_PORT = *pSource++;
case 7: HAL_IO_PORT = *pSource++;
case 6: HAL_IO_PORT = *pSource++;
case 5: HAL_IO_PORT = *pSource++;
case 4: HAL_IO_PORT = *pSource++;
case 3: HAL_IO_PORT = *pSource++;
case 2: HAL_IO_PORT = *pSource++;
case 1: HAL_IO_PORT = *pSource++;
} while (--n > 0);
}
There are some good explanations elsewhere, but let me give it a try. (This is a lot easier on a whiteboard!) Here's the Wikipedia example with some notations.
Let's say you're copying 20 bytes. The flow control of the program for the first pass is:
int count; // Set to 20
{
int n = (count + 7) / 8; // n is now 3. (The "while" is going
// to be run three times.)
switch (count % 8) { // The remainder is 4 (20 modulo 8) so
// jump to the case 4
case 0: // [skipped]
do { // [skipped]
*to = *from++; // [skipped]
case 7: *to = *from++; // [skipped]
case 6: *to = *from++; // [skipped]
case 5: *to = *from++; // [skipped]
case 4: *to = *from++; // Start here. Copy 1 byte (total 1)
case 3: *to = *from++; // Copy 1 byte (total 2)
case 2: *to = *from++; // Copy 1 byte (total 3)
case 1: *to = *from++; // Copy 1 byte (total 4)
} while (--n > 0); // N = 3 Reduce N by 1, then jump up
// to the "do" if it's still
} // greater than 0 (and it is)
}
Now, start the second pass, we run just the indicated code:
int count; //
{
int n = (count + 7) / 8; //
//
switch (count % 8) { //
//
case 0: //
do { // The while jumps to here.
*to = *from++; // Copy 1 byte (total 5)
case 7: *to = *from++; // Copy 1 byte (total 6)
case 6: *to = *from++; // Copy 1 byte (total 7)
case 5: *to = *from++; // Copy 1 byte (total 8)
case 4: *to = *from++; // Copy 1 byte (total 9)
case 3: *to = *from++; // Copy 1 byte (total 10)
case 2: *to = *from++; // Copy 1 byte (total 11)
case 1: *to = *from++; // Copy 1 byte (total 12)
} while (--n > 0); // N = 2 Reduce N by 1, then jump up
// to the "do" if it's still
} // greater than 0 (and it is)
}
Now, start the third pass:
int count; //
{
int n = (count + 7) / 8; //
//
switch (count % 8) { //
//
case 0: //
do { // The while jumps to here.
*to = *from++; // Copy 1 byte (total 13)
case 7: *to = *from++; // Copy 1 byte (total 14)
case 6: *to = *from++; // Copy 1 byte (total 15)
case 5: *to = *from++; // Copy 1 byte (total 16)
case 4: *to = *from++; // Copy 1 byte (total 17)
case 3: *to = *from++; // Copy 1 byte (total 18)
case 2: *to = *from++; // Copy 1 byte (total 19)
case 1: *to = *from++; // Copy 1 byte (total 20)
} while (--n > 0); // N = 1 Reduce N by 1, then jump up
// to the "do" if it's still
} // greater than 0 (and it's not, so bail)
} // continue here...
20 bytes are now copied.
Note: The original Duff's Device (shown above) copied to an I/O device at the to
address. Thus, it wasn't necessary to increment the pointer *to
. When copying between two memory buffers you'd need to use *to++
.
1: Duffs device is a particular implementation of loop unrolling. Loop unrolling is an optimisation technique applicable if you have an operation to perform N times in a loop - you can trade program size for speed by executing the loop N/n times and then in the loop inlining (unrolling) the loop code n times e.g. replacing:
for (int i=0; i<N; i++) {
// [The loop code...]
}
with
for (int i=0; i<N/n; i++) {
// [The loop code...]
// [The loop code...]
// [The loop code...]
...
// [The loop code...] // n times!
}
Which works great if N % n == 0 - no need for Duff! If that is not true then you have to handle the remainder - which is a pain.
2: How does Duffs device differ from this standard loop unrolling?
Duffs device is just a clever way of dealing with the remainder loop cycles when N % n != 0. The whole do / while executes N / n number of times as per standard loop unrolling (because the case 0 applies). On the last first run through the loop the case kicks in and we run the loop code the 'remainder' number of times - the remaining runs through the loop run 'normally'.