I understood why Duff\'s device is faster than normal loop code which can be unrolled but is not optimized. But I can\'t understand how the code can be compiled yet.
I g
A simple explanation of why Duff's Device compiles is that the syntax of the switch
statement isn't particularly specific about the form that the switch
statement block might need to take. There are a few restrictions, and a couple of things permitted in the controlled statement that aren't permitted outside a switch
(the case
and default
labels). But other than that, the controlled statement is just any other statement, with the likelihood that there are labels for the switch
to target.
Here's the syntax from C99:
switch ( expression ) statement
Beyond the syntax, the standard imposes a few constraints:
case
label expressions must be integer constant expressionscase
label expressions or default
labelsOther than that, any construct permitted in a statement block should be permitted in the controlled statement (with the addition that case
and default
labels are OK). Remember that case
and default
are just labels that the switch jumps to based on the controlling expression and the case
label expressions. As Potatoswatter says, switch
is just a computed goto
. So just like goto
can jump into the middle of a loop, so can switch
.
Also, I think that the cases where you might see a benefit from Duff's Device are pretty rare today (I think they were rare even in the 1980's). Don't forget that Tom Duff himself said the following in his description:
Even more than when it was initially described, Duff's Device should be considered more of a curiosity than a tool to use.
While it's true that Duff's Device is outdated for its original purpose, it's still useful for special purposes, like a state machine that normally cycles repeatedly through N
states, but sometimes needs to return to the caller and later be resumed at the state where it left off. Putting the switch
statement outside the loop and the case
labels inside the loop (I would take this as the definition of a "Duff's device") then makes a lot of sense.
With that said, do not use Duff's devices to "optimize by hand". Putting what are effectively "goto labels" all over the place will not help the compiler optimize.
If we take the implementation from the Wikipedia article you link...
send(to, from, count)
register short *to, *from;
register count;
{
register n=(count+7)/8;
switch(count%8){
case 0: do{ *to = *from++;
case 7: *to = *from++;
case 6: *to = *from++;
case 5: *to = *from++;
case 4: *to = *from++;
case 3: *to = *from++;
case 2: *to = *from++;
case 1: *to = *from++;
}while(--n>0);
}
}
...and replace the "high-level" do
/ while
loop with the assembly-level if
/ goto
that the compiler really reduces it to...
send(to, from, count)
register short *to, *from;
register count;
{
register n=(count+7)/8;
switch(count%8){
case 0: do_label: *to = *from++;
case 7: *to = *from++;
case 6: *to = *from++;
case 5: *to = *from++;
case 4: *to = *from++;
case 3: *to = *from++;
case 2: *to = *from++;
case 1: *to = *from++;
if (--n>0) goto do_label;
}
}
...it may help you perceive that - in this usage where the do/while scope didn't introduce any local variables - there's really nothing more to the do-while than a jump back to case 0 that bypasses the switch and hence the need to evaluate count % 8 (% is a pretty expensive operation in the scheme of things).
Hope that helps it click, but may not...? :-)
Why did duff use 8? It could be 16, 65536 or whatever. Because of code size? Is there another reason? For example, cache or pipelining benefits.
Just a case of diminishing returns. Having to do the --n > 0
check and a jump after every 8 data copies isn't a big percentage overhead, but the size of the code (both in source and as compiled code in cache) is still pretty tight. Maybe it'd be 90 or 95% work vs overheads, which was evidently good enough. Further, to illustrate and share the concept with others Tom Duff may have prefered it to be about a typical 80x25 terminal's worth of code rather than a page or 10.
The "how it works" is simple enough.
Both C and C++ are compiled languages, and normally compiled to the platforms machine code. Machine code has no concept of block structures - all block structures must be translated to a form that uses (in effect) some mix of unconditional and conditional gotos.
The C syntax rules allow the switch statement and loop to be combined in a way which is not a true hierarchical block structure, but which tangles control flow. So long as the compiler can cope with this (which any good compiler should) there is no problem in the underlying machine code. The result will be "spaghetti", but generated machine code that has been through an optimiser is always spaghetti - it's not meant to be human readable, so it's not an issue. The issue here is that the source code is spaghetti too, even though the gotos have been "hidden".
Note - although any good compiler should cope with Duffs device, as others have already commented, that doesn't mean it will cope well enough to optimise it properly - only well enough to generate correct executable code. This is one of these old strange idioms that once had a purpose, but which is more likely now to confuse your compiler and sabotage it's ability to generate efficient code.
EDIT
The following is related to Duffs device, and may help illustrate the basic idea...
switch (count & 1)
{
case 0 : goto lbl0;
case 1 : goto lbl1;
}
lbl0:
while (count != 0)
{
handle_one ();
count--;
lbl1:
handle_one ();
count--;
}
Having case clauses inside the loop is conceptually no different to having goto-target labels inside the loop, as above.
Warning - this is purely for illustration of an idea, and should not be copied in real life code.
switch
is just a computed goto
. So, there are several labels inside the loop, and a switch
statement outside the loop. The switch
decides which label to go to, and goto
s there, inside the loop.
Once execution is inside the loop, it goes on looping until the loop relinquishes control.
It's actually very straightforward… and shouldn't be used unless it is the most straightforward alternative.
I would even say to stop listening to anything they say at all, even if it's your teacher.
As for compilers, they break things down to generic control-flow graphs and don't care about switch
vs. if
vs. while
. They are all if ( … ) goto …; else goto …;
to the compiler.