I just started learning assembly and wrote some custom loops for swapping two variables, using C++'s asm{} block with the Digital Mars compiler in C-Free 5.0.
It's a bit hard to guess what your compiler may be doing without seeing the assembly language it produces. With VC++ 10, I get the following results:
time of for-loop(cycles) 155
time of while-loop(cycles) 158
time of custom-loop-1(cycles) 369
time of custom-loop-2(cycles) 314
I didn't look at the output, but my immediate guess would be that the difference between the for and while loops is just noise. Both are obviously quite a bit faster than your hand-written assembly code, though.
Edit: looking at the assembly code, I was right; the code for the for and the while loops is identical. It looks like this:
call _clock
mov ecx, DWORD PTR _a$[ebp]
cdq
mov ebx, edx
mov edx, DWORD PTR _b$[ebp]
mov edi, eax
mov esi, 200000000
$LL2@main:
; Line 28
dec esi
; Line 30
mov eax, ecx
; Line 31
mov ecx, edx
; Line 32
mov edx, eax
jne SHORT $LL2@main
mov DWORD PTR _b$[ebp], edx
mov DWORD PTR _a$[ebp], ecx
; Line 35
call _clock
While this is arguably less "clever" than your second loop, modern CPUs tend to do best with simple code. It also just has fewer instructions inside the loop (and doesn't reference memory inside the loop at all). Those aren't the sole measures of efficiency by any means, but with a loop this simple, they're fairly indicative.
Edit 2:
Just for fun, I wrote a new version that adds the triple-XOR swap, as well as one using the CPU's xchg instruction (just because that's how I'd probably write it by hand if I didn't care much about speed, etc.). Though Intel/AMD generally recommend against the more complex instructions, it doesn't seem to cause a problem here; it seems to come out at least as fast as anything else:
time of for-loop(cycles) 156
time of while-loop(cycles) 160
time swap between register and cache 284
time to swap using add/sub: 308
time to swap using xchg: 155
time to swap using triple-xor 233
Source:
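// Note: updated source; it was just too ugly to live. Same results though.
#include <ctime>
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

namespace {
    int a, b;
    const int loops = 200000000;
}

// Runs one swapper functor and prints how many clock() ticks it took.
template <class swapper>
struct timer {
    timer(std::string const &label) {
        clock_t t1 = clock();
        swapper()();
        clock_t t2 = clock();
        std::ostringstream buffer;
        buffer << "Time for swap using " << label;
        std::cout << std::left << std::setw(30) << buffer.str() << " = " << (t2-t1) << "\n";
    }
};

// Plain C++ swap through a temporary, repeated `loops` times.
struct for_loop {
    void operator()() {
        int temp;
        for (int i = 0; i < loops; i++) {
            temp = a;
            a = b;
            b = temp;
        }
    }
};

// The while-loop and inline-asm swappers (reg<->mem, add/sub, xchg,
// triple xor) are missing from this copy of the source, so their
// timer calls below are commented out.

int main() {
    timer<for_loop>("for loop");
    // timer<...>("while loop");
    // timer<...>("reg<->mem");
    // timer<...>("add/sub");
    // timer<...>("xchg");
    // timer<...>("triple xor");
    return 0;
}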
Bottom line: at least for a task this trivial, you're not going to beat a decent compiler by enough to care about (and probably not at all, except possibly in terms of minutely smaller code).