Just for fun, I tried to compare the stack performance of a couple of programming languages calculating the Fibonacci series using the naive recursive algorithm. The code is
One C trick which you can try is to disable the stack checking (i e built-in code which makes sure that the stack is large enough to permit the additional allocation of the current function's local variables). This could be dicey for a recursive function and indeed could be the reason behind the slow C times: the executing program might well have run out of stack space which forces the stack-checking to reallocate the entire stack several times during the actual run.
Try to approximate the stack size you need and force the linker to allocate that much stack space. Then disable stack-checking and re-make the program.