What is tail call optimization?

前端未结

Very simply, what is tail-call optimization?

More specifically, what are some small code snippets where it could be applied, and where not, with an explanation of wh

10 Answers

既然无缘

2020-11-21 07:22
GCC C minimal runnable example with x86 disassembly analysis

Let's see how GCC can automatically do tail call optimizations for us by looking at the generated assembly.

This will serve as an extremely concrete example of what was mentioned in other answers such as https://stackoverflow.com/a/9814654/895245 that the optimization can convert recursive function calls to a loop.

This in turn saves memory and improves performance, since memory accesses are often the main thing that makes programs slow nowadays.

As an input, we give GCC a non-optimized naive stack based factorial:

tail_call.c
```
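#include <stdio.h>
#include <stdlib.h>

/* Naive recursion: the multiplication happens after the recursive call
 * returns, so each call needs its own stack frame unless the compiler
 * rewrites it. Note: n == 0 is not handled. */
unsigned factorial(unsigned n) {
    if (n == 1) {
        return 1;
    }
    return n * factorial(n - 1);
}

int main(int argc, char **argv) {
    unsigned input;
    if (argc > 1) {
        /* Base 0 lets strtoul accept decimal, octal and hex input. */
        input = strtoul(argv[1], NULL, 0);
    } else {
        input = 5;
    }
    printf("%u\n", factorial(input));
    return EXIT_SUCCESS;
}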
```
GitHub upstream.

Compile and disassemble:
```
gcc -O1 -foptimize-sibling-calls -ggdb3 -std=c99 -Wall -Wextra -Wpedantic \
  -o tail_call.out tail_call.c
objdump -d tail_call.out
```
where -foptimize-sibling-calls is the name GCC gives to its generalization of tail call optimization, according to man gcc:
```
   -foptimize-sibling-calls
       Optimize sibling and tail recursive calls.

       Enabled at levels -O2, -O3, -Os.
```
as mentioned at: How do I check if gcc is performing tail-recursion optimization?

I chose -O1 because:
- the optimization is not done with -O0. I suspect that this is because there are required intermediate transformations missing.
- -O3 produces ungodly efficient code that would not be very educational, although it is also tail call optimized.

Disassembly when compiling with -fno-optimize-sibling-calls instead:
```
0000000000001145 <factorial>:
    1145:       89 f8                   mov    %edi,%eax
    1147:       83 ff 01                cmp    $0x1,%edi
    114a:       74 10                   je     115c <factorial+0x17>
    114c:       53                      push   %rbx
    114d:       89 fb                   mov    %edi,%ebx
    114f:       8d 7f ff                lea    -0x1(%rdi),%edi
    1152:       e8 ee ff ff ff          callq  1145 <factorial>
    1157:       0f af c3                imul   %ebx,%eax
    115a:       5b                      pop    %rbx
    115b:       c3                      retq
    115c:       c3                      retq
```
With -foptimize-sibling-calls:
```
0000000000001145 <factorial>:
    1145:       b8 01 00 00 00          mov    $0x1,%eax
    114a:       83 ff 01                cmp    $0x1,%edi
    114d:       74 0e                   je     115d <factorial+0x18>
    114f:       8d 57 ff                lea    -0x1(%rdi),%edx
    1152:       0f af c7                imul   %edi,%eax
    1155:       89 d7                   mov    %edx,%edi
    1157:       83 fa 01                cmp    $0x1,%edx
    115a:       75 f3                   jne    114f <factorial+0xa>
    115c:       c3                      retq
    115d:       89 f8                   mov    %edi,%eax
    115f:       c3                      retq
```
The key difference between the two is that:
- the -fno-optimize-sibling-calls version uses callq, which is the typical non-optimized function call.
  
  This instruction pushes the return address onto the stack, so the stack grows with every call.
  
  Furthermore, this version also does push %rbx, which pushes %rbx onto the stack.
  
  GCC does this because it saves edi, which holds the first function argument (n), into ebx before calling factorial.
  
  GCC needs to do this because it is preparing for another call to factorial, which will use the new edi == n-1, while the original n is still needed for the multiplication after the call returns.
  
  It chooses ebx because this register is callee-saved (see: What registers are preserved through a linux x86-64 function call), so the subcall to factorial won't change it and lose n.
- the -foptimize-sibling-calls version does not use any instructions that push to the stack: it only does goto jumps within factorial with the instructions je and jne.
  
  Therefore, this version is equivalent to a while loop, without any function calls. Stack usage is constant.
Tested in Ubuntu 18.10, GCC 8.2.
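
For reference, here is a rough hand-written C sketch of the loop that the optimized disassembly corresponds to. The name factorial_loop is made up for illustration and this is not output from GCC; acc plays the role of %eax and n the role of %edi.

```c
#include <assert.h>

/* Hypothetical hand-written equivalent of the optimized assembly.
 * Like the original factorial, it assumes n >= 1. */
unsigned factorial_loop(unsigned n) {
    assert(n >= 1);       /* n == 0 is not handled by the original either */
    unsigned acc = 1;     /* mov $0x1,%eax */
    while (n != 1) {      /* cmp $0x1,... followed by je / jne */
        acc *= n;         /* imul %edi,%eax */
        n -= 1;           /* lea -0x1(%rdi),%edx ; mov %edx,%edi */
    }
    return acc;
}
```

No call and no push appear anywhere in this form, which is why the stack usage stays constant regardless of n.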

一向

2020-11-21 07:23
Probably the best high-level description I have found for tail calls, recursive tail calls and tail call optimization is the blog post

"What the heck is: A tail call"

by Dan Sugalski. On tail call optimization he writes:
Consider, for a moment, this simple function:
```
sub foo (int a) {
  a += 15;
  return bar(a);
}
```
So, what can you, or rather your language compiler, do? Well, what it can do is turn code of the form return somefunc(); into the low-level sequence pop stack frame; goto somefunc();. In our example, that means before we call bar, foo cleans itself up and then, rather than calling bar as a subroutine, we do a low-level goto operation to the start of bar. Foo's already cleaned itself out of the stack, so when bar starts it looks like whoever called foo has really called bar, and when bar returns its value, it returns it directly to whoever called foo, rather than returning it to foo which would then return it to its caller.
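
In C terms, the quoted foo/bar example might look like the sketch below. The quote never defines bar, so its body here is purely an assumption for illustration. Whether a compiler actually emits a jump instead of a call depends on the compiler and the optimization flags.

```c
/* Hypothetical callee: the quoted example does not define bar,
 * so this body is made up for illustration. */
int bar(int a) {
    return a * 2;
}

/* The call to bar is the last thing foo does, so it is in tail position.
 * An optimizing compiler may tear down foo's frame and jump to bar,
 * so that bar returns directly to foo's caller. */
int foo(int a) {
    a += 15;
    return bar(a);
}
```

With the bar assumed above, foo(1) evaluates bar(16) and yields 32 either way; the optimization only changes how the stack is used, not the result.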
And on tail recursion:

Tail recursion happens if a function, as its last operation, returns the result of calling itself. Tail recursion is easier to deal with because rather than having to jump to the beginning of some random function somewhere, you just do a goto back to the beginning of yourself, which is a darned simple thing to do.

So that this:
```
sub foo (int a, int b) {
  if (b == 1) {
    return a;
  } else {
    return foo(a*a + a, b - 1);
  }
}
gets quietly turned into:
```
sub foo (int a, int b) {
  label:
    if (b == 1) {
      return a;
    } else {
      a = a*a + a;
      b = b - 1;
      goto label;
    }
}
What I like about this description is how succinct and easy it is to grasp for those coming from an imperative language background (C, C++, Java).
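
Since the quoted snippets are pseudocode, here is a sketch of the same two forms in C. The names foo_rec and foo_goto are made up, and unsigned is used only so that overflow wraps instead of being undefined; both assume b >= 1, like the quoted code.

```c
#include <assert.h>

/* Tail-recursive form: the recursive call is the last operation. */
unsigned foo_rec(unsigned a, unsigned b) {
    assert(b >= 1);
    if (b == 1) {
        return a;
    }
    return foo_rec(a * a + a, b - 1);
}

/* The same computation after the rewrite described in the quote:
 * update the parameters in place and jump back to the top. */
unsigned foo_goto(unsigned a, unsigned b) {
    assert(b >= 1);
top:
    if (b == 1) {
        return a;
    }
    a = a * a + a;
    b = b - 1;
    goto top;
}
```

For example, starting from a = 1 and b = 4, a goes 1, 2, 6, 42, so both versions return 42.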

再見小時候

2020-11-21 07:26
TCO (Tail Call Optimization) is the process by which a smart compiler can make a call to a function and take no additional stack space. The only situation in which this happens is if the last instruction executed in a function f is a call to a function g (Note: g can be f). The key here is that f no longer needs stack space - it simply calls g and then returns whatever g would return. In this case the optimization can be made that g just runs and returns whatever value it would have to the thing that called f.

This optimization can make recursive calls take constant stack space, rather than explode.

Example: this factorial function is not TCOptimizable:
```
def fact(n):
    if n == 0:
        return 1
    return n * fact(n-1)
```
This function does more than call another function in its return statement: it still has to multiply the result of fact(n-1) by n after the call returns.

The function below is TCOptimizable:
```
def fact_h(n, acc):
    if n == 0:
        return acc
    return fact_h(n-1, acc*n)

def fact(n):
    return fact_h(n, 1)
```
This is because the last thing to happen in any of these functions is to call another function.
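
The note above that g does not have to be f can be illustrated with mutual recursion. The C sketch below is only an illustration (is_even and is_odd are not from the answer), and whether the calls are really turned into jumps depends on the compiler and flags.

```c
#include <stdbool.h>

bool is_odd(unsigned n);

/* Each function's last action is a call to the other one, so both calls
 * are in tail position even though neither function calls itself. */
bool is_even(unsigned n) {
    if (n == 0) {
        return true;
    }
    return is_odd(n - 1);
}

bool is_odd(unsigned n) {
    if (n == 0) {
        return false;
    }
    return is_even(n - 1);
}
```

Without the optimization, is_even(n) needs about n stack frames; with it, the two functions can bounce between each other in constant stack space.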

春和景丽

2020-11-21 07:28
Let's walk through a simple example: the factorial function implemented in C.

We start with the obvious recursive definition
```
unsigned fac(unsigned n)
{
    if (n < 2) return 1;
    return n * fac(n - 1);
}
```
A function ends with a tail call if the last operation before the function returns is another function call. If this call invokes the same function, it is tail-recursive.

Even though fac() looks tail-recursive at first glance, it is not, as what actually happens is
```
unsigned fac(unsigned n)
{
    if (n < 2) return 1;
    unsigned acc = fac(n - 1);
    return n * acc;
}
```
i.e. the last operation is the multiplication and not the function call.

However, it's possible to rewrite fac() to be tail-recursive by passing the accumulated value down the call chain as an additional argument and passing only the final result up again as the return value:
```
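/* Declared up front so that fac() can call it before its definition. */
unsigned fac_tailrec(unsigned acc, unsigned n);

unsigned fac(unsigned n)
{
    return fac_tailrec(1, n);
}

unsigned fac_tailrec(unsigned acc, unsigned n)
{
    if (n < 2) return acc;
    return fac_tailrec(n * acc, n - 1);
}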
```
Now, why is this useful? Because we immediately return after the tail call, we can discard the previous stack frame before invoking the function in tail position, or, in the case of recursive functions, reuse the stack frame as-is.

The tail-call optimization transforms our recursive code into
```
unsigned fac_tailrec(unsigned acc, unsigned n)
{
TOP:
    if (n < 2) return acc;
    acc = n * acc;
    n = n - 1;
    goto TOP;
}
```
This can be inlined into fac() and we arrive at
```
unsigned fac(unsigned n)
{
    unsigned acc = 1;

TOP:
    if (n < 2) return acc;
    acc = n * acc;
    n = n - 1;
    goto TOP;
}
```
which is equivalent to
```
unsigned fac(unsigned n)
{
    unsigned acc = 1;

    for (; n > 1; --n)
        acc *= n;

    return acc;
}
```
As we can see here, a sufficiently advanced optimizer can replace tail-recursion with iteration, which is far more efficient as you avoid function call overhead and only use a constant amount of stack space.
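
To convince yourself that the rewrite steps above preserve behavior, a small check along these lines could be used. This is only a sketch: the names fac_rec, fac_tail, fac_iter and fac_versions_agree are made up here so that all variants can live in one file.

```c
/* Plain recursion: the multiplication happens after the call returns,
 * so the call is not in tail position. */
unsigned fac_rec(unsigned n) {
    if (n < 2) return 1;
    return n * fac_rec(n - 1);
}

/* Accumulator form: the recursive call is in tail position. */
unsigned fac_tail(unsigned acc, unsigned n) {
    if (n < 2) return acc;
    return fac_tail(n * acc, n - 1);
}

/* Loop form that tail call elimination can arrive at. */
unsigned fac_iter(unsigned n) {
    unsigned acc = 1;
    for (; n > 1; --n)
        acc *= n;
    return acc;
}

/* Returns 1 if all three variants agree for every n in [0, limit]. */
int fac_versions_agree(unsigned limit) {
    for (unsigned n = 0; n <= limit; ++n) {
        unsigned expected = fac_rec(n);
        if (fac_tail(1, n) != expected || fac_iter(n) != expected)
            return 0;
    }
    return 1;
}
```

A limit of 12 is a reasonable choice with 32-bit unsigned, since 12! is the largest factorial that still fits.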
