What is tail call optimization?

一整个雨季 2020-11-21 06:36

Very simply, what is tail-call optimization?

More specifically, what are some small code snippets where it could be applied, and where not, with an explanation of wh

10 answers

既然无缘 (original poster)

2020-11-21 07:22
GCC C minimal runnable example with x86 disassembly analysis

Let's see how GCC can automatically do tail call optimizations for us by looking at the generated assembly.

This will serve as an extremely concrete example of what was mentioned in other answers such as https://stackoverflow.com/a/9814654/895245 that the optimization can convert recursive function calls to a loop.

This in turn saves stack memory and can improve performance, since memory accesses are often the main bottleneck in programs nowadays.

As input, we give GCC a naive, stack-based recursive factorial:

tail_call.c
```
#include <stdio.h>
#include <stdlib.h>

unsigned factorial(unsigned n) {
    if (n == 1) {
        return 1;
    }
    return n * factorial(n - 1);
}

int main(int argc, char **argv) {
    int input;
    if (argc > 1) {
        input = strtoul(argv[1], NULL, 0);
    } else {
        input = 5;
    }
    printf("%u\n", factorial(input));
    return EXIT_SUCCESS;
}
```
GitHub upstream.

Compile and disassemble:
```
gcc -O1 -foptimize-sibling-calls -ggdb3 -std=c99 -Wall -Wextra -Wpedantic \
  -o tail_call.out tail_call.c
objdump -d tail_call.out
```
where -foptimize-sibling-calls is GCC's name for a generalization of tail call optimization, according to man gcc:
```
   -foptimize-sibling-calls
       Optimize sibling and tail recursive calls.

       Enabled at levels -O2, -O3, -Os.
```
as mentioned at: How do I check if gcc is performing tail-recursion optimization?

I chose -O1 because:
- the optimization is not done with -O0. I suspect this is because intermediate transformations that it requires are missing.
- -O3 produces ungodly efficient code that would not be very educational, although it is also tail call optimized.

Disassembly with -fno-optimize-sibling-calls:
```
0000000000001145 <factorial>:
    1145:       89 f8                   mov    %edi,%eax
    1147:       83 ff 01                cmp    $0x1,%edi
    114a:       74 10                   je     115c <factorial+0x17>
    114c:       53                      push   %rbx
    114d:       89 fb                   mov    %edi,%ebx
    114f:       8d 7f ff                lea    -0x1(%rdi),%edi
    1152:       e8 ee ff ff ff          callq  1145 <factorial>
    1157:       0f af c3                imul   %ebx,%eax
    115a:       5b                      pop    %rbx
    115b:       c3                      retq
    115c:       c3                      retq
```
With -foptimize-sibling-calls:
```
0000000000001145 <factorial>:
    1145:       b8 01 00 00 00          mov    $0x1,%eax
    114a:       83 ff 01                cmp    $0x1,%edi
    114d:       74 0e                   je     115d <factorial+0x18>
    114f:       8d 57 ff                lea    -0x1(%rdi),%edx
    1152:       0f af c7                imul   %edi,%eax
    1155:       89 d7                   mov    %edx,%edi
    1157:       83 fa 01                cmp    $0x1,%edx
    115a:       75 f3                   jne    114f <factorial+0xa>
    115c:       c3                      retq
    115d:       89 f8                   mov    %edi,%eax
    115f:       c3                      retq
```
The key difference between the two is that:
- the -fno-optimize-sibling-calls version uses callq, which is the typical non-optimized function call.
  
  This instruction pushes the return address to the stack, so the stack grows with every level of recursion.
  
  Furthermore, this version also does push %rbx, which saves %rbx to the stack.
  
  GCC does this because it copies edi, which holds the first function argument (n), into ebx before calling factorial.
  
  GCC needs to do this because the recursive call to factorial takes the new edi == n-1, while the old n is still needed for the multiplication after the call returns.
  
  It chooses ebx because this register is callee-saved (see: What registers are preserved through a linux x86-64 function call), so the subcall to factorial won't change it and lose n.
- the -foptimize-sibling-calls version does not use any instructions that push to the stack: it only does goto-style jumps within factorial with the instructions je and jne.
  
  Therefore, this version is equivalent to a while loop, without any function calls. Stack usage is constant.

Tested in Ubuntu 18.10, GCC 8.2.