Is the ranged based for loop beneficial to performance?

前端 未结 5 1875
遥遥无期
遥遥无期 2020-12-02 18:34

Reading various questions here on Stack Overflow about C++ iterators and performance**, I started wondering if for(auto& elem : container) gets \"expan

相关标签:
5条回答
  • 2020-12-02 18:44

    Out of curiosity I decided to look at the assembly code for both approaches:

    int foo1(const std::vector<int>& v) {
        int res = 0;
        for (auto x : v)
            res += x;
        return res;
    }
    
    int foo2(const std::vector<int>& v) {
        int res = 0;
        for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it)
          res += *it;
        return res;
    }
    

    And the assembly code (with -O3 and gcc 4.6) is exactly the same for both approaches (code for foo2 is omitted, since it is exactly the same):

    080486d4 <foo1(std::vector<int, std::allocator<int> > const&)>:
    80486d4:       8b 44 24 04             mov    0x4(%esp),%eax
    80486d8:       8b 10                   mov    (%eax),%edx
    80486da:       8b 48 04                mov    0x4(%eax),%ecx
    80486dd:       b8 00 00 00 00          mov    $0x0,%eax
    80486e2:       39 ca                   cmp    %ecx,%edx
    80486e4:       74 09                   je     80486ef <foo1(std::vector<int, std::allocator<int> > const&)+0x1b>
    80486e6:       03 02                   add    (%edx),%eax
    80486e8:       83 c2 04                add    $0x4,%edx
    80486eb:       39 d1                   cmp    %edx,%ecx
    80486ed:       75 f7                   jne    80486e6 <foo1(std::vector<int, std::allocator<int> > const&)+0x12>
    80486ef:       f3 c3                   repz ret 
    

    So, yes, both approaches are the same.

    UPDATE: The same observation holds for other containers (or element types) such as vector<string> and map<string, string>. In those cases, it is especially important to use a reference in the ranged-based loop. Otherwise a temporary is created and lots of extra code appears (in the previous examples it was not needed since the vector contained just int values).

    For the case of map<string, string> the C++ code snippet used is:

    int foo1(const std::map<std::string, std::string>& v) {
        int res = 0;
        for (const auto& x : v) {
            res += (x.first.size() + x.second.size());
        }
        return res;
    }
    
    int foo2(const std::map<std::string, std::string>& v) {
        int res = 0;
        for (auto it = v.begin(), end = v.end(); it != end; ++it) {
            res += (it->first.size() + it->second.size());
        }
        return res;
    }
    

    And the assembly code (for both cases) is:

    8048d70:       56                      push   %esi
    8048d71:       53                      push   %ebx
    8048d72:       31 db                   xor    %ebx,%ebx
    8048d74:       83 ec 14                sub    $0x14,%esp
    8048d77:       8b 74 24 20             mov    0x20(%esp),%esi
    8048d7b:       8b 46 0c                mov    0xc(%esi),%eax
    8048d7e:       83 c6 04                add    $0x4,%esi
    8048d81:       39 f0                   cmp    %esi,%eax
    8048d83:       74 1b                   je     8048da0 
    8048d85:       8d 76 00                lea    0x0(%esi),%esi
    8048d88:       8b 50 10                mov    0x10(%eax),%edx
    8048d8b:       03 5a f4                add    -0xc(%edx),%ebx
    8048d8e:       8b 50 14                mov    0x14(%eax),%edx
    8048d91:       03 5a f4                add    -0xc(%edx),%ebx
    8048d94:       89 04 24                mov    %eax,(%esp)
    8048d97:       e8 f4 fb ff ff          call   8048990 <std::_Rb_tree_increment(std::_Rb_tree_node_base const*)@plt>
    8048d9c:       39 c6                   cmp    %eax,%esi
    8048d9e:       75 e8                   jne    8048d88 
    8048da0:       83 c4 14                add    $0x14,%esp
    8048da3:       89 d8                   mov    %ebx,%eax
    8048da5:       5b                      pop    %ebx
    8048da6:       5e                      pop    %esi
    8048da7:       c3                      ret    
    
    0 讨论(0)
  • 2020-12-02 18:44

    It's possibly faster, in rare cases. Since you can't name the iterator, an optimizer can more easily prove that your loop cannot modify the iterator. This affects e.g. loop unrolling optimizations.

    0 讨论(0)
  • 2020-12-02 18:45

    No. It is same as the old for loop with iterators. After all, the range-based for works with iterators internally. The compiler just produces equivalent code for both.

    0 讨论(0)
  • 2020-12-02 19:08

    The Standard is your friend, see [stmt.ranged]/1

    For a range-based for statement of the form

    for ( for-range-declaration : expression ) statement
    

    let range-init be equivalent to the expression surrounded by parentheses

    ( expression )
    

    and for a range-based for statement of the form

    for ( for-range-declaration : braced-init-list ) statement
    

    let range-init be equivalent to the braced-init-list. In each case, a range-based for statement is equivalent to

    {
      auto && __range = range-init;
      for ( auto __begin = begin-expr,
                 __end = end-expr;
            __begin != __end;
            ++__begin )
      {
        for-range-declaration = *__begin;
        statement
      }
    }
    

    So yes, the Standard guarantees that the best possible form is achieved.

    And for a number of containers, such as vector, it is undefined behavior to modify (insert/erase) them during this iteration.

    0 讨论(0)
  • 2020-12-02 19:09

    Range-for is as fast as possible since it caches the end iterator[citation provided], uses pre-increment and only dereferences the iterator once.

    so if you tend to write:

    for(iterator i = cont.begin(); i != cont.end(); i++) { /**/ }
    

    Then, yes, range-for may be slightly faster, since it's also easier to write there's no reason not to use it (when appropriate).

    N.B. I said it's as fast as possible, it isn't however faster than possible. You can achieve the exact same performance if you write your manual loops carefully.

    0 讨论(0)
提交回复
热议问题