Reading various questions here on Stack Overflow about C++ iterators and performance, I started wondering if for(auto& elem : container)
gets "expan
Out of curiosity I decided to look at the assembly code for both approaches:
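#include <vector>

// Range-based for: sums the elements by value.
int foo1(const std::vector<int>& v) {
    int res = 0;
    for (auto x : v)
        res += x;
    return res;
}

// Explicit iterator loop: same computation written by hand.
int foo2(const std::vector<int>& v) {
    int res = 0;
    for (std::vector<int>::const_iterator it = v.begin(); it != v.end(); ++it)
        res += *it;
    return res;
}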
The assembly code (with -O3 and gcc 4.6) is exactly the same for both approaches, so the listing for foo2 is omitted:
080486d4 <foo1(std::vector<int, std::allocator<int> > const&)>:
80486d4: 8b 44 24 04 mov 0x4(%esp),%eax
80486d8: 8b 10 mov (%eax),%edx
80486da: 8b 48 04 mov 0x4(%eax),%ecx
80486dd: b8 00 00 00 00 mov $0x0,%eax
80486e2: 39 ca cmp %ecx,%edx
80486e4: 74 09 je 80486ef <foo1(std::vector<int, std::allocator<int> > const&)+0x1b>
80486e6: 03 02 add (%edx),%eax
80486e8: 83 c2 04 add $0x4,%edx
80486eb: 39 d1 cmp %edx,%ecx
80486ed: 75 f7 jne 80486e6 <foo1(std::vector<int, std::allocator<int> > const&)+0x12>
80486ef: f3 c3 repz ret
So, yes, both approaches are the same.
UPDATE: The same observation holds for other containers (or element types) such as vector<string> and map<string, string>. In those cases, it is especially important to use a reference in the range-based loop. Otherwise each element is copied into a temporary and lots of extra code appears (in the previous examples it was not needed since the vector contained just int values).
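To make the cost of omitting the reference visible without reading assembly, here is a minimal sketch that counts copies. The Tracked type and the helper functions are hypothetical names introduced only for this illustration; they are not part of the code that was measured above.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical element type (an assumption for this sketch) that counts
// how many times it is copy-constructed.
struct Tracked {
    std::string value;
    static int copies;

    explicit Tracked(std::string v) : value(std::move(v)) {}
    Tracked(const Tracked& other) : value(other.value) { ++copies; }
    Tracked(Tracked&&) noexcept = default;
};
int Tracked::copies = 0;

// Iterating by value: every element is copied into x.
std::size_t total_by_value(const std::vector<Tracked>& v) {
    std::size_t n = 0;
    for (auto x : v)
        n += x.value.size();
    return n;
}

// Iterating by const reference: x is bound to the element, no copy is made.
std::size_t total_by_ref(const std::vector<Tracked>& v) {
    std::size_t n = 0;
    for (const auto& x : v)
        n += x.value.size();
    return n;
}

// Builds a small sample without triggering any copy construction.
std::vector<Tracked> make_sample() {
    std::vector<Tracked> v;
    v.reserve(3);
    v.emplace_back("a");
    v.emplace_back("bb");
    v.emplace_back("ccc");
    return v;
}
```

Resetting Tracked::copies to zero before each call and comparing the counter afterwards shows one copy per element for the by-value loop and none for the by-reference loop.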
For the case of map<string, string>, the C++ code snippet used is:
#include <map>
#include <string>

// Range-based for over the map, binding each pair by const reference.
int foo1(const std::map<std::string, std::string>& v) {
    int res = 0;
    for (const auto& x : v) {
        res += (x.first.size() + x.second.size());
    }
    return res;
}

// Explicit iterator loop with the end iterator cached up front.
int foo2(const std::map<std::string, std::string>& v) {
    int res = 0;
    for (auto it = v.begin(), end = v.end(); it != end; ++it) {
        res += (it->first.size() + it->second.size());
    }
    return res;
}
And the assembly code (for both cases) is:
8048d70: 56 push %esi
8048d71: 53 push %ebx
8048d72: 31 db xor %ebx,%ebx
8048d74: 83 ec 14 sub $0x14,%esp
8048d77: 8b 74 24 20 mov 0x20(%esp),%esi
8048d7b: 8b 46 0c mov 0xc(%esi),%eax
8048d7e: 83 c6 04 add $0x4,%esi
8048d81: 39 f0 cmp %esi,%eax
8048d83: 74 1b je 8048da0
8048d85: 8d 76 00 lea 0x0(%esi),%esi
8048d88: 8b 50 10 mov 0x10(%eax),%edx
8048d8b: 03 5a f4 add -0xc(%edx),%ebx
8048d8e: 8b 50 14 mov 0x14(%eax),%edx
8048d91: 03 5a f4 add -0xc(%edx),%ebx
8048d94: 89 04 24 mov %eax,(%esp)
8048d97: e8 f4 fb ff ff call 8048990 <std::_Rb_tree_increment(std::_Rb_tree_node_base const*)@plt>
8048d9c: 39 c6 cmp %eax,%esi
8048d9e: 75 e8 jne 8048d88
8048da0: 83 c4 14 add $0x14,%esp
8048da3: 89 d8 mov %ebx,%eax
8048da5: 5b pop %ebx
8048da6: 5e pop %esi
8048da7: c3 ret
It's possibly faster, in rare cases. Since you can't name the iterator, an optimizer can more easily prove that your loop cannot modify the iterator. This affects e.g. loop unrolling optimizations.
No. It is the same as the old for loop with iterators. After all, the range-based for works with iterators internally. The compiler just produces equivalent code for both.
The Standard is your friend; see [stmt.ranged]/1:
For a range-based for statement of the form
for ( for-range-declaration : expression ) statement
let range-init be equivalent to the expression surrounded by parentheses
( expression )
and for a range-based for statement of the form
for ( for-range-declaration : braced-init-list ) statement
let range-init be equivalent to the braced-init-list. In each case, a range-based for statement is equivalent to
{
    auto && __range = range-init;
    for ( auto __begin = begin-expr, __end = end-expr; __begin != __end; ++__begin ) {
        for-range-declaration = *__begin;
        statement
    }
}
So yes, the Standard guarantees that the best possible form is achieved.
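As a rough illustration of that wording, the expansion can be written out by hand for a std::vector<int> and compared with the range-based form. This is only a sketch: the function names are made up here, and range, first and last stand in for the reserved names __range, __begin and __end.

```cpp
#include <vector>

// The range-based form.
int sum_range_for(const std::vector<int>& v) {
    int res = 0;
    for (int x : v)
        res += x;
    return res;
}

// The same loop written out following the equivalence in [stmt.ranged]/1.
int sum_expanded(const std::vector<int>& v) {
    int res = 0;
    {
        auto&& range = v;                       // auto && __range = range-init;
        for (auto first = range.begin(),        // begin-expr
                  last = range.end();           // end-expr, evaluated once
             first != last; ++first) {          // pre-increment
            int x = *first;                     // for-range-declaration = *__begin;
            res += x;                           // statement
        }
    }
    return res;
}
```

Both functions return the same value for any input, including an empty vector.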
And for a number of containers, such as vector, it is undefined behavior to modify (insert/erase) them during this iteration.
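If the loop body really does need to grow or shrink the vector, one way to stay clear of that problem is to do the modification outside the range-based loop. The sketch below shows two common patterns; the function names are hypothetical and chosen only for this example.

```cpp
#include <algorithm>
#include <vector>

// Appends a doubled copy of every element. Calling v.push_back() inside
// `for (int x : v)` would invalidate the cached end iterator (and, on
// reallocation, every iterator), so new values are staged separately.
void append_doubled(std::vector<int>& v) {
    std::vector<int> staged;
    staged.reserve(v.size());
    for (int x : v)
        staged.push_back(2 * x);
    v.insert(v.end(), staged.begin(), staged.end());
}

// Removes negative values with the erase-remove idiom instead of calling
// v.erase() from inside a range-based loop.
void remove_negatives(std::vector<int>& v) {
    v.erase(std::remove_if(v.begin(), v.end(),
                           [](int x) { return x < 0; }),
            v.end());
}
```

In both functions the range-based loop (where there is one) only reads the vector, and the structural change happens in a single call afterwards.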
Range-for is as fast as possible since it caches the end iterator[citation provided], uses pre-increment and only dereferences the iterator once.
So if you tend to write:
for(iterator i = cont.begin(); i != cont.end(); i++) { /**/ }
Then, yes, range-for may be slightly faster. Since it's also easier to write, there's no reason not to use it (when appropriate).
N.B. I said it's as fast as possible; it isn't, however, faster than possible. You can achieve the exact same performance if you write your manual loops carefully.
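To see what "carefully" means in practice, here is a small sketch that counts how often end() is called. CountingRange is a hypothetical wrapper invented for this illustration; with a plain std::vector the optimizer will often remove the difference anyway, since end() is trivially inlined.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical range type (an assumption for this sketch) that records
// every call to end().
struct CountingRange {
    std::vector<int> data;
    mutable std::size_t end_calls = 0;

    std::vector<int>::const_iterator begin() const { return data.begin(); }
    std::vector<int>::const_iterator end() const {
        ++end_calls;
        return data.end();
    }
};

// The loop style quoted above: end() is re-evaluated on every test and
// the iterator is post-incremented.
int sum_naive(const CountingRange& r) {
    int res = 0;
    for (auto it = r.begin(); it != r.end(); it++)
        res += *it;
    return res;
}

// A careful manual loop: end() cached once, pre-increment, one dereference.
int sum_careful(const CountingRange& r) {
    int res = 0;
    for (auto it = r.begin(), last = r.end(); it != last; ++it)
        res += *it;
    return res;
}

// Range-based for, which behaves like the careful loop.
int sum_with_range_for(const CountingRange& r) {
    int res = 0;
    for (int x : r)
        res += x;
    return res;
}
```

For a range of n elements, the naive loop calls end() n + 1 times, while the careful loop and the range-based loop each call it exactly once.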