Question
The title may be too general. I am benchmarking the following two statements on a large vector<unsigned> v:
sort(v.begin(), v.end(), l);
sort(v.begin(), v.end(), [](unsigned a, unsigned b) { return l(a, b); });
where l is defined as
bool l(unsigned a, unsigned b) { return a < b; }
The result surprises me: the second is as fast as sort(v.begin(), v.end()); or sort(v.begin(), v.end(), std::less<>()); while the first is significantly slower.
My question is why wrapping the function in a lambda speeds up the program.
Moreover, sort(v.begin(), v.end(), [](unsigned a, unsigned b) { return l(b, a); }); is just as fast.
Related code:
#include <iostream>
#include <vector>
#include <chrono>
#include <random>
#include <functional>
#include <algorithm>
using std::cout;
using std::endl;
using std::vector;
bool l(unsigned a, unsigned b) { return a < b; }

int main(int argc, char** argv)
{
    auto random = std::default_random_engine();
    vector<unsigned> d;

    // Case 1: default comparison (operator<)
    for (unsigned i = 0; i < 100000000; ++i)
        d.push_back(random());
    auto t0 = std::chrono::high_resolution_clock::now();
    std::sort(d.begin(), d.end());
    auto t1 = std::chrono::high_resolution_clock::now();
    cout << std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count() << endl;

    // Case 2: the function l passed directly (as a function pointer)
    d.clear();
    for (unsigned i = 0; i < 100000000; ++i)
        d.push_back(random());
    t0 = std::chrono::high_resolution_clock::now();
    std::sort(d.begin(), d.end(), l);
    t1 = std::chrono::high_resolution_clock::now();
    cout << std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count() << endl;

    // Case 3: l wrapped in a lambda
    d.clear();
    for (unsigned i = 0; i < 100000000; ++i)
        d.push_back(random());
    t0 = std::chrono::high_resolution_clock::now();
    std::sort(d.begin(), d.end(), [](unsigned a, unsigned b) { return l(a, b); });
    t1 = std::chrono::high_resolution_clock::now();
    cout << std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count() << endl;
    return 0;
}
Tested on both g++ and MSVC.
Update:
I found that the lambda version generates exactly the same assembly code as the default one (sort(v.begin(), v.end())), while the one using a function is different. But I do not know assembly and thus can't do more.
Answer 1:
sort is potentially a big function, so it's usually not inlined. Therefore, each instantiation of it is compiled on its own. Consider sort:
template <typename RanIt, typename Pred>
void sort(RanIt, RanIt, Pred)
{
}
If Pred is bool (*)(unsigned, unsigned), the call to the predicate cannot be inlined inside sort, because a function pointer type does not uniquely identify a function. There is only one sort<It, It, bool (*)(unsigned, unsigned)>, and it is invoked by all calls, whichever function pointer they pass. The user passes l to the function, but inside sort that is just an ordinary runtime argument, so every comparison is an indirect call through the pointer.
If Pred is a lambda, it is trivial to inline the function call, because the lambda's type uniquely identifies a function. Every call to this instantiation of sort invokes the same (lambda) function, so the problem with function pointers does not arise. The lambda itself contains a direct call to l, which is also easy to inline. Therefore, the compiler inlines all the function calls and generates the same code as a no-predicate sort.
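The type-level difference described above can be checked directly. The following is a minimal sketch, not part of the original answer: the second comparator g, the wrappers by_l and by_g, and the helpers uses_of and counted_sort are all hypothetical names introduced for illustration. It shows that l and g share one predicate type (and therefore one instantiation), while each lambda gets its own.

```cpp
#include <algorithm>
#include <cassert>
#include <type_traits>
#include <vector>

bool l(unsigned a, unsigned b) { return a < b; }
bool g(unsigned a, unsigned b) { return a > b; }  // hypothetical second comparator

// Both functions decay to the same pointer type, so a template
// parameterized on Pred cannot tell them apart at compile time.
static_assert(std::is_same<decltype(&l), decltype(&g)>::value,
              "l and g share one function pointer type");

// Each lambda expression has its own unique closure type.
auto by_l = [](unsigned a, unsigned b) { return l(a, b); };
auto by_g = [](unsigned a, unsigned b) { return g(a, b); };
static_assert(!std::is_same<decltype(by_l), decltype(by_g)>::value,
              "each lambda has a distinct type");

// One static counter per distinct Pred, i.e. per template instantiation.
template <typename Pred>
int& uses_of() { static int n = 0; return n; }

// Hypothetical stand-in for sort: counts how often each instantiation runs.
template <typename Pred>
void counted_sort(std::vector<unsigned>& v, Pred pred) {
    ++uses_of<Pred>();
    std::sort(v.begin(), v.end(), pred);
    assert(std::is_sorted(v.begin(), v.end(), pred));
}
```

Calling counted_sort(v, l) and then counted_sort(v, g) increments the same counter, because both calls go through the single function-pointer instantiation; counted_sort(v, by_l) increments a counter of its own.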
The case with a function object type (std::less<>) is similar: the behavior of calling a std::less<> is fully known when compiling sort, so inlining is trivial.
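The same reasoning suggests that a hand-written function object should behave like the lambda wrapper. The sketch below assumes a hypothetical struct CompareWithL and helper sorted_copy, neither of which appears in the original post. Whether the compiler actually inlines the call still depends on the compiler and optimization level, so this only illustrates the pattern and is not a performance guarantee.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

bool l(unsigned a, unsigned b) { return a < b; }

// Hypothetical named functor: its type alone tells the compiler which
// operator() is called, just as with a lambda or std::less<>.
struct CompareWithL {
    bool operator()(unsigned a, unsigned b) const { return l(a, b); }
};

// Hypothetical helper: sorts a copy using the functor as the predicate.
std::vector<unsigned> sorted_copy(std::vector<unsigned> v) {
    std::sort(v.begin(), v.end(), CompareWithL{});
    assert(std::is_sorted(v.begin(), v.end()));
    return v;
}
```

A lambda is essentially shorthand for such a struct: the compiler generates an unnamed class with an operator() for each lambda expression.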
Source: https://stackoverflow.com/questions/57830971/why-wrapping-a-function-into-a-lambda-potentially-make-the-program-faster