问题
It is follow-up question for this MIC question. When adding items to the vector of reference wrappers I spend about 80% of time inside ++ operator whatever iterating approach I choose.
The query works as following
VersionView getVersionData(int subdeliveryGroupId, int retargetingId,
const std::wstring &flightName) const {
VersionView versions;
for (auto i = 0; i < 3; ++i) {
for (auto j = 0; j < 3; ++j) {
versions.insert(m_data.get<mvKey>().equal_range(boost::make_tuple(subdeliveryGroupId + i, retargetingId + j,
flightName)));
}
}
return versions;
}
I've tried following ways to fill the reference wrapper
template <typename InputRange> void insert(const InputRange &rng) {
// 1) base::insert(end(), rng.first, rng.second); // 12ms
// 2) std::copy(rng.first, rng.second, std::back_inserter(*this)); // 6ms
/* 3) size_t start = size(); // 12ms
auto tmp = std::reference_wrapper<const
VersionData>(VersionData(0,0,L""));
resize(start + boost::size(rng), tmp);
auto beg = rng.first;
for (;beg != rng.second; ++beg, ++start)
{
this->operator[](start) = std::reference_wrapper<const VersionData>(*beg);
}
*/
std::copy(rng.first, rng.second, std::back_inserter(*this));
}
Whatever I do I pay for operator ++ or the size method which just increments the iterator - meaning I'm still stuck in ++. So the question is if there is a way to iterate result ranges faster. If there is no such a way is it worth to try and go down the implementation of equal_range adding new argument which holds reference to the container of reference_wrapper which will be filled with results instead of creating range?
EDIT 1: sample code
http://coliru.stacked-crooked.com/a/8b82857d302e4a06/
Due to this bug it will not compile on Coliru
EDIT 2: Call tree, with time spent in operator ++
回答1:
The fact that 80% of getVersionData
´s execution time is spent in operator++
is not indicative of any performance problem per se --at most, it tells you that equal_range
and std::reference_wrapper
insertion are faster in comparison. Put another way, when you profile some piece of code you will typically find locations where the most time is spent, but whether this is a problem or not depends on the required overall performance.
回答2:
@kreuzerkrieg, your sample code does not exercise any kind of insertion into a vector
of std::reference_wrapper
s! Instead, you're projecting the result of equal_range
into a boost::any_range
, which is expected to be fairly slow at iteration --basically, increment ops resolve to virtual calls.
So, unless I'm seriously missing something here, the sample code performance or lack thereof does not have anything to do with whatever your problem is in real code (assuming VersionView
, of which you don't show the code, is not using boost::any_range
).
That said, if you can afford replacing your ordered indices with equivalent hashed indices, iteration will probably be faster, but this is is an utter shot in the dark given you're not showing the real stuff.
回答3:
I think that you're measuring the wrong things entirely. When I scale up from 3x3x11111 to 10x10x111111 (so 111x as many items in the index), it still runs in 290ms.
And populating the stuff takes orders of magnitude more time. Even deallocating the container appears to take more time.
What Doesn't Matter?
I've contributed a version with some trade offs, which mainly show that there's no sense in tweaking things: View On Coliru
- there's a switch to avoid the
any_range
(it doesn't make sense using that if you care for performance) there's a switch to tweak the flyweight:
#define USE_FLYWEIGHT 0 // 0: none 1: full 2: no tracking 3: no tracking no locking
again, it merely shows you could easily do without, and should consider doing so unless you need the memory optimization for the string (?). If so, consider using the
OPTIMIZE_ATOMS
approach:the
OPTIMIZE_ATOMS
basically does fly weight forwstring
there. Since all the strings are repeated here it will be mighty storage efficient (although the implementation is quick and dirty and should be improved). The idea is much better applied here: How to improve performance of boost interval_map lookups
Here's some rudimentary timings:
As you can see, basically nothing actually matters for query/iteration performance
Any Iterators: Doe They Matter?
It might be the culprit on your compiler. On my compile (gcc 4.8.2) it wasn't anything big, but see the disassembly of the accumulate loop without the any iterator:
As you can see from the sections I've highlighted, there doesn't seem to be much fat from the algorithm, the lambda nor from the iterator traversal. Now with the any_iterator the situation is much less clear, and if your compile optimizes less well, I can imagine it failing to inline elementary operations making iteration slow. (Just guessing a little now)
回答4:
Ok, so the solution I applied is as following: in addition to the odered_non_unique index (the 'byKey') I've added random_access index. When the data is loaded I rearrange the random index with m_data.get.begin(). Then when the MIC is queried for the data I just do boost::equal_range on the random index with custom predicate which emulates the same logic that was applied in ordering of 'byKey' index. That's it, it gave me fast 'size()' (O(1), as I understand) and fast traversal. Now I'm ready for your rotten tomatoes :)
EDIT 1: of course I've changed the any_range from bidirectional traversal tag to the random access one
来源:https://stackoverflow.com/questions/27588018/boost-multi-index-container-and-slow-operator