Most efficient way to assign values to maps

旧巷老猫 提交于 2019-12-21 03:44:21

问题


Which way to assign values to a map is most efficient? Or are they all optimized to the same code (on most modern compilers)?

   // 1) Assignment using array index notation
   Foo["Bar"] = 12345;

   // 2) Assignment using member function insert() and STL pair
   Foo.insert(std::pair<string,int>("Bar", 12345));

   // 3) Assignment using member function insert() and "value_type()"
   Foo.insert(map<string,int>::value_type("Bar", 12345));

   // 4) Assignment using member function insert() and "make_pair()"
   Foo.insert(std::make_pair("Bar", 12345));

(I know I could benchmark and check compiler output, but this question arose now and the only thing I have close at hand is my mobile phone... hehe)


回答1:


First, there are semantic differences between [] and insert:

  • [] will replace the old value (if any)
  • insert will ignore the new value (if an old value is already present)

therefore comparing the two is useless in general.

Regarding the inserts versions:

  • std::map<std::string, int>::value_type is std::pair<std::string const, int> so no (important) difference between 3 and 4
  • std::make_pair("Bar", 12345) is cheaper than std::pair<std::string, int>("Bar", 12345) because the std::string type is a full fledged class with operations to do on copy and "Bar" is just a string literal (thus just a pointer copy); however since at the end you do need to create the std::string... it will depend on the quality of your compiler

In general, I would recommend:

  • [] for updating
  • insert(std::make_pair()) for ignoring duplicates

std::make_pair is not only shorter, it also respects the DRY guideline: Don't Repeat Yourself.


For completeness though, the fastest (and easiest) would be emplace (C++11 enabled):

map.emplace("Bar", 12345);

Its behavior is that of insert, but it constructs the new element in-place.




回答2:


1) may be slightly slower than the other methods because std::map::operator[] first default-creates the object if it doesn't already exist, then returns a reference that you can use operator= on to set your desired value, i.e. two operations.

2-4) should be equivalent since map::value_type is a typedef to std::pair for the same types, and therefore make_pair is also equivalent. The compiler should treat these identically.

Also note that performance can be increased further if you need to both check for existence (e.g. to perform special logic depending on whether it exists or not) and then also insert it, by using map::lower_bound to first get a hint to where the element should be, so map::insert does not have to search the whole map again:

 // get the iterator to where the key *should* be if it existed:
 std::map::iterator hint = mymap.lower_bound(key);

 if (hint == mymap.end() || mymap.key_comp()(key, hint->first)) {
     // key didn't exist in map
     // special logic A here...

     // insert at the correct location
     mymap.insert(hint, make_pair(key, new_value));
 } else { 
     // key exists in map already
     // special logic B here...

     // just update value
     hint->second = new_value;
 }



回答3:


Your first possibility: Foo["Bar"] = 12345; has somewhat different semantics than the others -- it'll insert a new object if none exists (like the others) but overwrite the current contents if none exists (where the others using insert will fail if that key is already present).

As far as speed goes, it has the potential to be slower than the others. When you're inserting a new object, it has create a pair with the specified key and a default-constructed value_type, then assign the correct value_type afterwards. The others all construct the pair the correct key and value and insert that object. Being fair, however, my experience is that the difference is rarely significant (with older compilers, it was more significant, but with newer ones pretty minimal).

The next two are equivalent to each other. You're just using two different ways to name the same type. By run-time, there's no difference between them at all.

The fourth uses a template function (make_pair) that theoretically could involve an extra level of function call. I'd be quite surprised to see a real difference from this though, except (possibly) if you were careful to ensure that the compiler did absolutely no optimization (especially inlining) at all.

Bottom line: The first will often be a little slower than the rest (but not always and not by much). The other three will almost always be equal (as in: normally expect any reasonable compiler to produce identical code for all three) even though there's theoretical justification for the fourth being slower.




回答4:


If there is no object at that key location, then:

std::map::emplace is most efficient. insert is second (but will be extremely close). [] is least efficient.

[], if there is no object there, trivial constructs one. It then calls operator=.

insert does a copy constructor call on the std::pair argument.

However, in the case of maps, map.insert( make_pair( std::move(key), std::move(value) ) ) is going to be close to map.emplace( std::move(key), std::move(value) ).

If there is an object at the key location, then [] will call operator=, while insert/emplace will destroy the old object and create a new one. [] could easily be cheaper in this case.

In the end, it depends on what your operator= vs copy-construct vs trivial-construct vs destructor costs are for your key and value.

The actual work looking things up within the tree structure of the std::map will be so close to identical it isn't funny.




回答5:


The third one is the best choice (IMHO), but 2, 3 and 4 are equal.

// 3) Assignment using member function insert() and "value_type()"
Foo.insert(map<string,int>::value_type("Bar", 12345));

Why I think the third one is the best choice: You're performing only one operation to insert the value: just inserting (well, there's a search too) and you can know if the value was inserted, checking the second member of the return value and the implementation grants to not overwrite the value.

The use of the value_type has advantages too: you don't need to know the mapped type or key type so is useful with template programming.

The worst (IMHO) is the first one:

// 1) Assignment using array index notation
Foo["Bar"] = 12345;

You're calling the std::map::operator[] wich creates an object and returns a reference to it, then the mapped object operator = is called. You're doing two operations for the insertion: first the insertion, second the asignation.

And it have another problem: you don't know if the value has been inserted or overwrited.




回答6:


Even though there has been a couple of good answers already I thought I might as well do a quick benchmark. Ran each one 5 million times and used c++11's chrono to measure the time it took.

Heres the code:

#include <string>
#include <map>
#include <chrono>
#include <cstdio>

// 5 million
#define times 5000000

int main()
{
    std::map<std::string, int> foo1, foo2, foo3, foo4, foo5;
    std::chrono::steady_clock::time_point timeStart, timeEnd;
    int x = 0;

    // 1) Assignment using array index notation
    timeStart = std::chrono::steady_clock::now();
    for (x = 0; x <= times; x++)
    {
        foo1[std::to_string(x)] = 12345;
    }
    timeEnd = std::chrono::steady_clock::now();
    printf("1) took %i milliseconds\n", (unsigned long long)std::chrono::duration_cast<std::chrono::milliseconds>(timeEnd-timeStart).count());

    // 2) Assignment using member function insert() and STL pair
    timeStart = std::chrono::steady_clock::now();
    for (x = 0; x <= times; x++)
    {
        foo2.insert(std::pair<std::string, int>(std::to_string(x), 12345));
    }
    timeEnd = std::chrono::steady_clock::now();
    printf("2) took %i milliseconds\n", (unsigned long long)std::chrono::duration_cast<std::chrono::milliseconds>(timeEnd-timeStart).count());

    // 3) Assignment using member function insert() and "value_type()"
    timeStart = std::chrono::steady_clock::now();
    for (x = 0; x <= times; x++)
    {
        foo3.insert(std::map<std::string, int>::value_type(std::to_string(x), 12345));
    }
    timeEnd = std::chrono::steady_clock::now();
    printf("3) took %i milliseconds\n", (unsigned long long)std::chrono::duration_cast<std::chrono::milliseconds>(timeEnd-timeStart).count());

    // 4) Assignment using member function insert() and "make_pair()"
    timeStart = std::chrono::steady_clock::now();
    for (x = 0; x <= times; x++)
    {
        foo4.insert(std::make_pair(std::to_string(x), 12345));
    }
    timeEnd = std::chrono::steady_clock::now();
    printf("4) took %i milliseconds\n", (unsigned long long)std::chrono::duration_cast<std::chrono::milliseconds>(timeEnd-timeStart).count());

    // 5) Matthieu M.'s suggestion of C++11's emplace
    timeStart = std::chrono::steady_clock::now();
    for (x = 0; x <= times; x++)
    {
        foo5.emplace(std::to_string(x), 12345);
    }
    timeEnd = std::chrono::steady_clock::now();
    printf("5) took %i milliseconds\n", (unsigned long long)std::chrono::duration_cast<std::chrono::milliseconds>(timeEnd-timeStart).count());

    return 0;
}

The output for 5 million iterations is:

1) took 23448 milliseconds
2) took 22854 milliseconds
3) took 22372 milliseconds
4) took 22988 milliseconds
5) took 21356 milliseconds

GCC version:

g++ (Built by MinGW-builds project) 4.8.0 20121225 (experimental)

My machine:

Intel i5-3570k overclocked at 4.6 GHz

EDIT1: Fixed the code and redid the benchmark.

EDIT2: Added Matthieu M.'s suggestion for C++11's emplace and he is right, emplace is fastest



来源:https://stackoverflow.com/questions/14218042/most-efficient-way-to-assign-values-to-maps

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!