C++ some questions on boost::unordered_map & boost::hash

荒凉一梦 posted on 2019-12-01 04:07:56

This is a bit muddled.

  • The things you list are not "things that you can do to speed things up"; rather, they are mandatory requirements for your type to be eligible as the key type of an unordered map, or as the element type of an unordered set (which you might rather want).

  • You need to provide an equality operator that compares objects, not hash values. The whole point of the equality is to distinguish elements with the same hash.

  • size_t is an unsigned integral type, 32 bits on x86 and 64 bits on x64. Since you want "billions of elements", which means many gigabytes of data, I assume you have a solid x64 machine anyway.

  • What's crucial is that your hash function is good, i.e. has few collisions.

  • You want a set, not a map. Put the objects directly in the set: std::unordered_set<State>. Use a map only if you are mapping states to something else. Oh, and use the C++0x (C++11) standard containers rather than Boost, if you can.

  • Using hash_combine is good.


Baby example:

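#include <cstddef>
#include <unordered_set>

struct State
{
  inline bool operator==(const State &) const; // compares objects, not hash values
  /* Stuff */
};

namespace std
{
  template <> struct hash<State>
  {
    inline std::size_t operator()(const State & s) const
    {
      /* your hash algorithm here */
    }
  };
}

std::size_t Foo(const State & s) { /* some code */ }

int main()
{
  std::unordered_set<State> states; // uses std::hash<State>; no extra data needed
  // A function is not a type: pass its pointer type as the hasher type and
  // the pointer itself to the constructor (16 is only an initial bucket count).
  std::unordered_set<State, std::size_t (*)(const State &)> states2(16, &Foo); // another hash function
}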
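
To make the skeleton above concrete, here is a minimal sketch with the placeholders filled in. The two fields of State, the helper combine_hash (hand-rolled in the spirit of boost::hash_combine, not Boost's actual code) and the function insert_if_new are all assumptions made for illustration; the real State will have its own members.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical state with two illustrative fields; the real State will differ.
struct State
{
  int position;
  int velocity;

  // Equality compares the objects themselves, never their hash values.
  bool operator==(const State & other) const
  {
    return position == other.position && velocity == other.velocity;
  }
};

// Hand-rolled helper in the spirit of boost::hash_combine (an assumption,
// not Boost's code): folds the hash of v into seed.
template <typename T>
void combine_hash(std::size_t & seed, const T & v)
{
  seed ^= std::hash<T>()(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2);
}

namespace std
{
  template <> struct hash<State>
  {
    std::size_t operator()(const State & s) const
    {
      std::size_t seed = 0;
      ::combine_hash(seed, s.position);
      ::combine_hash(seed, s.velocity);
      return seed;
    }
  };
}

// Returns true if s was new, false if an equal State was already stored.
bool insert_if_new(std::unordered_set<State> & states, const State & s)
{
  return states.insert(s).second;
}

// Usage sketch: the second insert of an equal State is rejected.
void insert_demo()
{
  std::unordered_set<State> states;
  assert(insert_if_new(states, State{1, 2}));
  assert(!insert_if_new(states, State{1, 2}));
  assert(states.size() == 1);
}
```

The set stores the State objects directly, so the duplicate check and the storage are one and the same operation.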

An unordered_map is a hash table. You don't store the hash yourself; the container computes it internally and uses it as its storage and lookup method.

Given your requirements, an unordered_set might be more appropriate, since your object is the only item to store.

You are a little confused though -- the equality operator and hash function are not truly performance items, but required for nontrivial objects for the container to work correctly. A good hash function will distribute your nodes evenly across the buckets, and the equality operator will be used to remove any ambiguity about matches based on the hash function.

std::size_t is fine as the return type of the hash function. Remember that no hash is perfect; there will be collisions, and colliding items are typically stored in a linked list at that bucket position.

Thus, .find() will be O(1) in the optimal case and very close to O(1) in the average case (and O(N) in the worst case, but a decent hash function will avoid that.)
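
The division of labour between hash and equality can be seen in a small experiment. The sketch below uses a deliberately useless hasher, ConstantHash (a hypothetical name, for illustration only), that sends every value to the same bucket. Lookups remain correct because equality disambiguates within the bucket, but that single bucket holds every element, which is the O(N) worst case.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Deliberately terrible hash (hypothetical, for illustration only): every
// value lands in the same bucket.
struct ConstantHash
{
  std::size_t operator()(int) const { return 0; }
};

// Inserts 0..n-1 into s and returns the size of the fullest bucket.
template <typename Set>
std::size_t largest_bucket_after_fill(Set & s, int n)
{
  for (int i = 0; i < n; ++i) s.insert(i);
  std::size_t largest = 0;
  for (std::size_t b = 0; b < s.bucket_count(); ++b)
    if (s.bucket_size(b) > largest) largest = s.bucket_size(b);
  return largest;
}

void collision_demo()
{
  std::unordered_set<int, ConstantHash> bad;
  std::size_t worst = largest_bucket_after_fill(bad, 1000);
  assert(worst == 1000);        // all elements collide into one bucket
  assert(bad.count(999) == 1);  // still found: equality disambiguates
  assert(bad.count(1000) == 0); // absent values are still rejected

  std::unordered_set<int> good; // default std::hash<int>
  std::size_t best = largest_bucket_after_fill(good, 1000);
  assert(best < worst);         // elements are spread over many buckets
}
```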

You don't mention your platform or architecture; at billions of entries you still might have to worry about out-of-memory situations depending on those and the size of your State object.
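
As a rough back-of-the-envelope check, the payload alone can be estimated as element count times element size. The function payload_gib and the numbers below are assumptions for illustration; a real container adds per-node and bucket-array overhead on top, and how much depends on the implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Lower bound on the payload size in GiB: element count times element size.
// Real containers add per-node and bucket-array overhead on top of this.
double payload_gib(std::uint64_t count, std::size_t element_size)
{
  return static_cast<double>(count) * static_cast<double>(element_size)
         / (1024.0 * 1024.0 * 1024.0);
}

void memory_demo()
{
  // Hypothetical numbers: 2^31 (about two billion) States of 32 bytes each.
  assert(payload_gib(std::uint64_t(1) << 31, 32) == 64.0);
}
```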

Forget about the hash; nothing (at least in your question) suggests that you have a meaningful key.

Let's take a step back and rephrase your actual performance goals:

  • You want to quickly verify that no duplicates ever exist among your State objects.

Comment if I need to add others.

Given that goal, and your comment, I would suggest using an ordered set (std::set) rather than an unordered_map. Yes, the ordered set searches a tree in O(log n), while the unordered container looks up in O(1) on average.

However, with this approach you need the set ONLY to check that an equal state doesn't already exist when you are about to create a new one, that is, at State creation time.

In all other lookups you don't need to look into the set at all, because you already have the key: a State*. The key gives access to the value through the dereference operator: *key.

So with this approach you use the set purely as an index to verify States at creation time. In all other cases you access your State by dereferencing the pointer that serves as its key.

If all of the above wasn't enough to convince you, here is the final nail in the coffin of the idea of using a hash to quickly determine equality: a hash function has a small probability of collision, but as the number of states grows, a collision somewhere becomes practically certain. So, depending on your fault tolerance, you are going to have to deal with state collisions (and judging from your question and the number of States you expect, you will deal with a lot of them).

For this to work, you obviously need the comparison predicate to test all the internal properties of your state (gyroscope, thrusters, accelerometers, proton rays, etc.).
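
The approach could be sketched roughly as follows. The two fields of State, the comparator StateLess and the function intern are hypothetical names chosen for illustration; the comparator must cover every property of the real State. The sketch relies on the fact that pointers to std::set elements stay valid while other elements are inserted.

```cpp
#include <cassert>
#include <set>
#include <tuple>

// Hypothetical state; the two fields stand in for the real properties.
struct State
{
  int gyroscope;
  int thrusters;
};

// Strict weak ordering over every field, so two States are equivalent
// exactly when all their properties match.
struct StateLess
{
  bool operator()(const State & a, const State & b) const
  {
    return std::tie(a.gyroscope, a.thrusters) < std::tie(b.gyroscope, b.thrusters);
  }
};

typedef std::set<State, StateLess> StateIndex;

// Returns a pointer to the unique stored State equivalent to candidate,
// inserting it only if no such State exists yet.
const State * intern(StateIndex & index, const State & candidate)
{
  return &*index.insert(candidate).first;
}

void index_demo()
{
  StateIndex index;
  const State * a = intern(index, State{1, 2});
  const State * b = intern(index, State{1, 2}); // duplicate: no new element
  const State * c = intern(index, State{2, 1});
  assert(a == b);
  assert(a != c);
  assert(index.size() == 2);
  assert(a->thrusters == 2); // later access is a plain dereference, no lookup
}
```

The set is consulted only inside intern, at creation time; everything else works through the returned pointer.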
