std::hash value on char* value and not on memory address?

Submitted by 筅森魡賤 on 2021-02-08 12:37:14

Question


As stated in this link:

There is no specialization for C strings. std::hash produces a hash of the value of the pointer (the memory address), it does not examine the contents of any character array.

This means that two char* values pointing to identical string contents can produce different hash codes. For example, given this code:

//MOK and MOV are template arguments
void emit(MOK key, MOV value) {
    auto h = hash<MOK>()(key);
    cout<<"key="<<key<<" h="<<h<<endl;
    ...

This is the output produced by calling emit() four times with MOK=char*, each time with a key that holds the same text but comes from a different token/string object:

key=hello h=140311481289184
key=hello h=140311414180320
key=hello h=140311414180326
key=hello h=140311481289190
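A minimal sketch of what is going on in the output above. The helper name pointer_hash is made up for illustration; it only wraps std::hash<const char*>, which hashes the address and never reads the characters.

```cpp
#include <cstddef>
#include <functional>

// Hypothetical helper: hashes the pointer value itself, which is what
// std::hash<const char*> does. The characters are never examined.
std::size_t pointer_hash(const char* s)
{
    return std::hash<const char*>{}(s);
}
```

Given `char a[] = "hello";` and `char b[] = "hello";`, the contents compare equal with std::strcmp, but pointer_hash(a) and pointer_hash(b) are computed from two different addresses, so they will usually differ.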

How can I obtain the same hash code for char* keys with the same contents? I'd prefer not to use Boost.


Answer 1:


There is of course the trivial (and slow) solution of creating a temporary std::string and hashing that one. If you don't want to do this, I'm afraid you will have to implement your own hash function. Sadly enough, the current C++ standard library doesn't provide general purpose hash algorithms disentangled from object-specific hash solutions. (But there is some hope this could change in the future.)
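For reference, the trivial approach with a temporary std::string could look roughly like this. The function name hash_via_string is an assumption made for this sketch.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hashes the characters of a null-terminated string by copying them into a
// temporary std::string. Simple, but the copy may allocate on every call.
std::size_t hash_via_string(const char* s)
{
    return std::hash<std::string>{}(std::string(s));
}
```

Two distinct buffers holding "hello" now hash to the same value, at the cost of one string copy per call.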

Suppose you had a function

std::size_t
hash_bytes(const void * data, std::size_t size) noexcept;

that would take an address and a size and return a hash computed from that many bytes following that address. With the help of that function, you could easily write

template <typename T>
struct myhash
{
  std::size_t
  operator()(const T& obj) const noexcept
  {
    // Fallback implementation.
    auto hashfn = std::hash<T> {};
    return hashfn(obj);
  }
};

and then specialize it for the types you're interested in.

template <>
struct myhash<std::string>
{
  std::size_t
  operator()(const std::string& s) const noexcept
  {
    return hash_bytes(s.data(), s.size());
  }
};

template <>
struct myhash<const char *>
{
  std::size_t
  operator()(const char *const s) const noexcept
  {
    return hash_bytes(s, std::strlen(s));
  }
};
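One thing to keep in mind when plugging such a hasher into an unordered container: the default key comparison for pointers still compares addresses, so a content-based equality predicate is needed as well. Below is a rough sketch. The names cstr_hash, cstr_equal, cstr_counter and count_words are made up for illustration, and the hasher simply delegates to std::hash<std::string> as a stand-in for hash_bytes.

```cpp
#include <cstddef>
#include <cstring>
#include <functional>
#include <string>
#include <unordered_map>

// Stand-in content hash (assumption): delegates to std::hash<std::string>
// instead of a real hash_bytes implementation.
struct cstr_hash
{
    std::size_t operator()(const char* s) const
    {
        return std::hash<std::string>{}(std::string(s));
    }
};

// Keys are equal when their characters are equal, not their addresses.
struct cstr_equal
{
    bool operator()(const char* a, const char* b) const noexcept
    {
        return std::strcmp(a, b) == 0;
    }
};

using cstr_counter =
    std::unordered_map<const char*, int, cstr_hash, cstr_equal>;

// Counts how often each distinct string occurs among n C strings.
// The map stores the pointers only, so the strings must outlive it.
cstr_counter count_words(const char* const* words, std::size_t n)
{
    cstr_counter counts;
    for (std::size_t i = 0; i < n; ++i)
        ++counts[words[i]];
    return counts;
}
```

Without cstr_equal, two "hello" keys at different addresses would land in the same bucket but still be treated as different keys.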

This leaves you only with the exercise of implementing hash_bytes. Fortunately, there are some fairly good hash functions that are rather easy to implement. My go-to algorithm for simple hashing is the Fowler-Noll-Vo hash function. You can implement it in five lines of code; see the linked Wikipedia article.
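As a rough illustration of how short it is, here is a sketch of the 64-bit FNV-1a variant. The function name fnv1a_64_simple is made up for this example; the offset basis and prime are the published 64-bit FNV parameters.

```cpp
#include <cstddef>
#include <cstdint>

// 64-bit FNV-1a: start from the offset basis, then for each byte
// XOR it into the state and multiply by the FNV prime.
std::uint64_t fnv1a_64_simple(const void* data, std::size_t size) noexcept
{
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    std::uint64_t hash = 14695981039346656037ULL;     // offset basis
    for (std::size_t i = 0; i < size; ++i)
        hash = (hash ^ bytes[i]) * 1099511628211ULL;  // FNV prime
    return hash;
}
```

Hashing zero bytes returns the offset basis unchanged, which makes for an easy sanity check.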

If you want to get a bit fancy, consider the following implementation. First, I define a generic template that can be specialized for any version of the FNV-1a hash function.

template <typename ResultT, ResultT OffsetBasis, ResultT Prime>
class basic_fnv1a final
{

  static_assert(std::is_unsigned<ResultT>::value, "need unsigned integer");

public:

  using result_type = ResultT;

private:

  result_type state_ {};

public:

  constexpr
  basic_fnv1a() noexcept : state_ {OffsetBasis}
  {
  }

  constexpr void
  update(const void *const data, const std::size_t size) noexcept
  {
    const auto cdata = static_cast<const unsigned char *>(data);
    auto acc = this->state_;
    for (auto i = std::size_t {}; i < size; ++i)
      {
        const auto next = std::size_t {cdata[i]};
        acc = (acc ^ next) * Prime;
      }
    this->state_ = acc;
  }

  constexpr result_type
  digest() const noexcept
  {
    return this->state_;
  }

};

Next, I provide aliases for the 32 and 64 bit versions. The parameters were taken from Landon Curt Noll's website.

using fnv1a_32 = basic_fnv1a<std::uint32_t,
                             UINT32_C(2166136261),
                             UINT32_C(16777619)>;

using fnv1a_64 = basic_fnv1a<std::uint64_t,
                             UINT64_C(14695981039346656037),
                             UINT64_C(1099511628211)>;

Finally, I provide type meta-functions to select a version of the algorithm given the wanted number of bits.

template <std::size_t Bits>
struct fnv1a;

template <>
struct fnv1a<32>
{
  using type = fnv1a_32;
};

template <>
struct fnv1a<64>
{
  using type = fnv1a_64;
};

template <std::size_t Bits>
using fnv1a_t = typename fnv1a<Bits>::type;

And with that, we're good to go.

constexpr std::size_t
hash_bytes(const void *const data, const std::size_t size) noexcept
{
  auto hashfn = fnv1a_t<CHAR_BIT * sizeof(std::size_t)> {};
  hashfn.update(data, size);
  return hashfn.digest();
}

Note how this code automatically adapts to platforms where std::size_t is 32 or 64 bits wide.




Answer 2:


I've had to do this before and ended up writing my own function, with essentially the same implementation as Java's String hash function:

size_t hash_c_string(const char* p, size_t s) {
    size_t result = 0;
    const size_t prime = 31;
    for (size_t i = 0; i < s; ++i) {
        result = p[i] + (result * prime);
    }
    return result;
}

Mind you, this is NOT a cryptographically secure hash, but it is fast enough and yields good results.
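A possible variant for null-terminated strings, as a sketch only. The name hash_c_string_z is made up, and reading each byte as unsigned char is an assumption added here so that the result does not depend on whether plain char is signed on the platform.

```cpp
#include <cstddef>
#include <cstring>

// Java-style polynomial hash (multiplier 31) over a null-terminated string.
// Assumption: bytes are read as unsigned char so that non-ASCII bytes
// contribute the same value regardless of the signedness of plain char.
std::size_t hash_c_string_z(const char* p)
{
    std::size_t result = 0;
    const std::size_t n = std::strlen(p);
    for (std::size_t i = 0; i < n; ++i)
        result = static_cast<unsigned char>(p[i]) + result * 31;
    return result;
}
```

For "hello" this yields 99162322, the same value Java reports for "hello".hashCode(). For longer strings the results diverge from Java wherever size_t is wider than Java's 32-bit int.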




Answer 3:


In C++17 and later you should use std::hash<std::string_view>, which works seamlessly because const char* converts implicitly to std::string_view.
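A small sketch of how this might look with a container, assuming a C++17 compiler. The function name count_distinct is made up for illustration.

```cpp
#include <cstddef>
#include <string_view>
#include <unordered_set>

// Requires C++17. Keys are hashed and compared by content.
// A string_view does not own its characters, so the buffers behind the
// pointers must stay alive while the set is in use.
std::size_t count_distinct(const char* const* words, std::size_t n)
{
    std::unordered_set<std::string_view> seen;
    for (std::size_t i = 0; i < n; ++i)
        seen.emplace(words[i]);  // builds a std::string_view from const char*
    return seen.size();
}
```

Because std::string_view also compares by content, no custom equality predicate is needed here.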




Answer 4:


Since C++17 added std::string_view, including a std::hash specialization for it, you can use that to compute the hash value of a C string.

Example:

#include <string_view>
#include <cstring>

static size_t hash_cstr(const char *s)
{
    return std::hash<std::string_view>()(std::string_view(s, std::strlen(s)));
}

If you have to deal with a pre-C++17 compiler, you can check your standard library implementation for an implementation-defined hash function and call that.

For example, libstdc++ (which is what GCC uses by default) provides std::_Hash_bytes which can be called like this:

#include <functional>
// -> which finally includes /usr/include/c++/$x/bits/hash_bytes.h
#include <cstring>

static size_t hash_cstr_gnu(const char *s)
{
    const size_t seed = 0;
    return std::_Hash_bytes(s, std::strlen(s), seed);
}



Source: https://stackoverflow.com/questions/34597260/stdhash-value-on-char-value-and-not-on-memory-address
