Best hashing algorithm in terms of hash collisions and performance for strings


忘掉有多难

What would be the best hashing algorithm if we had the following priorities (in that order):

1. Minimal hash collisions
2. Performance

It doe

9 Answers

礼貌的吻别

2020-11-28 03:58

You can get both using the Knuth hash function described here.

It's extremely fast assuming a power-of-2 hash table size -- just one multiply, one shift, and one bit-and. More importantly (for you) it's great at minimizing collisions (see this analysis).
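As a rough sketch of what such a function can look like, assuming the answer means Knuth's multiplicative method: the key is multiplied by a constant derived from the golden ratio and the top bits of the 32-bit product are kept as the bucket index. The class name `KnuthHash`, the method `index`, and the parameter `bits` are made up for this illustration. Because the unsigned shift already keeps only the top `bits` bits, this variant does not need a separate bit-and. For strings, the integer key would first have to come from some string hash, for example `String.hashCode()`.

```java
final class KnuthHash {
    // 2654435769 = floor(2^32 / golden ratio), the multiplier commonly
    // attributed to Knuth for 32-bit words (stored as a signed Java int).
    private static final int GOLDEN = 0x9E3779B9;

    /**
     * Maps a 32-bit key to a bucket index in a table of size 2^bits.
     * Assumes 1 <= bits <= 31. The multiplication wraps modulo 2^32,
     * and the unsigned shift keeps the top `bits` bits of the product.
     */
    static int index(int key, int bits) {
        return (key * GOLDEN) >>> (32 - bits);
    }
}
```

With a 16-slot table (`bits = 4`), `KnuthHash.index("hello".hashCode(), 4)` yields an index between 0 and 15.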

Some other good algorithms are described here.

谎友^

2020-11-28 04:00

Here is a straightforward way of implementing it yourself: http://www.devcodenote.com/2015/04/collision-free-string-hashing.html

Here is a snippet from the post:

If, say, we have a character set of capital English letters, then the length of the character set is 26, where A could be represented by the number 0, B by the number 1, C by the number 2, and so on up to Z by the number 25. Now, whenever we want to map a string of this character set to a unique number, we perform the same conversion as we did in the case of the binary format.
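A minimal sketch of that base-26 conversion, assuming only the letters A to Z and a 64-bit result. The names `Base26`, `encode`, and `MAX_LENGTH` are hypothetical and do not come from the linked post. Two caveats apply to this sketch: the result only fits in a `long` for strings of up to 13 letters (26^13 is below 2^63), and because A maps to 0, leading A's vanish, so the number is unique only among strings of the same length.

```java
final class Base26 {
    /** Longest input whose encoding is guaranteed to fit in a signed 64-bit long. */
    static final int MAX_LENGTH = 13;

    /** Interprets s as a base-26 number with digits A=0, B=1, ..., Z=25. */
    static long encode(String s) {
        if (s.length() > MAX_LENGTH) {
            throw new IllegalArgumentException("at most " + MAX_LENGTH + " letters supported");
        }
        long code = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < 'A' || c > 'Z') {
                throw new IllegalArgumentException("only capital letters A-Z are supported");
            }
            // Shift the digits seen so far one place left and append the new digit.
            code = code * 26 + (c - 'A');
        }
        return code;
    }
}
```

For example, `Base26.encode("BA")` is 1 * 26 + 0 = 26, while `Base26.encode("A")` and `Base26.encode("AA")` are both 0, which shows why the length has to be fixed (or encoded separately) for the mapping to stay collision-free.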


予麋鹿

2020-11-28 04:01

The simple hashCode used by Java's String class might show a suitable algorithm.

Below is the "GNU Classpath" implementation. (License: GPL)

  /**
   * Computes the hashcode for this String. This is done with int arithmetic,
   * where ** represents exponentiation, by this formula:<br>
   * <code>s[0]*31**(n-1) + s[1]*31**(n-2) + ... + s[n-1]</code>.
   *
   * @return hashcode value of this String
   */
  public int hashCode()
  {
    if (cachedHashCode != 0)
      return cachedHashCode;

    // Compute the hash code using a local variable to be reentrant.
    int hashCode = 0;
    int limit = count + offset;
    for (int i = offset; i < limit; i++)
      hashCode = hashCode * 31 + value[i];
    return cachedHashCode = hashCode;
  }
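The method above relies on the private fields of the Classpath `String` class (`value`, `offset`, `count`, `cachedHashCode`), so it cannot be run on its own. A standalone sketch of the same polynomial formula, without the caching, might look like this; the class name `PolyHash` is made up for the example.

```java
final class PolyHash {
    /**
     * Computes s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] using
     * wrapping 32-bit int arithmetic (Horner's rule).
     */
    static int hash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }
}
```

Since the Java specification defines `String.hashCode()` with this same formula, `PolyHash.hash("abc")` agrees with `"abc".hashCode()`. With collisions as the first priority, it is worth noting that this function collides even on very short inputs: `"Aa"` and `"BB"` both hash to 2112.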
