Constructing a hash table/hash function

你说的曾经没有我的故事 提交于 2019-12-07 06:54:05

问题


I would like to construct a hash table that looks up keys in sequences (strings) of bytes ranging from 1 to 15 bytes.

I would like to store an integer value, so I imagine an array for hashing would suffice. I'm having difficulty conceptualizing how to construct a hash function such that given the key would give an index into the array.

Any assistance would be much appreiated.

The maximum number of entries in the hash is: 4081*15 + 4081*14 + ... 4081 = 4081((15*(16))/2) = 489720.

So for example:

int table[489720];

int lookup(unsigned char *key)
{
    int index = hash(key);
    return table[index];
}

What are some good choices for a hash function, or how would I go about constructing one?

Thanks.


回答1:


To hash C strings, I've always used this function (take the result % your hash table's size):

int hashstring(const char* s) {
  int key = 0;
  while (*s) {
    key = key*37 + *s++;
  }
  return key;
}

I don't remember where I got it from initially, but in many years it hasn't let me down.




回答2:


Your key space is large (approx 2^(8*15)), so if you want a perfect hash, you will need to know what 489720 actual keys will show up in advance. Even then, it is practically impossible to find a perfect hash for those keys, even if you allowed a much larger table (a.k.a. a very low load factor). The only way I know to find a perfect hash is by trial and error, and a random hash is likely to fail unless your table has close to 489720^2 entries.

I highly recommend using a regular (non-perfect) hash and deal with collisions appropriately, e.g. with chaining:

struct entry {
  unsigned char *key;
  int value;
  struct entry *next;
} *table[1<<20];
int lookup(unsigned char *key) {
  int index = hash(key) % (1<<20);
  for (struct entry *e = table[index]; e != NULL; e = e->next) {
    if (!strcmp(key, e->key)) return e->value;
  }
  // not found
}

I also recommend you don't implement this yourself - use a standard library like a c++ hashmap.




回答3:


If you want a perfect hash, then you can start by reading the Wikipedia article on perfect hashing. If you run into snags , you can ask for help here.




回答4:


If the the average number of strings resident in the table is low--like under 10,000 entries--an associative array would be a reasonable approach, even using a linear search if it's on a modern CPU architecture.

Otherwise, constructing a "perfect hash" requires inspecting each character of the string and computing a unique value based on the possible range. For example, if only the 26 characters A..Z are allowed in the key, this would work:

int
hash (const char *key)
{
   int h = 0;
   while (key && *key)
       h = h * 26 + (*key++ - 'A');
   return h;
}


来源:https://stackoverflow.com/questions/2962207/constructing-a-hash-table-hash-function

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!