Implement a hash table

Question

I'm trying to create an efficient look-up table in C.

I have an integer as a key and a variable length char* as the value.

I've looked at uthash, but this requires a fixed length char* value. If I make this a big number, then I'm using too much memory.

struct my_struct {
    int key;
    char value[10];             
    UT_hash_handle hh;
};

Has anyone got any pointers? Any insight greatly appreciated.

Thanks everyone for the answers. I've gone with uthash and defined my own custom struct to accommodate my data.

Answer 1:

Declare the value field as void *value.

This way you can have any type of data as the value, but the responsibility for allocating and freeing it will be delegated to the client code.
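
As a rough sketch of what that delegation could look like for string values: the struct entry type and the entry_new / entry_free helpers below are made-up names for illustration, not part of uthash or of the question's code. Each entry gets its own heap copy of the string, so it only uses as many bytes as that string needs.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry type: the value is a pointer, so each entry only
   pays for the bytes its own string needs. */
struct entry {
    int key;
    void *value;    /* owned by the client code; here a heap-allocated C string */
};

/* Allocates an entry plus a private copy of text; returns NULL on failure. */
struct entry *entry_new(int key, const char *text) {
    assert(text != NULL);
    size_t len = strlen(text) + 1;      /* include the terminating NUL */
    struct entry *e = malloc(sizeof *e);
    if (e == NULL) {
        return NULL;
    }
    e->value = malloc(len);
    if (e->value == NULL) {
        free(e);
        return NULL;
    }
    memcpy(e->value, text, len);
    e->key = key;
    return e;
}

/* Releases the value first, then the entry itself. Safe to call with NULL. */
void entry_free(struct entry *e) {
    if (e != NULL) {
        free(e->value);
        free(e);
    }
}
```

Whoever removes an entry from the table is then also responsible for calling entry_free on it.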

Answer 2:

You first have to think about your collision strategy:

1. Will you have multiple hash functions?
2. Or will you have to use containers inside of the hashtable?

We'll pick 1.

Then you have to choose a nicely distributed hash function. For the example, we'll pick

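int hash_fun(int key, int try, int max) {
    // Unsigned arithmetic keeps the index in [0, max) even for negative keys.
    return (int)(((unsigned)key + (unsigned)try) % (unsigned)max);
}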

If you need something better, maybe have a look at the mid-square method.
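
For reference, one possible reading of that method is sketched below: square the key and keep some bits from the middle of the product. The name mid_square_hash and the bits parameter are assumptions for this example, and it presumes a table with 2^bits cells rather than an arbitrary max.

```c
#include <assert.h>
#include <stdint.h>

/* Mid-square sketch: square the key and keep `bits` bits from the middle
   of the 64-bit product. The table is assumed to have 2^bits cells. */
unsigned mid_square_hash(uint32_t key, unsigned bits) {
    assert(bits >= 1 && bits <= 31);
    uint64_t square = (uint64_t)key * key;
    unsigned shift = (64 - bits) / 2;   /* skip the low-order bits */
    return (unsigned)((square >> shift) & ((UINT64_C(1) << bits) - 1));
}
```

An int key would be cast to uint32_t before the call. Small keys all land in cell 0 with this variant, because their squares never reach the middle bits, so it only makes sense when the keys use most of the 32-bit range.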

Then you have to decide what a hash table is.

struct hash_table {
    int max;
    int number_of_elements;
    struct my_struct **elements;
};
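
The struct has to be set up before it can be used. A minimal sketch of that step, assuming hypothetical helpers named hash_table_init and hash_table_release (neither appears in the original answer), and assuming 0 marks an empty cell as in the functions that follow:

```c
#include <assert.h>
#include <stdlib.h>

struct my_struct;   /* the entry type from the question; only pointers are stored */

struct hash_table {
    int max;
    int number_of_elements;
    struct my_struct **elements;
};

/* Prepares a table with max empty cells; returns 1 on success, 0 on failure. */
int hash_table_init(struct hash_table *table, int max) {
    int i;
    assert(table != NULL);
    table->max = 0;
    table->number_of_elements = 0;
    table->elements = NULL;
    if (max <= 0) {
        return 0;
    }
    table->elements = malloc((size_t)max * sizeof *table->elements);
    if (table->elements == NULL) {
        return 0;
    }
    for (i = 0; i < max; i++) {
        table->elements[i] = NULL;      /* NULL (0) marks an empty cell */
    }
    table->max = max;
    return 1;
}

/* Frees the cell array only; the entries themselves belong to the caller. */
void hash_table_release(struct hash_table *table) {
    assert(table != NULL);
    free(table->elements);
    table->elements = NULL;
    table->max = 0;
    table->number_of_elements = 0;
}
```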

Then, we'll have to define how to insert and to retrieve.

int hash_insert(struct my_struct *data, struct hash_table *hash_table) {
    int try, hash;
    if(hash_table->number_of_elements >= hash_table->max) {
        return 0; // FULL
    }
    // max probes are enough: the probe sequence visits every cell once.
    for(try = 0; try < hash_table->max; try++) {
        hash = hash_fun(data->key, try, hash_table->max);
        if(hash_table->elements[hash] == 0) { // empty cell
            hash_table->elements[hash] = data;
            hash_table->number_of_elements++;
            return 1;
        }
    }
    return 0;
}

struct my_struct *hash_retrieve(int key, struct hash_table *hash_table) {
    int try, hash;
    // Stop after max probes so a full table cannot cause an endless loop.
    for(try = 0; try < hash_table->max; try++) {
        hash = hash_fun(key, try, hash_table->max);
        if(hash_table->elements[hash] == 0) {
            return 0; // Nothing found
        }
        if(hash_table->elements[hash]->key == key) {
            return hash_table->elements[hash];
        }
    }
    return 0;
}
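
To see how insert and retrieve behave when two keys collide, here is a small self-contained sketch. It repeats the functions in condensed form so it compiles on its own, and it makes a few assumptions that are not in the original: my_struct is simplified to a key plus a const char * value (no UT_hash_handle), the probe loops stop after max tries, keys are non-negative, and demo_collision is a made-up helper name.

```c
#include <assert.h>
#include <stddef.h>

struct my_struct {
    int key;
    const char *value;              /* simplified: no UT_hash_handle here */
};

struct hash_table {
    int max;
    int number_of_elements;
    struct my_struct **elements;
};

int hash_fun(int key, int try, int max) {
    return (key + try) % max;       /* assumes non-negative keys */
}

int hash_insert(struct my_struct *data, struct hash_table *t) {
    int try, hash;
    if (t->number_of_elements >= t->max) {
        return 0;                   /* full */
    }
    for (try = 0; try < t->max; try++) {
        hash = hash_fun(data->key, try, t->max);
        if (t->elements[hash] == NULL) {
            t->elements[hash] = data;
            t->number_of_elements++;
            return 1;
        }
    }
    return 0;
}

struct my_struct *hash_retrieve(int key, struct hash_table *t) {
    int try, hash;
    for (try = 0; try < t->max; try++) {
        hash = hash_fun(key, try, t->max);
        if (t->elements[hash] == NULL) {
            return NULL;            /* hit an empty cell: key is absent */
        }
        if (t->elements[hash]->key == key) {
            return t->elements[hash];
        }
    }
    return NULL;
}

/* Inserts two keys that share a first cell; returns 1 if all checks hold. */
int demo_collision(void) {
    struct my_struct *cells[4] = { NULL, NULL, NULL, NULL };
    struct hash_table table = { 4, 0, cells };
    struct my_struct a = { 1, "one" };
    struct my_struct b = { 5, "five" };     /* 5 % 4 == 1, same first cell as key 1 */

    assert(hash_insert(&a, &table) == 1);
    assert(hash_insert(&b, &table) == 1);
    return cells[1] == &a                   /* first try succeeded */
        && cells[2] == &b                   /* second try, one cell further */
        && hash_retrieve(5, &table) == &b
        && hash_retrieve(9, &table) == NULL;
}
```

Key 5 ends up one cell past key 1, and looking up key 9 walks the same cells until it reaches an empty one.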

And lastly, a method to remove:

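int hash_delete(int key, struct hash_table *hash_table) {
    int try, hash, next;
    struct my_struct *moved;
    for(try = 0; try < hash_table->max; try++) {
        hash = hash_fun(key, try, hash_table->max);
        if(hash_table->elements[hash] == 0) {
            return 0; // Nothing found
        }
        if(hash_table->elements[hash]->key == key) {
            hash_table->number_of_elements--;
            hash_table->elements[hash] = 0;
            // An empty cell ends every lookup that probes through it, so
            // re-insert the entries that follow until the next empty cell.
            // This relies on hash_fun probing one cell at a time.
            next = (hash + 1) % hash_table->max;
            while(hash_table->elements[next] != 0) {
                moved = hash_table->elements[next];
                hash_table->elements[next] = 0;
                hash_table->number_of_elements--;
                hash_insert(moved, hash_table);
                next = (next + 1) % hash_table->max;
            }
            return 1; // Success
        }
    }
    return 0;
}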

Answer 3:

It really depends on the distribution of your key field. For example, if it's a unique value always between 0 and 255 inclusive, just use key % 256 to select the bucket and you have a perfect hash.

If it's uniformly distributed across all possible int values, any function which gives you a uniformly distributed hash value will do (such as the aforementioned key % 256), albeit with multiple values in each bucket.
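
A bucket holding multiple values is usually a small linked list. The sketch below shows one way that could look with key % 256 as the bucket selector. All names in it (struct node, struct chained_table, bucket_of, chained_put, chained_get, chained_clear) are invented for the example, and the values are assumed to be strings owned by the caller.

```c
#include <assert.h>
#include <stdlib.h>

#define BUCKETS 256

struct node {
    int key;
    const char *value;          /* not copied; the caller keeps it alive */
    struct node *next;
};

struct chained_table {
    struct node *bucket[BUCKETS];
};

/* The cast keeps the result in 0..255 for negative keys as well. */
unsigned bucket_of(int key) {
    return (unsigned)key % BUCKETS;
}

/* Pushes a new node on the front of its bucket; returns 1 on success.
   Duplicate keys are not checked: a newer node shadows an older one. */
int chained_put(struct chained_table *t, int key, const char *value) {
    struct node *n;
    unsigned b = bucket_of(key);
    assert(t != NULL);
    n = malloc(sizeof *n);
    if (n == NULL) {
        return 0;
    }
    n->key = key;
    n->value = value;
    n->next = t->bucket[b];
    t->bucket[b] = n;
    return 1;
}

/* Walks the bucket's list; returns NULL when the key is absent. */
const char *chained_get(const struct chained_table *t, int key) {
    const struct node *n;
    for (n = t->bucket[bucket_of(key)]; n != NULL; n = n->next) {
        if (n->key == key) {
            return n->value;
        }
    }
    return NULL;
}

/* Frees every node and leaves all buckets empty. */
void chained_clear(struct chained_table *t) {
    int b;
    for (b = 0; b < BUCKETS; b++) {
        while (t->bucket[b] != NULL) {
            struct node *n = t->bucket[b];
            t->bucket[b] = n->next;
            free(n);
        }
    }
}
```

With keys spread evenly, each list stays short as long as the number of entries is not much larger than the number of buckets.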

Without knowing the distribution, it's a little hard to talk about efficient hashes.

Source: https://stackoverflow.com/questions/6844739/implement-a-hash-table

Tags

hashtable