I don\'t have experience with hash tables outside of arrays/dictionaries in dynamic languages, so I recently found out that internally they\'re implemented by making a hash
In some cases, it's possible that the key is very long or large, making it impractical to keep copies of these keys. Hashing them first allows for less memory usage as well as quicker lookup times.
What I don't understand is why aren't the values stored with the key (string, number, whatever) as the, well, key, instead of making a hash of it and storing that.
Well, how do you propose to do that, with O(1) lookup?
The point of hashtables is basically to provide O(1) lookup by turning the key into an array index and then returning the content of the array at that index. To make that possible for arbitrary keys you need
In a nutshell: Hashing allows O(1) queries/inserts/deletes to the table. OTOH, a sorted structure (usually implemented as a balanced BST) makes the same operations take O(logn) time.
Why take a hash, you ask? How do you propose to store the key "as the key"? Ask yourself this, if you plan to store simply (key,value) pairs, how fast will your lookups/insertions/deletions be? Will you be running a O(n) loop over the entire array/list?
The whole point of having a hash value is that it allows all keys to be transformed into a finite set of hash values. This allows us to store keys in slots of a finite array (enabling fast operations - instead of searching the whole list you only search those keys that have the same hash value) even though the set of possible keys may be extremely large or infinite (e.g. keys can be strings, very large numbers, etc.) With a good hash function, very few keys will ever have the same hash values, and all operations are effectively O(1).
This will probably not make much sense if you are not familiar with hashing and how hashtables work. The best thing to do in that case is to consult the relevant chapter of a good algorithms/data structures book (I recommend CLRS).
The main advantage of using a hash for the purpose of finding items in the table, as opposed to using the original key of the key-value pair (which BTW, it typically stored in the table as well, since the hash is not reversible), is that..
...it allows mapping the whole namespace of the [original] keys to the relatively small namespace of the hash values, allowing the hash-table to provide O(1) performance for retrieving items.
This O(1) performance gets a bit eroded when considering the extra time to dealing with collisions and such, but on the whole the hash table is very fast for storing and retrieving items, as opposed to a system based solely on the [original] key value, which would then typically be O(log N), with for example a binary tree (although such tree is more efficient, space-wise)
The idea of a hash table is to provide a direct access to its items. So that is why the it calculates the "hash code" of the key and uses it to store the item, insted of the key itself.
The idea is to have only one hash code per key. Many times the hash function that generates the hash code is to divide a prime number and uses its remainer as the hash code.
For example, suppose you have a table with 13 positions, and an integer as the key, so you can use the following hash function
f(x) = x % 13