Assuming simple uniform hashing, that is, any given value is equally likely to hash into any of the slots of the hash table, why is it better to use a table of size 127 rather than 128?
"When using the division method, we usually avoid certain values of m (table size). For example, m should not be a power of 2, since if m = 2^p, then h(k) is just the p lowest-order bits of k." -- CLRS
To understand why m = 2^p uses only the p lowest bits of k, you must first understand the modulo hash function h(k) = k % m.
The key k can be written in terms of the table size m, a quotient n, and a remainder r:
k = nm + r, where 0 <= r < m
For a non-negative key, k % m is simply the remainder in the above equation:
k % m = r = k - nm, where 0 <= r < m
Therefore, k % m is equivalent to repeatedly subtracting m from k a total of n times (until the result r is less than m):
k % m = k - m - m - ... - m, until r < m
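The repeated-subtraction view can be checked with a minimal Python sketch. The function name mod_by_subtraction is hypothetical and used only for illustration, and the sketch assumes a non-negative key and a positive table size:

```python
def mod_by_subtraction(k: int, m: int) -> int:
    """Compute k % m by subtracting m until the remainder is below m.

    Illustrative only: assumes k >= 0 and m > 0.
    """
    if k < 0 or m <= 0:
        raise ValueError("expects k >= 0 and m > 0")
    r = k
    while r >= m:
        r -= m  # one subtraction per unit of the quotient n
    return r


# Compare against the built-in modulo operator on a small range of keys.
for key in range(200):
    assert mod_by_subtraction(key, 16) == key % 16
```

This is far slower than the built-in % operator; it is only meant to make the arithmetic visible.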
Let's try hashing the key k = 91 with m = 2^4 = 16.
  91 = 0101 1011
- 16 = 0001 0000
----------------
  75 = 0100 1011
- 16 = 0001 0000
----------------
  59 = 0011 1011
- 16 = 0001 0000
----------------
  43 = 0010 1011
- 16 = 0001 0000
----------------
  27 = 0001 1011
- 16 = 0001 0000
----------------
  11 = 0000 1011
Thus, 91 % 2^4 = 11 is just the binary form of 91 with only the p = 4 lowest bits remaining.
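As a rough illustration of why this matters, the Python sketch below shows that taking a key modulo a power of 2 is the same as masking off its low bits, so keys that differ only in their higher bits all land in the same slot. The helper name low_bits and the sample keys are assumptions chosen for the example, and the comparison with m = 127 only shows that the higher bits take part in the result for these particular keys, not that 127 is a good choice for every key distribution:

```python
def low_bits(k: int, p: int) -> int:
    """Keep only the p lowest-order bits of a non-negative integer k."""
    return k & ((1 << p) - 1)


# For non-negative keys, k % 2**p and the bit mask agree.
for key in range(1024):
    assert key % 16 == low_bits(key, 4)

# Sample keys that share the low four bits 1011 but differ in higher bits.
keys = [91, 91 + 16, 91 + 32, 91 + 1024]

slots_16 = [k % 16 for k in keys]    # every key maps to the same slot
slots_127 = [k % 127 for k in keys]  # the higher bits now affect the slot

print(slots_16)
print(slots_127)
```

With m = 16, every sample key collides because the hash never looks past the low four bits.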
Important Distinction:
This pertains specifically to the division method of hashing. In fact, the opposite is true for the multiplication method, as stated in CLRS:
"An advantage of the multiplication method is that the value of m is not critical... We typically choose [m] to be a power of 2 since we can then easily implement the function on most computers."
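For contrast, here is a minimal Python sketch of the multiplication method with m = 2^p. It assumes a 32-bit word size and the commonly cited constant 2654435769, which is approximately ((sqrt(5) - 1) / 2) * 2^32. The names W, S, and mult_hash are hypothetical and chosen only for this example:

```python
W = 32           # assumed machine word size in bits
S = 2654435769   # approximately ((sqrt(5) - 1) / 2) * 2**32


def mult_hash(k: int, p: int) -> int:
    """Hash a non-negative key k into a table of size m = 2**p.

    Multiplies k by S, keeps the low-order W bits of the product,
    and returns the p highest-order bits of that W-bit value.
    """
    if not 0 < p <= W:
        raise ValueError("expects 0 < p <= W")
    low_word = (k * S) & ((1 << W) - 1)  # product modulo 2**W
    return low_word >> (W - p)           # top p bits of the low word


slot = mult_hash(123456, 14)  # table size m = 2**14 = 16384
print(slot)
```

Because the result is taken from the high-order bits of the product, every bit of the key can influence the slot, which is why a power-of-2 table size is not a problem here.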