Efficiently picking a random element from a chained hash table?

问题

Just for practice (and not as a homework assignment) I have been trying to solve this problem (CLRS, 3rd edition, exercise 11.2-6):

Suppose we have stored n keys in a hash table of size m, with collisions resolved by chaining, and that we know the length of each chain, including the length L of the longest chain. Describe a procedure that selects a key uniformly at random from among the keys in the hash table and returns it in expected time O(L * (1 + m/n)).

What I thought so far is that the probability of each key being returned is 1/n. If we try to get a random value x between 1 to n, and try to find the xth key in sequence first sorted by bucket then along the chain in the bucket, then it will take O(m) to find the right bucket by going through buckets one by one and O(L) time to get the right key in chain.

回答1:

Repeat the following steps until an element is returned:

Randomly select a bucket. Let k be the number of elements in the bucket.
Select p uniformly at random from 1 ... L. If p <= k then return the pth element in the bucket.

It should be clear that the procedure returns an element uniformly at random. We are sort of applying rejection sampling to the elements placed in a 2D array.

The expected number of elements per bucket is n / m. The probability that the sampling attempt succeeds is (n / m) / L. The expected number of attempts needed to find a bucket is therefore L * m / n. Together with the O(L) cost of retrieving the element from the bucket, the expected running time is O(L * (1 + m / n)).

来源：https://stackoverflow.com/questions/8629447/efficiently-picking-a-random-element-from-a-chained-hash-table

标签

algorithm

data-structures

random

hashtable

sampling