Uniquely identifying URLs with one 64-bit number

无人共我 2021-02-09 05:12

This is basically a math problem, but very programming related: if I have 1 billion strings containing URLs, and I take the first 64 bits of the MD5 hash of each of them, what ki

5 answers
  •  情书的邮戳
    2021-02-09 05:38

    With any hash there is always a chance of collisions, and you cannot know beforehand whether collisions will happen once or twice, or even hundreds or thousands of times, in your list of URLs.

    The probability is still just a probability. It's like rolling a die 10 or 100 times: what are the chances of getting all sixes? The probability says it is low, but it can still happen. Maybe even many times in a row...

    So while the birthday paradox shows you how to calculate the probabilities, you still need to decide if collisions are acceptable or not.
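To put a number on the birthday bound for the case in the question, here is a minimal sketch. It assumes the truncated hashes behave like uniformly random 64-bit values, which is an idealization; the function name `collision_probability` is made up for illustration.

```python
import math

def collision_probability(n: int, bits: int) -> float:
    """Birthday approximation: chance of at least one collision among
    n values drawn uniformly at random from a space of 2**bits values."""
    space = 2 ** bits
    # Expected number of colliding pairs: n * (n - 1) / 2 pairs, each colliding with chance 1 / space.
    expected_pairs = n * (n - 1) / (2 * space)
    # P(at least one collision) ~= 1 - exp(-expected_pairs); expm1 keeps precision for small values.
    return -math.expm1(-expected_pairs)

# 1 billion URLs, 64-bit identifiers (assumed uniform).
p = collision_probability(10**9, 64)
print(f"P(at least one collision) is about {p:.2%}")
```

Under that assumption the result comes out at roughly 2.7% for 1 billion URLs, so a collision somewhere in the set is unlikely but far from impossible.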

    If collisions are acceptable and hashing is still the right way to go, find a dedicated 64-bit hashing algorithm instead of relying on "half an MD5" having a good distribution. (Though it probably does...)
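For reference, the "first 64 bits of MD5" scheme from the question could look like the sketch below, together with a simple pass that reports any collisions actually present in a list. The names `url_id64` and `find_collisions` are hypothetical, and UTF-8 encoding of the URL string is an assumption. A dedicated 64-bit hash could be swapped in by replacing the body of `url_id64`.

```python
import hashlib

def url_id64(url: str) -> int:
    """Return the first 64 bits of the MD5 digest of the URL as an unsigned integer."""
    digest = hashlib.md5(url.encode("utf-8")).digest()  # 16 bytes
    return int.from_bytes(digest[:8], "big")            # keep the first 8 bytes

def find_collisions(urls):
    """Return pairs of distinct URLs that map to the same 64-bit identifier."""
    seen = {}        # identifier -> first URL seen with that identifier
    collisions = []
    for url in urls:
        key = url_id64(url)
        first = seen.setdefault(key, url)
        if first != url:  # same identifier, different URL
            collisions.append((first, url))
    return collisions

urls = ["http://example.com/a", "http://example.com/b", "http://example.com/a"]
print([hex(url_id64(u)) for u in urls])
print(find_collisions(urls))
```

Keeping the full dictionary in memory is only practical for modest lists; at the scale of a billion URLs one would more likely sort the identifiers on disk and look for equal neighbours.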
