Fast hash function with collision possibility near SHA-1

情书的邮戳 2021-02-20 06:52

I'm using SHA-1 to detect duplicates in a program handling files. The hash does not need to be cryptographically strong and may be reversible. I found this list of fast hash functions

7 Answers
  •  陌清茗 (original poster)
    2021-02-20 07:40

    Since the only relevant property of hash algorithms in your case is the collision probability, you should estimate it and choose the fastest algorithm which fulfills your requirements.

    If we suppose your algorithm has absolute uniformity, the probability of a hash collision among n files using hashes with d possible values will be:

    $$p = 1 - \frac{d!}{(d-n)!\,d^n} \approx 1 - e^{-n(n-1)/(2d)} \approx \frac{n^2}{2d} \quad (n \ll \sqrt{d})$$
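As a rough illustration, the birthday-problem estimate of this probability can be evaluated numerically. This is a minimal sketch that assumes perfectly uniform hashes; the function name `collision_probability` is made up for the example.

```python
import math

def collision_probability(n: int, d: int) -> float:
    """Approximate chance that at least two of n uniform hashes over d values collide."""
    # Birthday-problem approximation: p ~ 1 - exp(-n(n-1)/(2d)).
    # expm1 keeps precision when the probability is tiny.
    return -math.expm1(-n * (n - 1) / (2 * d))

# One million files with a 64-bit hash, then with a 32-bit hash.
print(collision_probability(10**6, 2**64))
print(collision_probability(10**6, 2**32))
```

With a 64-bit hash the estimate is a few parts in a hundred million; with a 32-bit hash a collision among a million files is practically certain.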

    For example, if you need a collision probability lower than one in a million among one million files, d must exceed roughly n^2/(2p) = 10^12/(2*10^-6) = 5*10^17 distinct hash values, which means your hashes need at least 59 bits. Let's round up to 64 to account for possibly imperfect uniformity.
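The arithmetic in this example can be checked with a few lines of Python. This is only a sketch: `min_hash_bits` is a hypothetical helper, and it relies on the small-probability approximation p ≈ n^2/(2d) for uniform hashes.

```python
import math

def min_hash_bits(n: int, p_max: float) -> int:
    """Smallest hash width in bits keeping the collision probability below p_max for n items."""
    # For small p, p ~ n^2 / (2d), so d must exceed n^2 / (2 * p_max).
    d_min = n * n / (2 * p_max)
    return math.ceil(math.log2(d_min))

print(min_hash_bits(10**6, 1e-6))  # 59
```

Changing `n` or `p_max` shows how quickly the required width grows: every tenfold increase in the number of files costs about 6.6 more bits.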

    So I'd say any decent 64-bit hash should be sufficient for you. Longer hashes will further reduce the collision probability, at the price of heavier computation and more storage per hash. Shorter hashes like CRC32 will require you to write some explicit collision handling code.
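One possible shape for duplicate detection with explicit collision handling is sketched below. The choices here are assumptions, not part of the answer: BLAKE2b truncated to 8 bytes stands in for "any decent 64-bit hash" only because it ships with Python's standard library (a dedicated non-cryptographic hash would likely be faster), and `hash64` and `find_duplicates` are hypothetical names.

```python
import filecmp
import hashlib
from collections import defaultdict

def hash64(path, chunk_size=1 << 20):
    """Return a 64-bit digest of a file's contents (BLAKE2b cut to 8 bytes, a stand-in)."""
    h = hashlib.blake2b(digest_size=8)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.digest()

def find_duplicates(paths, verify=True):
    """Group files by hash; with verify=True, confirm each match byte for byte."""
    buckets = defaultdict(list)
    for p in paths:
        buckets[hash64(p)].append(p)

    groups = []
    for candidates in buckets.values():
        if len(candidates) < 2:
            continue
        if not verify:
            # Trust the hash alone: reasonable at 64 bits, risky at 32 bits.
            groups.append(candidates)
            continue
        # Explicit collision handling: split the bucket into sets of truly identical files.
        remaining = list(candidates)
        while remaining:
            first, rest = remaining[0], remaining[1:]
            same = [first] + [q for q in rest if filecmp.cmp(first, q, shallow=False)]
            if len(same) > 1:
                groups.append(same)
            remaining = [q for q in rest if q not in same]
    return groups
```

With a 64-bit hash the `verify` step is a safety net that almost never finds a mismatch; with a 32-bit hash such as CRC32 it becomes necessary, since false matches are expected at this scale.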
