Fast hash function with collision possibility near SHA-1

前端 未结 7 1901
情书的邮戳
情书的邮戳 2021-02-20 06:52

I\'m using SHA-1 to detect duplicates in a program handling files. It is not required to be cryptographic strong and may be reversible. I found this list of fast hash functions

7条回答
  •  北海茫月
    2021-02-20 07:26

    The facts:

    1. Good hash functions, specially the cryptographic ones (like SHA-1), require considerable CPU time because they have to honor a number of properties that wont be very useful for you in this case;
    2. Any hash function will give you only one certainty: if the hash values of two files are different, the files are surely different. If, however, their hash values are equal, chances are that the files are also equal, but the only way to tell for sure if this "equality" is not just a hash collision, is to fall back to a binary comparison of the two files.

    The conclusion:
    In your case I would try a much faster algorithm like CRC32, that has pretty much all the properties you need, and would be capable of handling more than 99.9% of the cases and only resorting to a slower comparison method (like binary comparison) to rule out the false positives. Being a lot faster in the great majority of comparisons would probably compensate for not having an "awesome" uniformity (possibly generating a few more collisions).

提交回复
热议问题